[HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Nils Goroll
Hi,

I am currently trying to understand what looks like really bad scalability of
9.1.3 on a 64-core, 512GB-RAM system: the system runs OK at 30% usr, but even
marginal amounts of additional load seem to push it to 70%, and the application
becomes highly unresponsive.

My current understanding basically matches the issues being addressed by various
9.2 improvements, well summarized in
http://wiki.postgresql.org/images/e/e8/FOSDEM2012-Multi-CPU-performance-in-9.2.pdf

An additional aspect is that, in order to address the latent risk of data
loss/corruption with write-back caches (WBCs) and async replication, we have
deliberately moved the db from a similar system with WB-cached storage to
SSD-based storage without a WBC, which, by design, has approx. 100x higher
latencies (compared to the best WBC case), but much higher sustained throughput.


On the new system, even at an acceptable 30% user load, oprofile makes
significant lock contention apparent:

opreport --symbols --merge tgid -l /mnt/db1/hdd/pgsql-9.1/bin/postgres


Profiling through timer interrupt
samples  %        image name               symbol name
30240    27.9720  postgres                 s_lock
5069      4.6888  postgres                 GetSnapshotData
3743      3.4623  postgres                 AllocSetAlloc
3167      2.9295  libc-2.12.so             strcoll_l
2662      2.4624  postgres                 SearchCatCache
2495      2.3079  postgres                 hash_search_with_hash_value
2143      1.9823  postgres                 nocachegetattr
1860      1.7205  postgres                 LWLockAcquire
1642      1.5189  postgres                 base_yyparse
1604      1.4837  libc-2.12.so             __strcmp_sse42
1543      1.4273  libc-2.12.so             __strlen_sse42
1156      1.0693  libc-2.12.so             memcpy

Unfortunately I don't have profiling data for the high-load / contention
condition yet, but I fear the picture will be worse and point in the same
direction.

<pure speculation>
In particular, the _impression_ is that lock contention could also be related to
I/O latencies, making me fear that cases could exist where spin locks are being
held while blocking on I/O.
</pure speculation>


Looking at the code, it appears to me that the roll-your-own s_lock code cannot
handle a couple of cases: for instance, it will keep spinning even when the lock
holder is not running at all or is blocking on I/O (which could even be
implicit, e.g. for a page flush). These issues have long been addressed by
adaptive mutexes and futexes.

Also, the s_lock code tries to be somewhat adaptive using spins_per_delay (when
we've spun for a long time but not blocked, spin even longer in the future),
which appears to me to have the potential of becoming highly counter-productive.
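
To make this concrete, the scheme boils down to roughly the following (a
deliberately simplified sketch, not the actual s_lock.c code; the function
name and tuning constants are made up for illustration):

/*
 * Simplified sketch of a spin-then-sleep lock in the spirit of s_lock --
 * not the real implementation.
 */
#include <stdatomic.h>
#include <unistd.h>

static int spins_per_delay = 100;       /* the real code adapts this over time */

static void
sketch_s_lock(atomic_flag *lock)
{
    int spins = 0;
    int delay_us = 1000;                /* back off once spinning fails */

    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
    {
        if (++spins >= spins_per_delay)
        {
            usleep(delay_us);           /* sleep instead of burning CPU */
            if (delay_us < 1000000)
                delay_us *= 2;
            spins = 0;
        }
        /* nothing here detects that the holder is descheduled or in I/O */
    }
}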


Now that the scene is set, here's the simple question: Why all this? Why not
simply use POSIX mutexes which, on modern platforms, map to efficient
implementations like adaptive mutexes or futexes?

Thanks, Nils



Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Merlin Moncure
On Tue, Jun 26, 2012 at 12:02 PM, Nils Goroll sl...@schokola.de wrote:
 Hi,

 [...]

 Now that the scene is set, here's the simple question: Why all this? Why not
 simply use POSIX mutexes which, on modern platforms, map to efficient
 implementations like adaptive mutexes or futexes?

Well, that would introduce a backend dependency on pthreads, which is
unpleasant.  Also you'd need to feature-test via
_POSIX_THREAD_PROCESS_SHARED to make sure you can mutex between
processes (and configure your mutexes as such when you do).  There are
probably other reasons why this can't be done, but I personally don't
know of any.
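
For illustration, the setup would look roughly like this (a sketch only;
init_shared_mutex is a made-up name, error handling is omitted, and the
pthread_mutex_t must itself live in shared memory):

#include <pthread.h>
#include <unistd.h>

/* Sketch: initialize a mutex usable across forked processes. */
static int
init_shared_mutex(pthread_mutex_t *m)
{
#if defined(_POSIX_THREAD_PROCESS_SHARED) && _POSIX_THREAD_PROCESS_SHARED > 0
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    /* without this, the mutex is only valid within a single process */
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return 0;
#else
    return -1;                          /* platform doesn't advertise the feature */
#endif
}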

Also, it's forbidden to do things like invoke i/o in the backend while
holding only a spinlock. As to your larger point, it's an interesting
assertion -- some data to back it up would help.

merlin



Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Tom Lane
Nils Goroll sl...@schokola.de writes:
 Now that the scene is set, here's the simple question: Why all this? Why not
 simply use POSIX mutexes which, on modern platforms, map to efficient
 implementations like adaptive mutexes or futexes?

(1) They do not exist everywhere.
(2) There is absolutely no evidence to suggest that they'd make things better.

If someone cared to rectify (2), we could consider how to use them as an
alternative implementation.  But if you start with "let's not support
any platforms that don't have this feature", you're going to get a cold
reception.

regards, tom lane



Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Nils Goroll
Hi Merlin,

 _POSIX_THREAD_PROCESS_SHARED

sure.

 Also, it's forbidden to do things like invoke i/o in the backend while
 holding only a spinlock. As to your larger point, it's an interesting
 assertion -- some data to back it up would help.

Let's see if I can get any. ATM I've only got indications, but no proof.

Nils



Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Nils Goroll

 But if you start with "let's not support any platforms that don't have this
 feature"

This will never be my intention.

Nils



Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Martijn van Oosterhout
On Tue, Jun 26, 2012 at 01:46:06PM -0500, Merlin Moncure wrote:
 Well, that would introduce a backend dependency on pthreads, which is
 unpleasant.  Also you'd need to feature-test via
 _POSIX_THREAD_PROCESS_SHARED to make sure you can mutex between
 processes (and configure your mutexes as such when you do).  There are
 probably other reasons why this can't be done, but I personally don't
 know of any.

And then you have fabulous things like:

https://git.reviewboard.kde.org/r/102145/
(OSX defines _POSIX_THREAD_PROCESS_SHARED but does not actually support
it.)

Seems not very well tested in any case.

It might be worthwhile testing futexes on Linux, though: they are
specifically supported on any kind of shared memory (shm/mmap/fork/etc)
and quite well tested.
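
A minimal futex-based lock, for illustration, can look roughly like this
(following the well-known pattern from Ulrich Drepper's "Futexes Are Tricky"
paper; futex_lock/futex_unlock are made-up names and a production version
needs more care, e.g. error handling):

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>

static long
sys_futex(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* lock states: 0 = free, 1 = held, 2 = held with (possible) waiters */

static void
futex_lock(atomic_int *f)
{
    int expected = 0;

    /* fast path: uncontended acquisition stays entirely in userspace */
    if (atomic_compare_exchange_strong(f, &expected, 1))
        return;

    /* slow path: mark contended, then sleep in the kernel until woken */
    while (atomic_exchange(f, 2) != 0)
        sys_futex(f, FUTEX_WAIT, 2);
}

static void
futex_unlock(atomic_int *f)
{
    /* wake one waiter only if somebody may be sleeping */
    if (atomic_exchange(f, 0) == 2)
        sys_futex(f, FUTEX_WAKE, 1);
}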

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Tom Lane
Martijn van Oosterhout klep...@svana.org writes:
 And then you have fabulous things like:
 https://git.reviewboard.kde.org/r/102145/
 (OSX defines _POSIX_THREAD_PROCESS_SHARED but does not actually support
 it.)

 Seems not very well tested in any case.

 It might be worthwhile testing futexes on Linux though, they are
 specifically supported on any kind of shared memory (shm/mmap/fork/etc)
 and quite well tested.

Yeah, a Linux-specific replacement of spinlocks with futexes seems like
a lot safer idea than "let's rely on POSIX mutexes everywhere".  It's
still unproven whether it'd be an improvement, but you could expect to
prove it one way or the other with a well-defined amount of testing.

regards, tom lane
