Hi,

Apologies for resurrecting this old thread, but it seemed better than starting a new conversation.
We are now running 9.1.13 and have doubled the CPU and memory: two 16-core Opteron 6276 CPUs (32 cores total) and 64GB of RAM, with shared_buffers set to 20GB and effective_cache_size set to 40GB. We were able to record perf data during the latest incident of high CPU utilization. The perf report is below:

Samples: 31M of event 'cycles', Event count (approx.): 16289978380877
 44.74%  postmaster  [kernel.kallsyms]  [k] _spin_lock_irqsave
 15.03%  postmaster  postgres           [.] 0x00000000002ea937
  3.14%  postmaster  postgres           [.] s_lock
  2.30%  postmaster  [kernel.kallsyms]  [k] compaction_alloc
  2.21%  postmaster  postgres           [.] HeapTupleSatisfiesMVCC
  1.75%  postmaster  postgres           [.] hash_search_with_hash_value
  1.25%  postmaster  postgres           [.] ExecScanHashBucket
  1.20%  postmaster  postgres           [.] SHMQueueNext
  1.05%  postmaster  postgres           [.] slot_getattr
  1.04%  init        [kernel.kallsyms]  [k] native_safe_halt
  0.73%  postmaster  postgres           [.] LWLockAcquire
  0.59%  postmaster  [kernel.kallsyms]  [k] page_fault
  0.52%  postmaster  postgres           [.] ExecQual
  0.40%  postmaster  postgres           [.] ExecStoreTuple
  0.38%  postmaster  postgres           [.] ExecScan
  0.37%  postmaster  postgres           [.] check_stack_depth
  0.35%  postmaster  postgres           [.] SearchCatCache
  0.35%  postmaster  postgres           [.] CheckForSerializableConflictOut
  0.34%  postmaster  postgres           [.] LWLockRelease
  0.30%  postmaster  postgres           [.] _bt_checkkeys
  0.28%  postmaster  libc-2.12.so       [.] memcpy
  0.27%  postmaster  [kernel.kallsyms]  [k] get_pageblock_flags_group
  0.27%  postmaster  postgres           [.] int4eq
  0.27%  postmaster  postgres           [.] heap_page_prune_opt
  0.27%  postmaster  postgres           [.] pgstat_init_function_usage
  0.26%  postmaster  [kernel.kallsyms]  [k] _spin_lock
  0.25%  postmaster  postgres           [.] _bt_compare
  0.24%  postmaster  postgres           [.] pgstat_end_function_usage
  ...

Please let me know if we need to produce the report differently to be useful. We will begin reducing shared_buffers incrementally over the coming days.
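In case it is useful, here is a rough sketch of how we could capture the next incident with call graphs and resolved symbols; the 15% of samples at 0x00000000002ea937 suggests the postgres debug symbols are not installed. The debuginfo package name below is an assumption for our CentOS/RHEL 6-era system (libc-2.12), and the output paths are placeholders:

    # Install debug symbols so unresolved postgres addresses map to
    # function names (assumed package name; adjust for the actual
    # packaging in use).
    debuginfo-install postgresql91-server

    # Record system-wide with call graphs for 60 seconds during the
    # next CPU spike.
    perf record -a -g -o /tmp/pg_spike.data -- sleep 60

    # Produce a plain-text report with call chains for the list.
    perf report -i /tmp/pg_spike.data -g --stdio > /tmp/pg_spike_report.txt

If the call chains behind _spin_lock_irqsave and s_lock end up in the buffer mapping or clock sweep code, that would line up with the contention Merlin describes below.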
Dave Owens
541-359-2602
TU Facebook: https://www.facebook.com/teamunify | Free OnDeck Mobile Apps: http://www.teamunify.com/__corp__/ondeck/


On Tue, Oct 15, 2013 at 8:14 PM, Merlin Moncure <[email protected]> wrote:
> On Tue, Oct 15, 2013 at 12:26 PM, Julien Cigar <[email protected]> wrote:
> > On Tue, Oct 15, 2013 at 08:59:08AM -0700, Tony Kay wrote:
> >> On Mon, Oct 14, 2013 at 4:42 PM, Tomas Vondra <[email protected]> wrote:
> >>
> >> > On 15.10.2013 01:00, Tony Kay wrote:
> >> > > Hi,
> >> > >
> >> > > I'm running 9.1.6 w/22GB shared buffers, and 32GB overall RAM on a
> >> > > 16 Opteron 6276 CPU box. We limit connections to roughly 120, but
> >> > > our webapp is configured to allocate a thread-local connection, so
> >> > > those connections are rarely doing anything more than half the time.
> >> >
> >> > Lower your shared buffers to about 20% of your RAM, unless you've
> >> > tested that it's actually helping in your particular case. It's
> >> > unlikely you'll get better performance by using more than that,
> >> > especially on older versions, so it's wiser to leave the rest for
> >> > the page cache.
> >> >
> >> > It might even be one of the causes of the performance issue you're
> >> > seeing, as shared buffers are not exactly overhead-free.
> >> >
> >> > See this for more details on tuning:
> >> >
> >> > http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
> >>
> >> I had followed the general directions from several sources years ago,
> >> which indicate up to 40% of RAM. We've been running very large shared
> >> buffers for
> >
> > In general it's best to start with 10-15% of the RAM, and no more than
> > 2-4 GB.
> >
> >> 4 years now, but it is difficult to generate a good real load without
> >> testing against users, so we have not felt the need to move it around.
> >> In general, I don't tend to tinker with a setting that has been fine
> >> for this long without good reason. I've been wanting to upgrade to the
> >> newer mmap-based versions of pgsql, but was waiting to re-tune this
> >> when I did so.
> >>
> >> Why do you suspect that shared_buffers would cause the behavior I'm
> >> seeing?
> >
> > For two reasons:
> >
> > - Some of the overhead of the bgwriter and of checkpoints is more or
> >   less linear in the size of shared_buffers; for example, it is
> >   possible that a large quantity of data is dirty when a checkpoint
> >   occurs.
> >
> > - The OS cache is also being used for reads and writes; the larger
> >   shared_buffers is, the more you risk double buffering (the same
> >   blocks in the OS cache and in the database buffer cache).
>
> That's good reasoning, but it is not related to the problem faced by
> the OP. The real reason I recommend keeping shared_buffers at 2GB max,
> always, is that we have major contention issues which we presume are
> in the buffer area (either in the mapping or in the clock sweep) but
> could be something else entirely. These issues tend to show up on fast
> machines in all- or mostly-read workloads.
>
> We are desperate for profiles demonstrating the problem in production
> workloads. If the OP is willing to install and run perf in production
> (which is not a bad idea anyway), then my advice is to change nothing
> until we have a chance to grab a profile. These types of problems are
> notoriously difficult to reproduce in test environments.
>
> merlin
>
>
> --
> Sent via pgsql-performance mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
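Also, given the double-buffering and checkpoint points above, before we start lowering shared_buffers we could first check how much of the 20GB is actually populated and dirty. A rough sketch using the pg_buffercache contrib module (assuming it is available on this server; "ourdb" is a placeholder database name and the default 8kB block size is assumed):

    # One-time setup in the database of interest.
    psql -d ourdb -c "CREATE EXTENSION pg_buffercache;"

    # How many shared buffers are populated, and how many are dirty.
    psql -d ourdb -c "
        SELECT count(*)                                  AS buffers_total,
               count(relfilenode)                        AS buffers_used,
               sum(CASE WHEN isdirty THEN 1 ELSE 0 END)  AS buffers_dirty,
               pg_size_pretty(count(relfilenode) * 8192) AS used_size
        FROM   pg_buffercache;"

    # Checkpoint/background-writer counters, relevant to the point that
    # checkpoint overhead grows with the amount of dirty data.
    psql -d ourdb -c "
        SELECT checkpoints_timed, checkpoints_req,
               buffers_checkpoint, buffers_backend
        FROM   pg_stat_bgwriter;"

If only a small fraction of the 20GB turns out to be in use, that might argue for a larger first reduction rather than many small steps.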
