Hi,

Apologies for resurrecting this old thread, but it seems like this is
better than starting a new conversation.

We are now running 9.1.13 and have doubled the CPU and memory: two 16-core
Opteron 6276 CPUs (32 cores total) and 64GB of RAM.  shared_buffers is set to
20GB and effective_cache_size to 40GB.
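
For reference, the running values can be sanity-checked against pg_settings
after a restart; below is a minimal sketch of that check (psycopg2 and the
connection string are assumptions, not part of our actual setup):

    # Minimal sketch: print memory-related settings as the server sees them.
    # Connection string is a placeholder; psycopg2 is assumed to be available.
    import psycopg2

    conn = psycopg2.connect("dbname=postgres")  # placeholder DSN
    cur = conn.cursor()
    cur.execute("""
        SELECT name, current_setting(name)
        FROM pg_settings
        WHERE name IN ('shared_buffers', 'effective_cache_size', 'max_connections')
    """)
    for name, value in cur.fetchall():
        print("%s = %s" % (name, value))
    conn.close()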

We were able to record perf data during the latest incident of high CPU
utilization. perf report is below:

Samples: 31M of event 'cycles', Event count (approx.): 16289978380877
 44.74%  postmaster  [kernel.kallsyms]  [k] _spin_lock_irqsave
 15.03%  postmaster  postgres           [.] 0x00000000002ea937
  3.14%  postmaster  postgres           [.] s_lock
  2.30%  postmaster  [kernel.kallsyms]  [k] compaction_alloc
  2.21%  postmaster  postgres           [.] HeapTupleSatisfiesMVCC
  1.75%  postmaster  postgres           [.] hash_search_with_hash_value
  1.25%  postmaster  postgres           [.] ExecScanHashBucket
  1.20%  postmaster  postgres           [.] SHMQueueNext
  1.05%  postmaster  postgres           [.] slot_getattr
  1.04%  init        [kernel.kallsyms]  [k] native_safe_halt
  0.73%  postmaster  postgres           [.] LWLockAcquire
  0.59%  postmaster  [kernel.kallsyms]  [k] page_fault
  0.52%  postmaster  postgres           [.] ExecQual
  0.40%  postmaster  postgres           [.] ExecStoreTuple
  0.38%  postmaster  postgres           [.] ExecScan
  0.37%  postmaster  postgres           [.] check_stack_depth
  0.35%  postmaster  postgres           [.] SearchCatCache
  0.35%  postmaster  postgres           [.] CheckForSerializableConflictOut
  0.34%  postmaster  postgres           [.] LWLockRelease
  0.30%  postmaster  postgres           [.] _bt_checkkeys
  0.28%  postmaster  libc-2.12.so       [.] memcpy
  0.27%  postmaster  [kernel.kallsyms]  [k] get_pageblock_flags_group
  0.27%  postmaster  postgres           [.] int4eq
  0.27%  postmaster  postgres           [.] heap_page_prune_opt
  0.27%  postmaster  postgres           [.] pgstat_init_function_usage
  0.26%  postmaster  [kernel.kallsyms]  [k] _spin_lock
  0.25%  postmaster  postgres           [.] _bt_compare
  0.24%  postmaster  postgres           [.] pgstat_end_function_usage

...please let me know if we need to produce the report differently for it to
be useful.
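
If call-graph data would make the profile more useful, we can re-capture along
these lines during the next incident (a rough sketch only; it assumes perf is
installed, the script runs as root, and the 60-second window is arbitrary):

    # Rough sketch: capture a 60-second system-wide call-graph profile and
    # print the report.  Assumes perf is installed and this runs as root.
    import subprocess

    subprocess.check_call(["perf", "record", "-a", "-g", "--", "sleep", "60"])
    subprocess.check_call(["perf", "report", "--stdio", "-g"])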

We will begin reducing shared_buffers incrementally over the coming days.
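
One simple way to judge each step is to compare the cluster-wide buffer cache
hit ratio from pg_stat_database before and after each change; a minimal sketch
of that check is below (again, psycopg2 and the connection string are
assumptions):

    # Minimal sketch: cluster-wide buffer cache hit ratio, to compare before
    # and after each shared_buffers change.  Connection string is a placeholder.
    import psycopg2

    conn = psycopg2.connect("dbname=postgres")  # placeholder DSN
    cur = conn.cursor()
    cur.execute("""
        SELECT round(100.0 * sum(blks_hit)
                     / nullif(sum(blks_hit) + sum(blks_read), 0), 2)
        FROM pg_stat_database
    """)
    print("buffer cache hit ratio: %s%%" % cur.fetchone()[0])
    conn.close()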


Dave Owens

541-359-2602
TeamUnify
Facebook<https://www.facebook.com/teamunify> | Free OnDeck Mobile Apps<http://www.teamunify.com/__corp__/ondeck/>



On Tue, Oct 15, 2013 at 8:14 PM, Merlin Moncure <mmonc...@gmail.com> wrote:

> On Tue, Oct 15, 2013 at 12:26 PM, Julien Cigar <jci...@ulb.ac.be> wrote:
> > On Tue, Oct 15, 2013 at 08:59:08AM -0700, Tony Kay wrote:
> >> On Mon, Oct 14, 2013 at 4:42 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
> >>
> >> > On 15.10.2013 01:00, Tony Kay wrote:
> >> > > Hi,
> >> > >
> >> > > I'm running 9.1.6 w/22GB shared buffers, and 32GB overall RAM on a
> >> > > 16 Opteron 6276 CPU box. We limit connections to roughly 120, but
> >> > > our webapp is configured to allocate a thread-local connection, so
> >> > > those connections are rarely doing anything more than half the time.
> >> >
> >> > Lower your shared buffers to about 20% of your RAM, unless you've
> tested
> >> > it's actually helping in your particular case. It's unlikely you'll
> get
> >> > better performance by using more than that, especially on older
> >> > versions, so it's wiser to leave the rest for page cache.
> >> >
> >> > It might even be one of the causes of the performance issue you're
> >> > seeing, as shared buffers are not exactly overhead-free.
> >> >
> >> > See this for more details on tuning:
> >> >
> >> >    http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
> >>
> >>
> >> I had followed the general directions from several sources years ago,
> which
> >> indicate up to 40% of RAM. We've been running very large shared buffers
> for
> >
> > in general it's best to start with 10-15% of the RAM and no more than
> > 2-4 GB
> >
> >> 4 years now, but it is difficult to generate a good real load without
> >> testing against users, so we have not felt the need to move it around.
> In
> >> general, I don't tend to tinker with a setting that has been fine for
> this
> >> long without good reason. I've been wanting to upgrade to the newer
> >> mmap-based versions of pgsql, but was waiting to re-tune this when I
> did so.
> >>
> >> Why do you suspect that shared_buffers would cause the behavior I'm
> seeing?
> >>
> >
> > for two reasons:
> >
> > - some of the overhead of bgwriter and checkpoints is more or less linear
> >   in the size of shared_buffers; for example, a large quantity of data
> >   could be dirty when a checkpoint occurs.
> >
> > - the OS cache is also being used for reads and writes, the larger
> >   shared_buffers is, the more you risk double buffering (same blocks
> >   in the OS cache and in the database buffer cache).
>
> That's good reasoning, but it is not related to the problem faced by the
> OP.  The real reason I recommend keeping shared_buffers at 2GB max,
> always, is that we have major contention issues, which we presume are in
> the buffer area (either in the mapping or in the clock sweep) but could
> be something else entirely.  These issues tend to show up on fast
> machines in all- or mostly-read workloads.
>
> We are desperate for profiles demonstrating the problem in production
> workloads.  If OP is willing to install and run perf in production
> (which is not a bad idea anyways), then my advice is to change nothing
> until we have a chance to grab a profile.  These types of problems are
> notoriously difficult to reproduce in test environments.
>
> merlin
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
