We haven't seen any issues since we decreased shared_buffers. We also tuned some of the longer running / more frequently executed queries, so that may have had an effect as well, but my money would be on the shared_buffers change. If the issue reappears I'll try to capture a perf profile again and post back, but if you don't hear from me again you can assume the problem is solved.

Thank you all again for the help.

-Dave

On Fri, Sep 13, 2013 at 11:05 AM, David Whittaker <d...@iradix.com> wrote:

> On Fri, Sep 13, 2013 at 10:52 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
>
>> On Thu, Sep 12, 2013 at 3:06 PM, David Whittaker <d...@iradix.com> wrote:
>> > Hi All,
>> >
>> > We lowered shared_buffers to 8G and increased effective_cache_size
>> > accordingly. So far, we haven't seen any issues since the adjustment. The
>> > issues have come and gone in the past, so I'm not convinced it won't crop up
>> > again, but I think the best course is to wait a week or so and see how
>> > things work out before we make any other changes.
>> >
>> > Thank you all for your help, and if the problem does reoccur, we'll look
>> > into the other options suggested, like using a patched postmaster and
>> > compiling for perf -g.
>> >
>> > Thanks again, I really appreciate the feedback from everyone.
>>
>> Interesting -- please respond with a follow up if/when you feel
>> satisfied the problem has gone away. Andres was right; I initially
>> mis-diagnosed the problem (there is another issue I'm chasing that has
>> a similar performance presentation but originates from a different
>> area of the code).
>>
>> That said, if reducing shared_buffers made *your* problem go away as
>> well, then this is more evidence that we have an underlying contention
>> mechanic that is somehow influenced by the setting. Speaking frankly,
>> under certain workloads we seem to have contention issues in the
>> general area of the buffer system. I'm thinking (guessing) that the
>> problem is that usage_count is getting incremented faster than the
>> buffers are getting cleared out, which is then causing the sweeper to
>> spend more and more time examining hotly contended buffers. This may
>> make no sense in the context of your issue; I haven't looked at the
>> code yet. Also, I've been unable to cause this to happen in simulated
>> testing. But I'm suspicious (and dollars to doughnuts '0x347ba9' is
>> spinlock related).
>>
>> Anyways, thanks for the report and (hopefully) the follow up.
>>
>> merlin
>
> You guys have taken the time to help me through this, following up is the
> least I can do. So far we're still looking good.