Re: [PERFORM] Intermittent hangs with 9.2
We haven't seen any issues since we decreased shared_buffers. We also tuned some of the longer running / more frequently executed queries, so that may have had an effect as well, but my money would be on the shared_buffers change. If the issue re-appears I'll try to capture another perf profile and post back, but if you don't hear from me again you can assume the problem is solved. Thank you all again for the help.

-Dave

On Fri, Sep 13, 2013 at 11:05 AM, David Whittaker <d...@iradix.com> wrote:
> On Fri, Sep 13, 2013 at 10:52 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> On Thu, Sep 12, 2013 at 3:06 PM, David Whittaker <d...@iradix.com> wrote:
>>> Hi All,
>>>
>>> We lowered shared_buffers to 8G and increased effective_cache_size
>>> accordingly. So far, we haven't seen any issues since the adjustment.
>>> The issues have come and gone in the past, so I'm not convinced it
>>> won't crop up again, but I think the best course is to wait a week or
>>> so and see how things work out before we make any other changes.
>>>
>>> Thank you all for your help, and if the problem does reoccur, we'll
>>> look into the other options suggested, like using a patched postmaster
>>> and compiling to support perf -g. Thanks again, I really appreciate
>>> the feedback from everyone.
>>
>> Interesting -- please respond with a follow-up if/when you feel
>> satisfied the problem has gone away. Andres was right; I initially
>> mis-diagnosed the problem (there is another issue I'm chasing that has
>> a similar performance presentation but originates from a different
>> area of the code). That said, if reducing shared_buffers made *your*
>> problem go away as well, then this is more evidence that we have an
>> underlying contention mechanism that is somehow influenced by the
>> setting.
>>
>> Speaking frankly, under certain workloads we seem to have contention
>> issues in the general area of the buffer system. I'm thinking
>> (guessing) that the problem is usage_count getting incremented faster
>> than the buffers are getting cleared out, which then causes the
>> sweeper to spend more and more time examining hotly contended buffers.
>> This may make no sense in the context of your issue; I haven't looked
>> at the code yet. Also, I've been unable to cause this to happen in
>> simulated testing. But I'm suspicious (and dollars to doughnuts
>> '0x347ba9' is spinlock related).
>>
>> Anyways, thanks for the report and (hopefully) the follow-up.
>>
>> merlin
>
> You guys have taken the time to help me through this, following up is
> the least I can do. So far we're still looking good.
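The contention mechanism Merlin is guessing at is easier to see against a sketch of the clock-sweep algorithm PostgreSQL uses to pick eviction victims. The following is a simplified Python illustration, not PostgreSQL's actual C code (the Buffer structure and function names are invented for clarity): every unpinned buffer with a nonzero usage_count costs the sweeper a decrement-and-skip, so if buffer hits bump counts faster than sweeps drain them, each eviction walks and writes to more and more hot buffers.

    # Simplified illustration of clock-sweep buffer eviction; not
    # PostgreSQL's actual code. Names and structure are invented.
    MAX_USAGE_COUNT = 5  # PostgreSQL caps usage_count at 5 (BM_MAX_USAGE_COUNT)

    class Buffer(object):
        def __init__(self):
            self.usage_count = 0
            self.pinned = False

    def on_access(buf):
        # Every buffer hit bumps usage_count, up to the cap.
        buf.usage_count = min(buf.usage_count + 1, MAX_USAGE_COUNT)

    def clock_sweep(buffers, hand):
        # Advance the clock hand until a victim with usage_count == 0
        # turns up. On a hot working set the sweep spends its time
        # decrementing counts on contended buffers instead of evicting.
        while True:
            buf = buffers[hand]
            hand = (hand + 1) % len(buffers)
            if buf.pinned:
                continue
            if buf.usage_count == 0:
                return buf, hand      # eviction victim found
            buf.usage_count -= 1      # age the buffer and keep sweeping

A larger shared_buffers gives the hand more buffers to traverse between victims, which is consistent with (though does not prove) the observation that lowering the setting relieved the stalls.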
Re: [PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?
On Thu, Sep 19, 2013 at 5:49 PM, Bartłomiej Romański <b...@sentia.pl> wrote:
> Finally, we created a python script to make simple queries in a loop:
>
>     while True:
>         id = random.randrange(1, 1000 * 1000)
>         db.execute('select offers.id, offers.name, categories.id, '
>                    'categories.name from offers left join categories '
>                    'on categories.id = offers.category_id '
>                    'where offers.id = %s', (id,))
>         print db.fetchall()
>
> We start 20 instances simultaneously and measure performance:
>
>     parallel -j 20 ./test.py -- $(seq 1 20) | pv -i1 -l > /dev/null
>
> Normally we observe about 30k QPS, which is a satisfying result
> (without any tuning at all). The problem occurs when we open a second
> console, start psql and type:
>
>     pgtest=# begin;
>     pgtest=# insert into categories (name)
>              select 'category_' || x from generate_series(1,1000) as x;

Related topics have been discussed recently, but without much apparent
resolution. See "In progress INSERT wrecks plans on table" and
"Performance bug in prepared statement binding in 9.2", also on this
list.

The issues are:

1) The planner actually queries the relation to find the end points of
the variable ranges, rather than using potentially out-of-date
statistics.

2) When doing so, it needs to wade through the in-progress rows,
figuring out whether the owning transaction is still in progress or has
already committed or aborted. If the owning transaction *has* committed
or rolled back, then it can set hint bits so that future executions
don't need to do this. But if the owning transaction is still open,
then the querying transaction has done the work but is not able to set
any hint bits, so other executions need to repeat the work until the
other transaction finishes.

3) Even worse, asking whether a given transaction has finished yet can
be a serious point of system-wide contention, because it takes the
ProcArrayLock once per row that needs to be checked. So you have 20
processes all fighting over the ProcArrayLock, each doing so 1000 times
per query.

One idea (from Simon, I think) was to remember that a transaction was
just checked and was in progress, and not check it again for future
rows. By the time later rows are checked the transaction might have
committed, but since it would have committed after we took the
snapshot, thinking it is still in progress would not be a correctness
problem; it would just needlessly delay setting the hint bits.

Another idea was not to check whether it is in progress at all, because
if it is in the snapshot it doesn't matter whether it is still in
progress. This would be a slightly more aggressive way to delay setting
the hint bits (but it would also delay doing the work needed to figure
out how to set them).

Items 2 and 3 can also arise in situations other than paired with 1.

Cheers,

Jeff
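Jeff's explanation is straightforward to demonstrate in a single process. The following sketch uses psycopg2 and Bartłomiej's offers/categories schema; the DSN, loop size, and row counts are hypothetical, and it assumes Python 2 to match the original test script. One connection holds the uncommitted INSERT open while a second connection times the query the test script runs.

    # Minimal sketch (Python 2, psycopg2) of the effect described above:
    # an open, uncommitted INSERT forces every concurrent planning cycle
    # to re-examine the in-progress rows. DSN and loop size are
    # hypothetical; adjust for your environment.
    import random
    import time
    import psycopg2

    dsn = "dbname=pgtest"  # hypothetical connection string

    # Session 1: insert rows inside a transaction and do NOT commit.
    hanger = psycopg2.connect(dsn)   # autocommit is off, so a transaction
    hcur = hanger.cursor()           # opens on the first command
    hcur.execute("insert into categories (name) "
                 "select 'category_' || x from generate_series(1,1000) as x")

    # Session 2: time the same query the looping test script issues.
    reader = psycopg2.connect(dsn)
    rcur = reader.cursor()
    start = time.time()
    for _ in range(1000):
        rcur.execute("select offers.id, offers.name, categories.id, "
                     "categories.name from offers left join categories "
                     "on categories.id = offers.category_id "
                     "where offers.id = %s",
                     (random.randrange(1, 1000 * 1000),))
        rcur.fetchall()
    print "1000 queries with open transaction: %.2fs" % (time.time() - start)

    hanger.rollback()  # once the transaction ends, latency recovers

Rolling back (or committing) the hanging transaction lets hint bits be set on the next pass, after which the per-row visibility checks, and the ProcArrayLock traffic they generate, disappear.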
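The first optimization Jeff mentions can also be sketched. This is a toy Python illustration of the idea attributed to Simon, not PostgreSQL code; the ProcArray structure and all names are invented. The safety argument is the one Jeff gives: a transaction that commits mid-scan committed after our snapshot was taken, so continuing to treat it as in-progress only postpones hint-bit setting.

    # Toy sketch of memoizing "transaction is in progress" per scan, so a
    # scan takes ProcArrayLock once per XID instead of once per row.
    import threading

    class ProcArray(object):
        def __init__(self, running_xids):
            self.lock = threading.Lock()           # stands in for ProcArrayLock
            self.running_xids = set(running_xids)

    def make_xid_checker(proc_array):
        in_progress = set()  # per-scan cache; discarded when the scan ends
        def xid_in_progress(xid):
            if xid in in_progress:
                return True         # cached answer: no lock taken
            with proc_array.lock:   # the contended, per-row acquisition
                running = xid in proc_array.running_xids
            if running:
                in_progress.add(xid)
            return running
        return xid_in_progress

With 1000 in-progress rows all owned by the one inserting transaction, this would turn 1000 lock acquisitions per query into one.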
Re: [PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?
On 21/09/2013, at 00.01, Jeff Janes <jeff.ja...@gmail.com> wrote:
> See "In progress INSERT wrecks plans on table" and "Performance bug in
> prepared statement binding in 9.2", also on this list.

This feels like the same issue as
http://postgresql.1045698.n5.nabble.com/Slow-query-plan-generation-fast-query-PG-9-2-td5769363.html

> The issues are:
>
> 1) The planner actually queries the relation to find the end points of
> the variable ranges, rather than using potentially out-of-date
> statistics.

In my app I would prefer potentially out-of-date statistics instead.

Jesper