Re: [PERFORM] Intermittent hangs with 9.2

2013-09-20 Thread David Whittaker
We haven't seen any issues since we decreased shared_buffers.  We also
tuned some of the longer running / more frequently executed queries, so
that may have had an effect as well, but my money would be on the
shared_buffers change.  If the issue re-appears I'll try to capture a perf
profile again and post back, but if you don't hear from me again you can assume the
problem is solved.

Thank you all again for the help.

-Dave

On Fri, Sep 13, 2013 at 11:05 AM, David Whittaker d...@iradix.com wrote:




 On Fri, Sep 13, 2013 at 10:52 AM, Merlin Moncure mmonc...@gmail.com wrote:

 On Thu, Sep 12, 2013 at 3:06 PM, David Whittaker d...@iradix.com wrote:
  Hi All,
 
  We lowered shared_buffers to 8G and increased effective_cache_size
  accordingly.  So far, we haven't seen any issues since the adjustment.
  The
  issues have come and gone in the past, so I'm not convinced it won't
 crop up
  again, but I think the best course is to wait a week or so and see how
  things work out before we make any other changes.
 
  Thank you all for your help, and if the problem does reoccur, we'll look
  into the other options suggested, like using a patched postmaster and
  compiling for perf -g.
 
  Thanks again, I really appreciate the feedback from everyone.

 Interesting -- please respond with a follow up if/when you feel
 satisfied the problem has gone away.  Andres was right; I initially
 mis-diagnosed the problem (there is another issue I'm chasing that has
 a similar performance presentation but originates from a different
 area of the code).

 That said, if reducing shared_buffers made *your* problem go away as
 well, then this is more evidence that we have an underlying contention
 mechanic that is somehow influenced by the setting.  Speaking frankly,
 under certain workloads we seem to have contention issues in the
 general area of the buffer system.  I'm thinking (guessing) that the
 problem is that usage_count is getting incremented faster than the
 buffers are getting cleared out, which is then causing the sweeper to
 spend more and more time examining hotly contended buffers.  This may make
 no sense in the context of your issue; I haven't looked at the code
 yet.  Also, I've been unable to cause this to happen in simulated
 testing.  But I'm suspicious (and dollars to doughnuts '0x347ba9' is
 spinlock related).

 Anyways, thanks for the report and (hopefully) the follow up.

 merlin
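Merlin's guess about the sweeper spending more and more time on hotly contended buffers can be illustrated with a toy clock-sweep model. This is a sketch only: the constants and structure are simplified stand-ins, not PostgreSQL's actual freelist code.

```python
# Toy clock sweep: when hot buffers keep their usage_count pinned at the
# maximum, the sweeper must circle the pool repeatedly, decrementing as it
# goes, before it finds an evictable victim.

MAX_USAGE = 5  # illustrative cap, mirroring a small usage_count maximum


def sweep_for_victim(usage_counts, start=0):
    """Decrement usage counts clockwise until one reaches zero; return
    (victim_index, buffers_examined)."""
    examined = 0
    i = start
    while True:
        examined += 1
        if usage_counts[i] == 0:
            return i, examined
        usage_counts[i] -= 1
        i = (i + 1) % len(usage_counts)


# A cold pool: the first buffer is immediately evictable.
cold = [0] * 8
print(sweep_for_victim(cold))  # (0, 1)

# A hot pool: every buffer at max usage_count, so the sweeper circles the
# pool MAX_USAGE times (40 examinations) before anything is evictable.
hot = [MAX_USAGE] * 8
print(sweep_for_victim(hot))   # (0, 41)
```

The larger the pool of hot buffers, the more work each eviction costs, which is consistent with the observation that shrinking shared_buffers relieved the symptom.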


 You guys have taken the time to help me through this, following up is the
 least I can do.  So far we're still looking good.



Re: [PERFORM] Planner performance extremely affected by a hanging transaction (20-30 times)?

2013-09-20 Thread Jeff Janes
On Thu, Sep 19, 2013 at 5:49 PM, Bartłomiej Romański b...@sentia.pl wrote:


 Finally, we created a python script to make simple queries in a loop:

 while True:
     id = random.randrange(1, 1000 * 1000)
     db.execute('select offers.id, offers.name, categories.id, '
                'categories.name from offers left join categories '
                'on categories.id = offers.category_id '
                'where offers.id = %s', (id,))
     print db.fetchall()

 We start 20 instances simultaneously and measure performance:

 parallel -j 20 ./test.py -- $(seq 1 20) | pv -i1 -l  /dev/null

 Normally we observe about 30k QPS, which is a satisfying result (without
 any tuning at all).

 The problem occurs when we open a second console, start psql and type:

 pgtest= begin; insert into categories (name) select 'category_' || x from
 generate_series(1,1000) as x;



Related topics have been discussed recently, but without much apparent
resolution.

See "In progress INSERT wrecks plans on table" and "Performance bug in
prepared statement binding in 9.2", also on this list.

The issues are:

1) The planner actually queries the relation to find the end points of the
variable ranges, rather than using potentially out-of-date statistics.

2) When doing so, it needs to wade through the in-progress rows, figuring
out whether the owning transaction is still in progress or has already
committed or aborted.  If the owning transaction *has* committed or rolled
back, then it can set hint bits so that future executions don't need to do
this.  But if the owning transaction is still open, then the querying
transaction has done the work but cannot set any hint bits, so every
other execution must repeat the work until the other transaction
finishes.
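The repeated work in item 2 can be sketched with a toy model. The names here (`hint_committed`, `finished_xids`) are illustrative, not PostgreSQL's actual structures:

```python
# Toy model of visibility checking: hint bits can only be set once the
# writing transaction has finished, so while it stays open every scan must
# repeat the expensive transaction-status lookup for every uncommitted row.

class Row:
    def __init__(self, xid):
        self.xid = xid
        self.hint_committed = False  # stands in for a hint bit on the tuple


def visibility_checks(rows, finished_xids):
    """Scan rows once; return the number of expensive status lookups."""
    lookups = 0
    for row in rows:
        if row.hint_committed:
            continue                       # cheap: hint bit already set
        lookups += 1                       # expensive: consult xact status
        if row.xid in finished_xids:
            row.hint_committed = True      # future scans skip the lookup
    return lookups


rows = [Row(xid=42) for _ in range(1000)]
print(visibility_checks(rows, finished_xids=set()))  # 1000: xact 42 open
print(visibility_checks(rows, finished_xids=set()))  # 1000 again: no hints
print(visibility_checks(rows, finished_xids={42}))   # 1000, but sets hints
print(visibility_checks(rows, finished_xids={42}))   # 0: hint bits now set
```

Every scan pays the full cost until the inserting transaction ends; only the first scan after it ends pays, and subsequent scans are cheap.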

3) Even worse, asking whether a given transaction has finished yet can be a
serious point of system-wide contention, because it takes the
ProcArrayLock once per row that needs to be checked.  So you have 20
processes all fighting over the ProcArrayLock, each doing so 1000 times per
query.
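Using the figures from this thread (and assuming one lock acquisition per uncommitted row per planning cycle), the back-of-envelope count is:

```python
# Rough count of ProcArrayLock acquisitions per planning round, using the
# figures from this thread: 20 clients, each wading through ~1000
# uncommitted rows, taking the lock once per row.
clients = 20
uncommitted_rows = 1000
acquisitions = clients * uncommitted_rows
print(acquisitions)  # 20000 acquisitions, all contending on a single lock
```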

One idea (from Simon, I think) was to remember that a transaction was just
checked and found to be in progress, and skip re-checking it for future rows.  In
the future the transaction might have committed, but since it would have
committed after we took the snapshot, thinking it is still in progress
would not be a correctness problem, it would just needlessly delay setting
the hint bits.
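A toy sketch of that idea (illustrative names; not PostgreSQL's actual data structures): cache the last XID found to be in progress and skip the expensive lookup for subsequent rows written by that same XID.

```python
# A single-entry cache of the last in-progress XID.  For the common case of
# one open transaction having written a long run of rows, this collapses N
# locked status lookups into one.

class XidStatusCache:
    def __init__(self, in_progress_xids):
        self._in_progress = in_progress_xids  # stands in for the ProcArray
        self._last_in_progress = None         # the single-entry cache
        self.lookups = 0                      # expensive (locked) lookups

    def is_in_progress(self, xid):
        if xid == self._last_in_progress:
            return True                       # cache hit: no lock taken
        self.lookups += 1                     # stands in for ProcArrayLock
        if xid in self._in_progress:
            self._last_in_progress = xid
            return True
        return False


# 1000 uncommitted rows, all written by open transaction 42:
cache = XidStatusCache(in_progress_xids={42})
checks = [cache.is_in_progress(42) for _ in range(1000)]
print(all(checks), cache.lookups)  # True 1
```

As the text notes, a stale cache entry is harmless for correctness here: treating a just-committed transaction as still in progress only delays setting the hint bits.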

Another idea was not to check whether it is in progress at all, because if
it is in the snapshot it doesn't matter whether it is still in progress.  This
would be a slightly more aggressive way to delay setting the hint bits (but
it also delays doing the work needed to figure out how to set them).
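A sketch of this second idea (again illustrative, not the actual visibility code): if the writing XID appears in our snapshot's in-progress list, the row is invisible to us regardless of whether that transaction has since finished, so the status lookup can be skipped entirely.

```python
# Visibility shortcut: consult the snapshot first; only fall through to the
# expensive transaction-status lookup when the snapshot cannot decide.

def row_visible(row_xid, snapshot_xips, status_lookup):
    if row_xid in snapshot_xips:
        return False                 # in progress at snapshot time: invisible
    return status_lookup(row_xid)    # only otherwise pay for a status check


lookups = []

def counting_lookup(xid):
    lookups.append(xid)
    return False

visible = row_visible(42, snapshot_xips={42}, status_lookup=counting_lookup)
print(visible, len(lookups))  # False 0 -- no expensive lookup was needed
```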

Items 2 and 3 can also arise in situations other than paired with item 1.


Cheers,

Jeff


Re: [PERFORM] Planner performance extremely affected by a hanging transaction (20-30 times)?

2013-09-20 Thread Jesper Krogh

On 21/09/2013, at 00.01, Jeff Janes jeff.ja...@gmail.com wrote:
 See "In progress INSERT wrecks plans on table" and "Performance bug in
 prepared statement binding in 9.2", also on this list

This feels like the same issue as
http://postgresql.1045698.n5.nabble.com/Slow-query-plan-generation-fast-query-PG-9-2-td5769363.html



 
 The issues are:
 
 1) The planner actually queries the relation to find the end points of the 
 variable ranges, rather than using potentially out-of-date statistics.
 

In my app I would prefer potentially out-of-date statistics instead.

Jesper