Re: [PERFORM] Hot Standby performance issue

2013-10-21 Thread Magnus Hagander
On Oct 22, 2013 1:14 AM, "Tomas Vondra"  wrote:
>
> On 22.10.2013 00:59, sparikh wrote:
> > Yes, Expalin without Analyze is taking long. It is weird. In the
> > pg_stat_activity Explain was the only query running. So server was
> > almost idle. Using New relic interface I checked CPU was almost idle
> > - around 10-20%. There were some IO activity - around 40-50%.
> >
> > I forgot to mention before I could run perf on command line even with
> > root permission. It says command not found. May be utility is not
> > installed or not enabled.
>
> Obviously you need to install it ... maybe ask your sysadmin to do that.
>
> > I have attached the snapshot of vmstat while explain was running in
> > background. vmstat.txt
> > 
>
> The vmstat clearly shows that ~1 CPU is waiting on I/O. Hmm, I'm really
> wondering what's going on here - I can't think of a case where this
> would happen with a plain EXPLAIN ...

Catalog bloat could make that happen. Though that should show up on the
master as well, it could be that it's cached there and therefor only shows
us to as cpu and not io and is therefore not noticed.

/Magnus


Re: [PERFORM] Hot Standby performance issue

2013-10-21 Thread sparikh
> Do you suggest if I remove all the data files from /data/base folder 
> of standby and again rebuild using rsync from primary ? do you see 
> any issues there.? This is just to rule out any fragmentation on 
> standby side. 

The EXPLAIN really should not do much I/O. I doubt it has anything to do 
with fragmentation, so I doubt this is going to help. 

Actually I was referring to this in the context of addressing main
underlying performance issue, not EXPLAIN. Sorry, I may not have
communicated it correctly.

Even strance does not seem to be installed.

The filesytem type it shows to me ext3.

Thanks.



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Hot-Standby-performance-issue-tp5774673p5775361.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hot Standby performance issue

2013-10-21 Thread Tomas Vondra
On 22.10.2013 00:59, sparikh wrote:
> Yes, Expalin without Analyze is taking long. It is weird. In the 
> pg_stat_activity Explain was the only query running. So server was
> almost idle. Using New relic interface I checked CPU was almost idle
> - around 10-20%. There were some IO activity - around 40-50%.
> 
> I forgot to mention before I could run perf on command line even with
> root permission. It says command not found. May be utility is not
> installed or not enabled.

Obviously you need to install it ... maybe ask your sysadmin to do that.

> I have attached the snapshot of vmstat while explain was running in 
> background. vmstat.txt 
> 

The vmstat clearly shows that ~1 CPU is waiting on I/O. Hmm, I'm really
wondering what's going on here - I can't think of a case where this
would happen with a plain EXPLAIN ...

We really need the perf results. Or try to run strace, maybe it'll give
more info about which files it accesses.

> Do you suggest if I remove all the data files from /data/base folder
> of standby and again rebuild using rsync from primary ? do you see
> any issues there.? This is just to rule out any fragmentation on
> standby side.

The EXPLAIN really should not do much I/O. I doubt it has anything to do
with fragmentation, so I doubt this is going to help.

> Our sys admin is planning to run fsck sometime today or tomorrow.

OK. Which filesystem do you use, btw?

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hot Standby performance issue

2013-10-21 Thread sparikh
Yes, Expalin without Analyze is taking long. It is weird. In the
pg_stat_activity Explain was the only query running. So server was almost
idle. Using New relic interface I checked CPU was almost idle - around
10-20%. There were some IO activity - around 40-50%.

I forgot to mention before I could run perf on command line even with root
permission. It says command not found. May be utility is not installed or
not enabled.

I have attached the snapshot of vmstat while explain was running in
background. vmstat.txt
  

Do you suggest if I remove all the data files from /data/base folder of
standby and again rebuild using rsync from primary ? do you see any issues
there.? This is just to rule out any fragmentation on standby side.

Our sys admin is planning to run fsck sometime today or tomorrow.

Thanks.









--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Hot-Standby-performance-issue-tp5774673p5775349.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hot Standby performance issue

2013-10-21 Thread Tomas Vondra
On 21.10.2013 23:18, sparikh wrote:
> Yes, both Explain and Explain Analyse are taking time. As you 
> suggested I set the lock parameters, but no locks are observed. Also 
> checked pg_stat_activity and none of the sessions are either waiting 
> are blocked.

Not even the one running the explain? That's weird. Is the backend just
sitting there idle, or is it consuming some CPU?

If it's burning CPU, try to run perf to see where it's spending time.
We've already recommended this at least twice, but I haven't seen any
report yet. This might show if there are any issues with spinlocks
(which don't show in pg_locks etc.).

What about long-open transactions, not necessarily blocked/waiting?

  SELECT xact_start, current_query, waiting
FROM pg_stat_activity WHERE xact_start ASC LIMIT 10;

> I agree we must upgrade to latest version (9.1.10), but
> unfortunately kind of resources (not only man power) we are having it
> is going to be extremely challenging task for us. Of course all other
> options are not working then we have to take the tough route. No
> choice.

It's not really about this particular issue. As both me and Kevin
pointed out, there are some pretty important ecurity fixes etc. You need
to update irrespectedly of this issue.

BTW minor upgrades (including 9.1.1 -> 9.1.10) are supposed to be
relatively simple, as the the format remains the same. So it's a matter
of shutting down the machine, updating the binaries and starting again.
Of course, I'm not familiar with your setup and requirements.

> I am also working with sys admin to rule any issues at the OS or VM 
> level.

OK, good. Post a summary of what you checked.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hot Standby performance issue

2013-10-21 Thread sparikh
Yes, both Explain and Explain Analyse are taking time. As you suggested I set
the lock parameters, but no locks are observed. Also checked
pg_stat_activity and none of the sessions are either waiting are blocked.

I agree we must upgrade to latest version (9.1.10), but unfortunately kind
of resources (not only man power) we are having it is going to be extremely
challenging task for us. Of course all other options are not working then we
have to take the tough route. No choice.

I am also working with sys admin to rule any issues at the OS or VM level.

Thanks.  



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Hot-Standby-performance-issue-tp5774673p5775332.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hot Standby performance issue

2013-10-21 Thread Tomas Vondra
On 21.10.2013 17:05, sparikh wrote:
> Stupid question - when you say that a query is fast on primary but
> slow on standby, are you referring to exactly the same query,
> including parameter values?
> 
> Yes . It is exactly and exactly the same query with the same
> parameters. Yes, it sounds stupid  but that is what happening. Though
> plan says it is 18ms it runs for more than 15-20 mins and finally
> returns with conflict error : " ERROR: canceling statement due to
> conflict with recovery "

OK.

> Even the to run execute plan itself takes very long on standby. Just
> to get the execute plan on standby is turning out big deal.

Do you mean EXPLAIN or EXPLAIN ANALYZE?

So far we've seen just EXPLAIN ANALYZE - can you try just EXPLAIN? If it
locks, it's either because of something expensive in the planning, or
locking.

The locking is much more likely, because the primary is behaving just
fine and the resulting plan is exactly the same on both ends.

> Regarding IO spike, yes I can understand that if data is not
> available in the memory then it has to get it from disk. But the
> thing is it remains there as much time until query returns with Query
> conflict error.

I don't think the I/O is a problem at all, because the query takes just
18 milliseconds. However that does not include planning, so either a lot
of time spent waiting for a lock or doing a lot of stuff on CPU, won't
be reported here.

What you can do to debug this is either look at pg_locks on the standby
for connections with "granted=f", or connect using psql and do this

  set log_lock_waits = true;
  set client_min_messages = log;

  EXPLAIN ... query ...;

and it should print what locks the connection is waiting for. Then you
may investigate further, e.g. check who's holding the lock in
pg_stat_activity etc.

But again, I think spending a single minute on this before upgrading to
the current version is a waste of time.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hot Standby performance issue

2013-10-21 Thread sparikh
Stupid question - when you say that a query is fast on primary but slow 
on standby, are you referring to exactly the same query, including 
parameter values? 

Yes . It is exactly and exactly the same query with the same parameters. 
Yes, it sounds stupid  but that is what happening. Though plan says it is
18ms it runs for more than 15-20 mins and finally returns with conflict
error : " ERROR: canceling statement due to conflict with recovery "

Even the to run execute plan itself takes very long on standby. Just to get
the execute plan on standby is turning out big deal. 

Regarding IO spike, yes I can understand that if data is not available in
the memory then it has to get it from disk. But the thing is it remains
there as much time until query returns with Query conflict error.

Thanks again.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Hot-Standby-performance-issue-tp5774673p5775257.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance