Re: [PERFORM] Postgres Replaying WAL slowly

2014-09-17 Thread Simon Riggs
On 1 July 2014 20:20, Tom Lane  wrote:
> Jeff Frost  writes:
>>> On Jun 30, 2014, at 4:04 PM, Tom Lane  wrote:
 Did you check whether the locks were all on temp tables of the
 ON COMMIT DROP persuasion?
>
>> And indeed it did catch up overnight and the lag increased shortly after a 
>> correlating spike in AccessExclusiveLocks that were generated by temp table 
>> creation with on commit drop.
>
> OK, so we have a pretty clear idea of where the problem is now.
>
> It seems like there are three, not mutually exclusive, ways we might
> address this:
>
> 1. Local revisions inside StandbyReleaseLocks to make it perform better in
> the presence of many locks.  This would only be likely to improve matters
> much if there's a fixable O(N^2) algorithmic issue; but there might well
> be one.
>
> 2. Avoid WAL-logging AccessExclusiveLocks associated with temp tables, on
> the grounds that no standby should be touching them.  I'm not entirely
> sure that that argument is bulletproof though; in particular, even though
> a standby couldn't access the table's data, it's possible that it would be
> interested in seeing consistent catalog entries.
>
> 3. Avoid WAL-logging AccessExclusiveLocks associated with
> new-in-transaction tables, temp or not, on the grounds that no standby
> could even see such tables until they're committed.  We could go a bit
> further and not take out any locks on a new-in-transaction table in the
> first place, on the grounds that other transactions on the master can't
> see 'em either.
>
> It sounded like Andres had taken a preliminary look at #1 and found a
> possible avenue for improvement, which I'd encourage him to pursue.
>
> For both #2 and the conservative version of #3, the main implementation
> problem would be whether the lock WAL-logging code has cheap access to
> the necessary information.  I suspect it doesn't.
>
> The radical version of #3 might be pretty easy to do, at least to the
> extent of removing locks taken out during CREATE TABLE.  I suspect there
> are some assertions or other consistency checks that would get unhappy if
> we manipulate relations without locks, though, so those would have to be
> taught about the exception.  Also, we sometimes forget new-in-transaction
> status during relcache flush events; it's not clear if that would be a
> problem for this.
>
> I don't plan to work on this myself, but perhaps someone with more
> motivation will want to run with these ideas.

The attached patch implements option 2 from the list above.

Skipping the locks entirely seems like it opens a can of worms.

Skipping the lock for temp tables is valid since locks don't need to
exist on the standby. Any catalog entries for them will exist, but the
rows will show them as temp and nobody would expect them to be valid
outside of the original session.

The patch implements a special case that takes the lock normally, but
skips WAL-logging the lock info.
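As a toy model of that special case (plain Python with made-up names; the actual patch changes PostgreSQL's C lock-acquisition path, not anything like this): the master always takes the AccessExclusiveLock locally, but the standby-bound WAL record is only emitted for non-temp relations.

```python
def acquire_access_exclusive(local_lock_table, wal_stream, relation):
    # The lock is always taken locally on the master...
    local_lock_table.add(relation["oid"])
    # ...but the standby-bound WAL record is skipped for temp tables,
    # since no standby session can ever use them.
    if not relation["is_temp"]:
        wal_stream.append(("AccessExclusiveLock", relation["oid"]))

local_locks, wal = set(), []
acquire_access_exclusive(local_locks, wal, {"oid": 16384, "is_temp": False})
acquire_access_exclusive(local_locks, wal, {"oid": 16385, "is_temp": True})
# Both locks exist on the master; only the non-temp one reaches the WAL.
```

Under this model a workload creating thousands of ON COMMIT DROP temp tables generates no extra lock traffic for StandbyReleaseLocks to chew through.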

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


temp_tables_skip_logging_locks.v1.patch
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-03 Thread Simon Riggs
On 1 July 2014 20:20, Tom Lane  wrote:

> I don't plan to work on this myself, but perhaps someone with more
> motivation will want to run with these ideas.

I was planning to work on improving performance of replication apply
over the summer, mid July - Aug, so I'll add this to the list.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-02 Thread Tom Lane
Andres Freund  writes:
> On 2014-07-01 15:20:37 -0400, Tom Lane wrote:
>> It seems like there are three, not mutually exclusive, ways we might
>> address this:

> But I think more importantly it's probably quite possible to hit a
> similar problem without ON COMMIT DROP relations. Say DISCARD TEMP
> inside a transaction (with several subxacts) or so? So we probably really
> should fix the bad scaling.

Well, my thought was that these approaches would address somewhat
different sets of use-cases, and we might well want to do more than one.
Even if StandbyReleaseLocks were zero-cost, not emitting the WAL in the
first place is surely considerably cheaper yet.

regards, tom lane




Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-02 Thread Andres Freund
On 2014-07-01 15:20:37 -0400, Tom Lane wrote:
> Jeff Frost  writes:
> >> On Jun 30, 2014, at 4:04 PM, Tom Lane  wrote:
> >>> Did you check whether the locks were all on temp tables of the
> >>> ON COMMIT DROP persuasion?
> 
> > And indeed it did catch up overnight and the lag increased shortly after a 
> > correlating spike in AccessExclusiveLocks that were generated by temp table 
> > creation with on commit drop.
> 
> OK, so we have a pretty clear idea of where the problem is now.
> 
> It seems like there are three, not mutually exclusive, ways we might
> address this:
> 
> 1. Local revisions inside StandbyReleaseLocks to make it perform better in
> the presence of many locks.  This would only be likely to improve matters
> much if there's a fixable O(N^2) algorithmic issue; but there might well
> be one.
>
> It sounded like Andres had taken a preliminary look at #1 and found a
> possible avenue for improvement, which I'd encourage him to pursue.
> 

I don't have the resources to do this right now, but yes, I think we can
relatively easily get rid of the O(num_locks * num_subtransactions)
behaviour.

> 2. Avoid WAL-logging AccessExclusiveLocks associated with temp tables, on
> the grounds that no standby should be touching them.  I'm not entirely
> sure that that argument is bulletproof though; in particular, even though
> a standby couldn't access the table's data, it's possible that it would be
> interested in seeing consistent catalog entries.

Hm. We definitely perform those checks surprisingly late. It's possible
to do SELECT * FROM pg_temp_.whatever; without an error if there are no
rows or if the rest of the plan doesn't access that table. The check
prohibiting access is only in bufmgr.c...
So yeah, I don't think we can do this for releases before 9.4, at
least. And even there it'll still be hard.

> 3. Avoid WAL-logging AccessExclusiveLocks associated with
> new-in-transaction tables, temp or not, on the grounds that no standby
> could even see such tables until they're committed.  We could go a bit
> further and not take out any locks on a new-in-transaction table in the
> first place, on the grounds that other transactions on the master can't
> see 'em either.
> 
> For both #2 and the conservative version of #3, the main implementation
> problem would be whether the lock WAL-logging code has cheap access to
> the necessary information.  I suspect it doesn't.

Not trivially. It's logged directly in LockAcquireExtended(). We could
add the information into locktags as there's unused fields for relation
locktags, but brrr.

> The radical version of #3 might be pretty easy to do, at least to the
> extent of removing locks taken out during CREATE TABLE.  I suspect there
> are some assertions or other consistency checks that would get unhappy if
> we manipulate relations without locks, though, so those would have to be
> taught about the exception.
>
> Also, we sometimes forget new-in-transaction
> status during relcache flush events; it's not clear if that would be a
> problem for this.

I think that hole is actually pluggable in newer releases - at least
there's no code around that now assumes rd_createSubid is persistent
across cache resets.

But I think more importantly it's probably quite possible to hit a
similar problem without ON COMMIT DROP relations. Say DISCARD TEMP
inside a transaction (with several subxacts) or so? So we probably really
should fix the bad scaling.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-01 Thread Tom Lane
Jeff Frost  writes:
>> On Jun 30, 2014, at 4:04 PM, Tom Lane  wrote:
>>> Did you check whether the locks were all on temp tables of the
>>> ON COMMIT DROP persuasion?

> And indeed it did catch up overnight and the lag increased shortly after a 
> correlating spike in AccessExclusiveLocks that were generated by temp table 
> creation with on commit drop.

OK, so we have a pretty clear idea of where the problem is now.

It seems like there are three, not mutually exclusive, ways we might
address this:

1. Local revisions inside StandbyReleaseLocks to make it perform better in
the presence of many locks.  This would only be likely to improve matters
much if there's a fixable O(N^2) algorithmic issue; but there might well
be one.

2. Avoid WAL-logging AccessExclusiveLocks associated with temp tables, on
the grounds that no standby should be touching them.  I'm not entirely
sure that that argument is bulletproof though; in particular, even though
a standby couldn't access the table's data, it's possible that it would be
interested in seeing consistent catalog entries.

3. Avoid WAL-logging AccessExclusiveLocks associated with
new-in-transaction tables, temp or not, on the grounds that no standby
could even see such tables until they're committed.  We could go a bit
further and not take out any locks on a new-in-transaction table in the
first place, on the grounds that other transactions on the master can't
see 'em either.

It sounded like Andres had taken a preliminary look at #1 and found a
possible avenue for improvement, which I'd encourage him to pursue.

For both #2 and the conservative version of #3, the main implementation
problem would be whether the lock WAL-logging code has cheap access to
the necessary information.  I suspect it doesn't.

The radical version of #3 might be pretty easy to do, at least to the
extent of removing locks taken out during CREATE TABLE.  I suspect there
are some assertions or other consistency checks that would get unhappy if
we manipulate relations without locks, though, so those would have to be
taught about the exception.  Also, we sometimes forget new-in-transaction
status during relcache flush events; it's not clear if that would be a
problem for this.

I don't plan to work on this myself, but perhaps someone with more
motivation will want to run with these ideas.

regards, tom lane




Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-01 Thread Jeff Frost
On Jun 30, 2014, at 4:57 PM, Jeff Frost  wrote:

> 
> On Jun 30, 2014, at 4:04 PM, Tom Lane  wrote:
> 
>> Ah ... that's more like a number I can believe something would have
>> trouble coping with.  Did you see a noticeable slowdown with this?
>> Now that we've seen that number, of course it's possible there was an
>> even higher peak occurring when you saw the trouble.
>> 
>> Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
>> it just takes awhile to handle that many locks.
>> 
>> Did you check whether the locks were all on temp tables of the
>> ON COMMIT DROP persuasion?
> 
> 
> Unfortunately not, because I went for a poor man's approach: SELECT count(*)
> FROM pg_locks WHERE mode = 'AccessExclusiveLock', run from cron every minute.
> 
> That said, I'd bet it was mostly ON COMMIT DROP temp tables.
> 
> The unfortunate thing is I wouldn't know how to correlate that spike with the 
> corresponding slowdown because the replica is about 5.5hrs lagged at the 
> moment.
> 
> Hopefully it will get caught up tonight and we can see if there's a 
> correlation tomorrow.

And indeed it did catch up overnight and the lag increased shortly after a 
correlating spike in AccessExclusiveLocks that were generated by temp table 
creation with on commit drop.




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 4:04 PM, Tom Lane  wrote:

> Ah ... that's more like a number I can believe something would have
> trouble coping with.  Did you see a noticeable slowdown with this?
> Now that we've seen that number, of course it's possible there was an
> even higher peak occurring when you saw the trouble.
> 
> Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
> it just takes awhile to handle that many locks.
> 
> Did you check whether the locks were all on temp tables of the
> ON COMMIT DROP persuasion?


Unfortunately not, because I went for a poor man's approach: SELECT count(*)
FROM pg_locks WHERE mode = 'AccessExclusiveLock', run from cron every minute.

That said, I'd bet it was mostly ON COMMIT DROP temp tables.

The unfortunate thing is I wouldn't know how to correlate that spike with the 
corresponding slowdown because the replica is about 5.5hrs lagged at the moment.

Hopefully it will get caught up tonight and we can see if there's a correlation 
tomorrow.
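One sketch of how that correlation could be automated once the replica catches up (all names here are made up for illustration): treat the replay lag as a fixed shift, so a lock spike sampled on the master at minute M should show up as a lag increase on the replica roughly that many minutes later. Assuming per-minute samples on both sides:

```python
def correlate_spikes(lock_samples, lag_samples, replay_delay_min, threshold):
    # lock_samples: {minute -> AccessExclusiveLock count on the master}
    # lag_samples:  {minute -> replay lag in seconds on the replica}
    # A master-side lock spike at minute M should surface on the replica
    # about replay_delay_min minutes later, so compare shifted samples.
    hits = []
    for minute in sorted(lock_samples):
        if lock_samples[minute] < threshold:
            continue
        then = lag_samples.get(minute + replay_delay_min)
        before = lag_samples.get(minute + replay_delay_min - 1)
        if then is not None and before is not None and then > before:
            hits.append(minute)
    return hits

# A 5.5 hr lag is roughly a 330-minute shift between the two series.
locks = {0: 20, 5: 13550, 6: 40}
lag = {334: 10, 335: 90, 336: 120}
print(correlate_spikes(locks, lag, 330, 1000))  # -> [5]
```

This is only a heuristic - the replay delay itself drifts as the standby falls further behind - but it turns two cron logs into a list of candidate spike times.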

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Andres Freund
On 2014-06-30 19:04:20 -0400, Tom Lane wrote:
> Jeff Frost  writes:
> >>> So it seems like we have a candidate explanation.  I'm a bit surprised
> >>> that StandbyReleaseLocks would get this slow if there are only a dozen
> >>> AccessExclusiveLocks in place at any one time, though.  Perhaps that
> >>> was a low point and there are often many more?
> 
> > Since we turned on the monitoring for that, we had a peak of 13,550
> > AccessExclusiveLocks.

Any chance the workload also uses lots of subtransactions?

> Ah ... that's more like a number I can believe something would have
> trouble coping with.  Did you see a noticeable slowdown with this?
> Now that we've seen that number, of course it's possible there was an
> even higher peak occurring when you saw the trouble.
> 
> Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
> it just takes awhile to handle that many locks.

I don't think there's a O(n^2) in StandbyReleaseLocks() itself, but in
combination with StandbyReleaseLockTree() it looks possibly bad. The
latter will call StandbyReleaseLocks() for every xid/subxid, and each of
the StandbyReleaseLocks() will then trawl the entire RecoveryLockList...

It'd probably be better to implement ReleaseLocksTree() by sorting the
subxid list and bsearch that while iterating RecoveryLockList.
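In rough Python pseudocode (a sketch of the idea only - the real code is C over PostgreSQL's recovery lock list, and all names here are made up): sort the top-level xid together with its subxids once, then make a single pass over the lock list, binary-searching the sorted array for each lock's owner instead of rescanning the whole list per (sub)xid.

```python
from bisect import bisect_left

def release_locks_tree(recovery_lock_list, top_xid, sub_xids):
    # One pass over the lock list with a bsearch per entry:
    # O((L + X) log X) instead of the O(L * X) of calling a
    # per-xid release routine once for every xid/subxid.
    xids = sorted([top_xid] + list(sub_xids))

    def owned_by_tree(xid):
        i = bisect_left(xids, xid)
        return i < len(xids) and xids[i] == xid

    kept = [lock for lock in recovery_lock_list
            if not owned_by_tree(lock["xid"])]
    return kept, len(recovery_lock_list) - len(kept)

locks = [{"xid": 10, "rel": 1}, {"xid": 11, "rel": 2}, {"xid": 99, "rel": 3}]
kept, released = release_locks_tree(locks, top_xid=10, sub_xids=[11, 12])
print(released)  # -> 2; only the lock owned by xid 99 survives
```

With thousands of AccessExclusiveLocks and many subtransactions, collapsing the repeated scans this way is where the O(num_locks * num_subtransactions) cost would disappear.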

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Tom Lane
Jeff Frost  writes:
>>> So it seems like we have a candidate explanation.  I'm a bit surprised
>>> that StandbyReleaseLocks would get this slow if there are only a dozen
>>> AccessExclusiveLocks in place at any one time, though.  Perhaps that
>>> was a low point and there are often many more?

> Since we turned on the monitoring for that, we had a peak of 13,550
> AccessExclusiveLocks.

Ah ... that's more like a number I can believe something would have
trouble coping with.  Did you see a noticeable slowdown with this?
Now that we've seen that number, of course it's possible there was an
even higher peak occurring when you saw the trouble.

Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
it just takes awhile to handle that many locks.

Did you check whether the locks were all on temp tables of the
ON COMMIT DROP persuasion?

regards, tom lane




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 1:46 PM, Jeff Frost  wrote:

>> So it seems like we have a candidate explanation.  I'm a bit surprised
>> that StandbyReleaseLocks would get this slow if there are only a dozen
>> AccessExclusiveLocks in place at any one time, though.  Perhaps that
>> was a low point and there are often many more?
>> 
>>  
> 
> Entirely possible that it was a low point.  We'll set up some monitoring to 
> track the number of AccessExclusiveLocks and see how much variance there is 
> throughout the day.


Since we turned on the monitoring for that, we had a peak of 13,550 
AccessExclusiveLocks.  So far most of the samples have been in the double 
digit, with that and two other outliers: 6,118 and 12,747.

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 1:39 PM, Tom Lane  wrote:

> 
> 
>> Another item of note is the system catalogs are quite bloated:
>> Would that cause the replica to spin on StandbyReleaseLocks?
> 
> AFAIK, no.  It's an unsurprising consequence of heavy use of short-lived
> temp tables though.
> 

Yah, this has been an issue in the past, so we tend to cluster them regularly 
during off-hours to minimize the issue.

> So it seems like we have a candidate explanation.  I'm a bit surprised
> that StandbyReleaseLocks would get this slow if there are only a dozen
> AccessExclusiveLocks in place at any one time, though.  Perhaps that
> was a low point and there are often many more?
> 
>   

Entirely possible that it was a low point.  We'll set up some monitoring to 
track the number of AccessExclusiveLocks and see how much variance there is 
throughout the day.






Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Tom Lane
Jeff Frost  writes:
> On Jun 30, 2014, at 1:15 PM, Andres Freund  wrote:
>> So these are probably relations created in uncommitted
>> transactions. Possibly ON COMMIT DROP temp tables?

> That would make sense.  There are definitely quite a few of those being used.

Uh-huh.  I doubt that the mechanism that handles propagation of
AccessExclusiveLocks to the standby is smart enough to ignore locks
on temp tables :-(

> Another item of note is the system catalogs are quite bloated:
> Would that cause the replica to spin on StandbyReleaseLocks?

AFAIK, no.  It's an unsurprising consequence of heavy use of short-lived
temp tables though.

So it seems like we have a candidate explanation.  I'm a bit surprised
that StandbyReleaseLocks would get this slow if there are only a dozen
AccessExclusiveLocks in place at any one time, though.  Perhaps that
was a low point and there are often many more?

regards, tom lane




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 1:15 PM, Andres Freund  wrote:

> On 2014-06-30 12:57:56 -0700, Jeff Frost wrote:
>> 
>> On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira 
>>  wrote:
>> 
>>> 
>>> On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost  wrote:
>>> And if you go fishing in pg_class for any of the oids, you don't find 
>>> anything:
>>> 
>>> That is probably because you are connected in the wrong database. Once you 
>>> connect to the database of interest, you don't even need to query pg_class, 
>>> just cast relation attribute to regclass:
>>> 
>>>SELECT relation::regclass, ...
>>>FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE 
>>> datname = current_database());
>>> 
>> 
>> Yah, i thought about that too, but verified I am in the correct DB.  Just 
>> for clarity sake:
> 
> So these are probably relations created in uncommitted
> transactions. Possibly ON COMMIT DROP temp tables?


That would make sense.  There are definitely quite a few of those being used.

Another item of note is the system catalogs are quite bloated:

 schemaname |  tablename   | tbloat | wastedmb | idxbloat | wastedidxmb
------------+--------------+--------+----------+----------+-------------
 pg_catalog | pg_attribute |   3945 |   106.51 |     2770 |      611.28
 pg_catalog | pg_class     |   8940 |    45.26 |     4420 |       47.89
 pg_catalog | pg_type      |   4921 |    18.45 |     5850 |       81.16
 pg_catalog | pg_depend    |    933 |    10.23 |    11730 |      274.37
 pg_catalog | pg_index     |   3429 |     8.36 |     3920 |       24.24
 pg_catalog | pg_shdepend  |    983 |     2.67 |     9360 |       30.56
(6 rows)

Would that cause the replica to spin on StandbyReleaseLocks?




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Andres Freund
On 2014-06-30 12:57:56 -0700, Jeff Frost wrote:
> 
> On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira  
> wrote:
> 
> > 
> > On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost  wrote:
> > And if you go fishing in pg_class for any of the oids, you don't find 
> > anything:
> > 
> > That is probably because you are connected in the wrong database. Once you 
> > connect to the database of interest, you don't even need to query pg_class, 
> > just cast relation attribute to regclass:
> > 
> > SELECT relation::regclass, ...
> > FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE 
> > datname = current_database());
> > 
> 
> Yah, i thought about that too, but verified I am in the correct DB.  Just for 
> clarity sake:

So these are probably relations created in uncommitted
transactions. Possibly ON COMMIT DROP temp tables?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira  
wrote:

> 
> On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost  wrote:
> And if you go fishing in pg_class for any of the oids, you don't find 
> anything:
> 
> That is probably because you are connected in the wrong database. Once you 
> connect to the database of interest, you don't even need to query pg_class, 
> just cast relation attribute to regclass:
> 
> SELECT relation::regclass, ...
> FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE datname 
> = current_database());
> 

Yah, I thought about that too, but verified I am in the correct DB.  Just for
clarity's sake:

SELECT relation::regclass
FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE datname = 
current_database()) and mode = 'AccessExclusiveLock';

  relation
------------
 3245508214
 3245508273
 3245508272
 3245508257
 3245508469
 3245508274
 3245508373
 3245508468
 3245508210
 3245508463
 3245508205
 3245508260
 3245508265
 3245508434
(16 rows)

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Matheus de Oliveira
On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost  wrote:

> And if you go fishing in pg_class for any of the oids, you don't find
> anything:


That is probably because you are connected in the wrong database. Once you
connect to the database of interest, you don't even need to query pg_class,
just cast relation attribute to regclass:

SELECT relation::regclass, ...
FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE
datname = current_database());

Regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nĂ­vel F!
www.dextra.com.br/postgres


Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 12:32 PM, Tom Lane  wrote:

> Jeff Frost  writes:
>> Sampling pg_locks on the primary shows ~50 locks with ExclusiveLock mode:
> 
>>            mode           | count
>> --------------------------+-------
>>  AccessExclusiveLock      |    11
>>  AccessShareLock          |  2089
>>  ExclusiveLock            |    46
>>  RowExclusiveLock         |    81
>>  RowShareLock             |    17
>>  ShareLock                |     4
>>  ShareUpdateExclusiveLock |     5
> 
> That's not too helpful if you don't pay attention to what the lock is on;
> it's likely that all the ExclusiveLocks are on transactions' own XIDs,
> which isn't relevant to the standby's behavior.  The AccessExclusiveLocks
> are probably interesting though --- you should look to see what those
> are on.

You're right about the ExclusiveLocks.

Here's how the AccessExclusiveLocks look:

 locktype | database |  relation  | page | tuple | virtualxid | transactionid | classid |   objid    | objsubid | virtualtransaction |  pid  |        mode         | granted
----------+----------+------------+------+-------+------------+---------------+---------+------------+----------+--------------------+-------+---------------------+---------
 relation |   111285 | 3245291551 |      |       |            |               |         |            |          | 233/170813         | 23509 | AccessExclusiveLock | t
 relation |   111285 | 3245292820 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292833 |      |       |            |               |         |            |          | 173/1723993        | 23407 | AccessExclusiveLock | t
 relation |   111285 | 3245287874 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292836 |      |       |            |               |         |            |          | 173/1723993        | 23407 | AccessExclusiveLock | t
 relation |   111285 | 3245292774 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292734 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292827 |      |       |            |               |         |            |          | 173/1723993        | 23407 | AccessExclusiveLock | t
 relation |   111285 | 3245288540 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292773 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292775 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292743 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292751 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245288669 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292817 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245288657 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 object   |   111285 |            |      |       |            |               |    2615 | 1246019760 |        0 | 233/170813         | 23509 | AccessExclusiveLock | t
 relation |   111285 | 3245292746 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245287876 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292739 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292826 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292825 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292832 |      |       |            |

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Tom Lane
Jeff Frost  writes:
> Sampling pg_locks on the primary shows ~50 locks with ExclusiveLock mode:

>            mode           | count
> --------------------------+-------
>  AccessExclusiveLock      |    11
>  AccessShareLock          |  2089
>  ExclusiveLock            |    46
>  RowExclusiveLock         |    81
>  RowShareLock             |    17
>  ShareLock                |     4
>  ShareUpdateExclusiveLock |     5

That's not too helpful if you don't pay attention to what the lock is on;
it's likely that all the ExclusiveLocks are on transactions' own XIDs,
which isn't relevant to the standby's behavior.  The AccessExclusiveLocks
are probably interesting though --- you should look to see what those
are on.

regards, tom lane




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost
On Jun 30, 2014, at 12:17 PM, Jeff Frost  wrote:

>> 
>> already is quite helpful.
>> 
>> What are you doing on that system? Is there anything requiring large
>> amounts of access exclusive locks on the primary? Possibly large amounts
>> of temporary relations?
> 
> 
> The last time we did a 100% logging run, the peak temp table creation was 
> something like 120k/hr, but the replicas seemed able to keep up with that 
> just fine.
> 

Sampling pg_locks on the primary shows ~50 locks with ExclusiveLock mode:

           mode           | count
--------------------------+-------
 AccessExclusiveLock      |    11
 AccessShareLock          |  2089
 ExclusiveLock            |    46
 RowExclusiveLock         |    81
 RowShareLock             |    17
 ShareLock                |     4
 ShareUpdateExclusiveLock |     5

Seems to be relatively consistent.  Of course, it's hard to say what it looked 
like back when the issue began.





Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost
On Jun 30, 2014, at 11:39 AM, Andres Freund  wrote:

> On 2014-06-30 11:34:52 -0700, Jeff Frost wrote:
>> On Jun 30, 2014, at 10:29 AM, Soni M  wrote:
> 
>>> It is 
>>> 96.62%  postgres [.] StandbyReleaseLocks
>>> as Jeff said. It runs quite long time, more than 5 minutes i think
>>> 
>>> i also use hot standby. we have 4 streaming replica, some of them has 
>>> active connection some has not. this issue has last more than 4 days. On 
>>> one of the standby, above postgres process is the only process that consume 
>>> high cpu load.
>> 
>> compiled with -fno-omit-frame-pointer doesn't yield much more info:
> 
> You'd need to do perf record -ga instead of perf record -a to see
> additional information.
> 

Ah! That's right.

Here's how that looks:

Samples: 473K of event 'cpu-clock', Event count (approx.): 473738
+  68.42%  init      [kernel.kallsyms]    [k] native_safe_halt
+  26.07%  postgres  postgres             [.] StandbyReleaseLocks
+   2.82%  swapper   [kernel.kallsyms]    [k] native_safe_halt
+   0.19%  ssh       libcrypto.so.1.0.1e  [.] md5_block_asm_data_order
+   0.19%  postgres  postgres             [.] RecordIsValid
+   0.16%  ssh       libcrypto.so.1.0.1e  [.] RC4
+   0.10%  postgres  postgres             [.] hash_search_with_hash_value
+   0.06%  postgres  [kernel.kallsyms]    [k] _spin_unlock_irqrestore
+   0.05%  init      [vmxnet3]            [k] vmxnet3_poll_rx_only
+   0.04%  postgres  [kernel.kallsyms]    [k] copy_user_generic_unrolled
+   0.04%  init      [kernel.kallsyms]    [k] finish_task_switch
+   0.04%  init      [kernel.kallsyms]    [k] __do_softirq
+   0.04%  ssh       [kernel.kallsyms]    [k] _spin_unlock_irqrestore
+   0.04%  ssh       [vmxnet3]            [k] vmxnet3_xmit_frame
+   0.03%  postgres  postgres             [.] PinBuffer
+   0.03%  init      [vmxnet3]            [k] vmxnet3_xmit_frame
+   0.03%  ssh       [kernel.kallsyms]    [k] copy_user_generic_unrolled
+   0.03%  postgres  postgres             [.] XLogReadBufferExtended
+   0.03%  ssh       ssh                  [.] 0x0002aa07
+   0.03%  init      [kernel.kallsyms]    [k] _spin_unlock_irqrestore
+   0.03%  ssh       [vmxnet3]            [k] vmxnet3_poll_rx_only
+   0.02%  ssh       [kernel.kallsyms]    [k] __do_softirq
+   0.02%  postgres  libc-2.12.so         [.] _wordcopy_bwd_dest_aligned
+   0.02%  postgres  postgres             [.] mdnblocks
+   0.02%  ssh       libcrypto.so.1.0.1e  [.] 0x000e25a1
+   0.02%  scp       [kernel.kallsyms]    [k] copy_user_generic_unrolled
+   0.02%  ssh       libc-2.12.so         [.] memcpy
+   0.02%  postgres  libc-2.12.so         [.] memcpy


> But:
> 
>> 76.24%  postgres   [.] StandbyReleaseLocks
> 
> already is quite helpful.
> 
> What are you doing on that system? Is there anything requiring large
> amounts of access exclusive locks on the primary? Possibly large amounts
> of temporary relations?


The last time we did a 100% logging run, the peak temp table creation was 
something like 120k/hr, but the replicas seemed able to keep up with that just 
fine.
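The StandbyReleaseLocks hot spot is consistent with the possible O(N^2) behavior discussed upthread: if each release rescans the standby's whole lock list, total cost grows quadratically with the number of WAL-logged AccessExclusiveLocks. A toy model (plain Python, no PostgreSQL internals) shows the shape of the problem:

```python
def release_all_scan(held):
    """Release every lock by rescanning the remaining list each time,
    roughly the access pattern of an O(N^2) release loop."""
    comparisons = 0
    remaining = list(held)
    for lock in reversed(held):          # worst case: reverse order of acquisition
        for i, candidate in enumerate(remaining):
            comparisons += 1
            if candidate == lock:
                del remaining[i]
                break
    return comparisons

def release_all_indexed(held):
    """Release via a hash-indexed structure: one lookup per lock."""
    comparisons = 0
    remaining = set(held)
    for lock in held:
        comparisons += 1
        remaining.discard(lock)
    return comparisons

n = 2000
print(release_all_scan(list(range(n))), release_all_indexed(list(range(n))))
# scan: n*(n+1)/2 = 2001000 comparisons; indexed: 2000
```

At 120k temp tables/hour, N can easily reach the range where the quadratic term dominates replay time.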

Hopefully Soni can answer whether that has increased significantly since May.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Pavel Stehule
2014-06-30 20:34 GMT+02:00 Jeff Frost :

> On Jun 30, 2014, at 10:29 AM, Soni M  wrote:
>
>
>
>
> On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund 
> wrote:
>
>>
>> My guess it's a spinlock, probably xlogctl->info_lck via
>> RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
>> to show up correctly in profiles...
>>
>>
For this kind of issue, systemtap or dtrace can be useful:

http://postgres.cz/wiki/Monitorov%C3%A1n%C3%AD_lwlocku_pomoc%C3%AD_systemtapu

It lets you identify which lock is the problem - the page is in Czech, so please use Google Translate.

Regards

Pavel


>  What worked for me was to build with -fno-omit-frame-pointer - that
>> normally shows the callers, even if it can't generate a proper symbol
>> name.
>>
>> Soni: Do you use Hot Standby? Are there connections active while you
>> have that problem? Any other processes with high cpu load?
>>
>> Greetings,
>>
>> Andres Freund
>>
>> --
>>  Andres Freund http://www.2ndQuadrant.com/
>> 
>>  PostgreSQL Development, 24x7 Support, Training & Services
>>
>
> It is
>  96.62%  postgres [.] StandbyReleaseLocks
>  as Jeff said. It runs for quite a long time, more than 5 minutes I think.
>
> I also use hot standby. We have 4 streaming replicas; some of them have
> active connections, some do not. This issue has lasted more than 4 days. On
> one of the standbys, the above postgres process is the only process that
> consumes high CPU load.
>
>
> compiled with -fno-omit-frame-pointer doesn't yield much more info:
>
>  76.24%  postgres   [.] StandbyReleaseLocks
>   2.64%  libcrypto.so.1.0.1e[.] md5_block_asm_data_order
>   2.19%  libcrypto.so.1.0.1e[.] RC4
>   2.17%  postgres   [.] RecordIsValid
>   1.20%  [kernel]   [k] copy_user_generic_unrolled
>   1.18%  [kernel]   [k] _spin_unlock_irqrestore
>   0.97%  [vmxnet3]  [k] vmxnet3_poll_rx_only
>   0.87%  [kernel]   [k] __do_softirq
>   0.77%  [vmxnet3]  [k] vmxnet3_xmit_frame
>   0.69%  postgres   [.] hash_search_with_hash_value
>   0.68%  [kernel]   [k] fin
>
> However, this server started progressing through the WAL files quite a bit
> better before I finished compiling, so we'll leave it running with this
> version and see if there's more info available the next time it starts
> replaying slowly.
>
>
>


Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Andres Freund
On 2014-06-30 11:34:52 -0700, Jeff Frost wrote:
> On Jun 30, 2014, at 10:29 AM, Soni M  wrote:

> > It is 
> >  96.62%  postgres [.] StandbyReleaseLocks
> >  as Jeff said. It runs for quite a long time, more than 5 minutes I think.
> > 
> > I also use hot standby. We have 4 streaming replicas; some of them have
> > active connections, some do not. This issue has lasted more than 4 days. On
> > one of the standbys, the above postgres process is the only process that
> > consumes high CPU load.
> 
> compiled with -fno-omit-frame-pointer doesn't yield much more info:

You'd need to do perf record -ga instead of perf record -a to see
additional information.

But:

>  76.24%  postgres   [.] StandbyReleaseLocks

already is quite helpful.

What are you doing on that system? Is there anything requiring large
amounts of access exclusive locks on the primary? Possibly large amounts
of temporary relations?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost
On Jun 30, 2014, at 10:29 AM, Soni M  wrote:

> 
> 
> 
> On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund  wrote:
> 
> My guess it's a spinlock, probably xlogctl->info_lck via
> RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
> to show up correctly in profiles...
> 
> What worked for me was to build with -fno-omit-frame-pointer - that
> normally shows the callers, even if it can't generate a proper symbol
> name.
> 
> Soni: Do you use Hot Standby? Are there connections active while you
> have that problem? Any other processes with high cpu load?
> 
> Greetings,
> 
> Andres Freund
> 
> --
>  Andres Freund http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
> 
> It is 
>  96.62%  postgres [.] StandbyReleaseLocks
>  as Jeff said. It runs for quite a long time, more than 5 minutes I think.
> 
> I also use hot standby. We have 4 streaming replicas; some of them have active
> connections, some do not. This issue has lasted more than 4 days. On one of the
> standbys, the above postgres process is the only process that consumes high CPU
> load.

compiled with -fno-omit-frame-pointer doesn't yield much more info:

 76.24%  postgres   [.] StandbyReleaseLocks
  2.64%  libcrypto.so.1.0.1e[.] md5_block_asm_data_order
  2.19%  libcrypto.so.1.0.1e[.] RC4
  2.17%  postgres   [.] RecordIsValid
  1.20%  [kernel]   [k] copy_user_generic_unrolled
  1.18%  [kernel]   [k] _spin_unlock_irqrestore
  0.97%  [vmxnet3]  [k] vmxnet3_poll_rx_only
  0.87%  [kernel]   [k] __do_softirq
  0.77%  [vmxnet3]  [k] vmxnet3_xmit_frame
  0.69%  postgres   [.] hash_search_with_hash_value
  0.68%  [kernel]   [k] fin

However, this server started progressing through the WAL files quite a bit 
better before I finished compiling, so we'll leave it running with this version 
and see if there's more info available the next time it starts replaying slowly.




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Soni M
On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund 
wrote:

>
> My guess it's a spinlock, probably xlogctl->info_lck via
> RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
> to show up correctly in profiles...
>
> What worked for me was to build with -fno-omit-frame-pointer - that
> normally shows the callers, even if it can't generate a proper symbol
> name.
>
> Soni: Do you use Hot Standby? Are there connections active while you
> have that problem? Any other processes with high cpu load?
>
> Greetings,
>
> Andres Freund
>
> --
>  Andres Freund http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

It is
 96.62%  postgres [.] StandbyReleaseLocks
 as Jeff said. It runs for quite a long time, more than 5 minutes I think.

I also use hot standby. We have 4 streaming replicas; some of them have
active connections, some do not. This issue has lasted more than 4 days. On
one of the standbys, the above postgres process is the only process that
consumes high CPU load.

-- 
Regards,

Soni Maula Harriz


Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Andres Freund
On 2014-06-30 19:14:24 +0300, Heikki Linnakangas wrote:
> On 06/30/2014 05:46 PM, Soni M wrote:
> >Here's what 'perf top' said on streaming replica :
> >
> >Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
> >  95.97%  postgres [.] 0x002210f3
> 
> Ok, so it's stuck doing something... Can you get a build with debug symbols
> installed, so that we could see the function name?

My guess it's a spinlock, probably xlogctl->info_lck via
RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
to show up correctly in profiles...

What worked for me was to build with -fno-omit-frame-pointer - that
normally shows the callers, even if it can't generate a proper symbol
name.

Soni: Do you use Hot Standby? Are there connections active while you
have that problem? Any other processes with high cpu load?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 9:14 AM, Heikki Linnakangas  wrote:

> On 06/30/2014 05:46 PM, Soni M wrote:
>> Here's what 'perf top' said on streaming replica :
>> 
>> Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
>>  95.97%  postgres [.] 0x002210f3
> 
> Ok, so it's stuck doing something... Can you get a build with debug symbols 
> installed, so that we could see the function name?
> - Heikki
> 

Looks like StandbyReleaseLocks:

Samples: 10K of event 'cpu-clock', Event count (approx.): 8507
 89.21%  postgres  [.] StandbyReleaseLocks
  0.89%  libc-2.12.so  [.] __strstr_sse2
  0.83%  perf  [.] 0x0005f1e5
  0.74%  [kernel]  [k] kallsyms_expand_symbol
  0.52%  libc-2.12.so  [.] memchr
  0.47%  perf  [.] symbols__insert
  0.47%  [kernel]  [k] format_decode



Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Heikki Linnakangas

On 06/30/2014 05:46 PM, Soni M wrote:

Here's what 'perf top' said on streaming replica :

Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
  95.97%  postgres [.] 0x002210f3


Ok, so it's stuck doing something... Can you get a build with debug symbols 
installed, so that we could see the function name?

- Heikki





Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Soni M
Here's what 'perf top' said on the streaming replica:

Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
 95.97%  postgres [.] 0x002210f3
  0.41%  perf [.] 0x0005f225
  0.39%  libc-2.12.so [.] __strstr_sse2
  0.22%  libc-2.12.so [.] memchr
  0.22%  [kernel] [k] kallsyms_expand_symbol
  0.18%  perf [.] symbols__insert
  0.18%  [kernel] [k] format_decode
  0.15%  libc-2.12.so [.] __GI___strcmp_ssse3
  0.13%  [kernel] [k] string
  0.12%  [kernel] [k] number
  0.12%  [kernel] [k] vsnprintf
  0.12%  libc-2.12.so [.] _IO_vfscanf
  0.11%  perf [.] dso__find_symbol
  0.11%  [kernel] [k] _spin_unlock_irqrestore
  0.10%  perf [.] hex2u64
  0.10%  postgres [.] hash_search_with_hash_value
  0.09%  perf [.] rb_next
  0.08%  libc-2.12.so [.] memcpy
  0.07%  libc-2.12.so [.] __strchr_sse2
  0.07%  [kernel] [k] clear_page
  0.06%  [kernel] [k] strnlen
  0.05%  perf [.] perf_evsel__parse_sample
  0.05%  perf [.] rb_insert_color
  0.05%  [kernel] [k] pointer



On Mon, Jun 30, 2014 at 2:05 PM, Heikki Linnakangas  wrote:

> On 06/29/2014 03:43 PM, Soni M wrote:
>
>> top and sar says 100% cpu usage of one core, no sign of I/O wait.
>>
>
> Hmm, I wonder what it's doing then... If you have "perf" installed on the
> system, you can do "perf top" to get a quick overview of where the CPU time
> is spent.
>
> - Heikki
>
>


-- 
Regards,

Soni Maula Harriz


Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Heikki Linnakangas

On 06/29/2014 03:43 PM, Soni M wrote:

top and sar says 100% cpu usage of one core, no sign of I/O wait.


Hmm, I wonder what it's doing then... If you have "perf" installed on 
the system, you can do "perf top" to get a quick overview of where the 
CPU time is spent.


- Heikki





Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-29 Thread Soni M
Here's some data from the standby (this is when replaying takes 44 seconds per
WAL file):

2014-06-29 00:07:36.513 CDT,,,16682,,53af6f46.412a,44,,2014-06-28 20:43:34
CDT,,0,LOG,0,"restartpoint complete: wrote 63187 buffers (3.0%); 0
transaction log file(s) added, 0 removed, 0 recycled; write=209.170 s,
sync=0.482 s, total=209.667 s; sync files=644, longest=0.036 s,
average=0.000 s",""
2014-06-29 00:07:36.513 CDT,,,16682,,53af6f46.412a,45,,2014-06-28 20:43:34
CDT,,0,LOG,0,"recovery restart point at 27CE/170056A8","last completed
transaction was at log time 2014-06-27 13:39:00.542624-05"""
2014-06-29 00:28:59.678 CDT,,,16682,,53af6f46.412a,47,,2014-06-28 20:43:34
CDT,,0,LOG,0,"restartpoint complete: wrote 70942 buffers (3.4%); 0
transaction log file(s) added, 0 removed, 0 recycled; write=209.981 s,
sync=0.493 s, total=210.486 s; sync files=723, longest=0.156 s,
average=0.000 s",""
2014-06-29 00:28:59.678 CDT,,,16682,,53af6f46.412a,48,,2014-06-28 20:43:34
CDT,,0,LOG,0,"recovery restart point at 27CE/35002678","last completed
transaction was at log time 2014-06-27 13:42:05.121121-05"""
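Those two log lines quantify the lag directly: the restartpoint logged at 00:07 CDT on the 29th had only replayed up to a commit from 13:39 on the 27th. A quick sketch to turn such a pair into a number; it assumes both timestamps share the same UTC offset, which holds here since CDT is -05:

```python
from datetime import datetime

def parse_ts(ts):
    """Parse 'YYYY-MM-DD HH:MM:SS[.fraction]' with the timezone already stripped."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in ts else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(ts, fmt)

def replay_lag_hours(log_time, last_xact_time):
    """Hours between when the restartpoint was logged and the last replayed commit."""
    return (parse_ts(log_time) - parse_ts(last_xact_time)).total_seconds() / 3600.0

# Values from the first restartpoint line above (both effectively UTC-5):
print(round(replay_lag_hours("2014-06-29 00:07:36.513",
                             "2014-06-27 13:39:00.542624"), 1))  # → 34.5
```

Roughly 34.5 hours of replay lag, which matches the "lag further behind even at low load" symptom reported at the start of the thread.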

-- 
Regards,

Soni Maula Harriz


Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-29 Thread Soni M
top and sar say 100% CPU usage of one core, with no sign of I/O wait. The
database is 1.5 TB in size. RAM on the master is 145 GB; it differs across the
slaves - some have about 16 GB, others also have 145 GB.

Nothing suspicious in the standby's postgres log.

In the master's postgres log:
WARNING,01000,"pgstat wait timeout",""
ERROR,57014,"canceling autovacuum task","automatic vacuum of
table ""consprod._consprod_replication.sl_event"""""
ERROR,57014,"canceling statement due to statement timeout",,"
"PARSE",2014-06-26 00:39:35 CDT,91/0,0,ERROR,25P02,"current transaction is
aborted, commands ignored until end of transaction block",,"select
1",,,""
"could not receive data from client: Connection reset by peer",""

The log files are big anyway. If you can specify some pattern to look for in
the log, that would really help.
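Given the AccessExclusiveLock/temp-table theory discussed upthread, counting a few patterns on the master is a reasonable first pass. The patterns and sample lines below are illustrative assumptions, not exact matches for any particular log_line_prefix:

```python
import re

# Illustrative patterns; adjust to the actual log format.
PATTERNS = {
    "temp_table_ddl": re.compile(r"CREATE\s+TEMP(ORARY)?\s+TABLE", re.I),
    "statement_timeout": re.compile(r"canceling statement due to statement timeout"),
    "autovacuum_cancel": re.compile(r"canceling autovacuum task"),
}

def count_patterns(lines):
    """Count how many log lines match each pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, rx in PATTERNS.items():
            if rx.search(line):
                counts[name] += 1
    return counts

sample = [
    'LOG:  statement: CREATE TEMP TABLE scratch (id int) ON COMMIT DROP',
    'ERROR:  canceling autovacuum task',
    'ERROR:  canceling statement due to statement timeout',
]
print(count_patterns(sample))
```

Note that temp-table DDL only shows up in the log if statement logging is on (e.g. log_statement = 'ddl' or 'all'), so an empty count may just mean logging is off.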


On Sun, Jun 29, 2014 at 3:31 PM, Heikki Linnakangas  wrote:

> On 06/29/2014 11:14 AM, Soni M wrote:
>
>> Everything worked fine until Thursday, when we had high load on the master;
>> after that, every streaming replica lags further behind the master, even at
>> night and on weekends when all server load is low. But the slony slave is
>> fine.
>>
>
> What does 'top' on the standby say? Is the startup process using 100% of
> (one) CPU replaying records, or is it waiting for I/O? How large is the
> database, does it fit in RAM? Any clues in the system or PostgreSQL logs?
>
> - Heikki
>
>


-- 
Regards,

Soni Maula Harriz


Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-29 Thread Heikki Linnakangas

On 06/29/2014 11:14 AM, Soni M wrote:

Everything worked fine until Thursday, when we had high load on the master;
after that, every streaming replica lags further behind the master, even at
night and on weekends when all server load is low. But the slony slave is fine.


What does 'top' on the standby say? Is the startup process using 100% of 
(one) CPU replaying records, or is it waiting for I/O? How large is the 
database, does it fit in RAM? Any clues in the system or PostgreSQL logs?


- Heikki


