Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-19 Thread Noah Misch
On Mon, Jan 19, 2015 at 12:05:01PM -0500, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Mon, Jan 19, 2015 at 11:30 AM, Andres Freund and...@2ndquadrant.com 
  wrote:
  Sure, but the log isn't invisible. As mentioned one paragraph above, I
  don't think it's likely to ever be noticed in the client code in the
  cases where it happens in production.
 
  Yes, that is my feeling as well.
 
 Meh.  I still don't like it, but I guess I'm outvoted.  Unless there are
 further votes, what we have at this point is just:
 
 - elog(WARNING, pgstat wait timeout);
 + ereport(LOG, (errmsg(using stale statistics instead of current 
 ones because stats collector is not responding)));
 
 with no conditionality for pgstat_vacuum_stat vs. other callers.

That is satisfactory to me, too.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-19 Thread Robert Haas
On Mon, Jan 19, 2015 at 7:09 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-01-18 21:33:25 -0500, Robert Haas wrote:
 On Sun, Jan 18, 2015 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  After looking at the code, the minimum-change alternative would be more or
  less as attached: first, get rid of the long-obsolete proposition that
  autovacuum workers need fresher-than-usual stats; second, allow
  pgstat_vacuum_stat to accept stats that are moderately stale (the number
  given below allows them to be up to 50 seconds old); and third, suppress
  wait-timeout warnings when the call is from pgstat_vacuum_stat.  The third
  point is what we need to avoid unnecessary buildfarm failures.  The second
  point addresses the idea that we don't need to stress the stats collector
  too much for this.

 I think this is too much of a good thing.  I don't see any reason why
 autovacuum's statistics need to be fresher than normal, but I also
 don't see any reason why they need to be less fresh.  I think
 suppressing the warning is a good idea, but why only suppress it for
 autovacuum?  How about just knocking the level down to, say, DEBUG1?

 +1 for just using LOG - which by default does not end up on client
 machines. In contrast to WARNING.

Yeah, that doesn't seem like a bad idea, either.  The message seems
much more likely to be of interest to the DBA than the user; the DBA
can use the message to diagnose an overloaded I/O subsystem (which I
think is the usual cause of this problem) whereas the only point of
likely interest to the user is that their query ran slowly (which they
know without the message).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-19 Thread Robert Haas
On Mon, Jan 19, 2015 at 10:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 If that were true I'd agree with you, but it's false on its face.
 A user who is actually examining the statistics might well want to
 know if they're significantly out of date.  A very relevant example
 is that I've always wondered how come, when we see buildfarm failures
 in the stats regression test, they always appear in the form of
 output differences that indicate that the session did not see the
 expected stats update --- but there's never a timeout warning printed,
 which indicates that whatever the cause is, it ain't that.

Sure, but nobody who is not a developer is going to care about that.
A typical user who sees pgstat wait timeout, or doesn't, isn't going
to be able to make anything at all out of that.

 I'd be fine with changing the warning to LOG level rather than
 suppressing it entirely for the specific case of pgstat_vacuum_stat;
 but I do want to distinguish that case, wherein it's fair to conclude
 that obsolete stats aren't too worrisome, from other cases where no
 such conclusion is justified.

But I can live with this compromise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Sure, but nobody who is not a developer is going to care about that.
 A typical user who sees pgstat wait timeout, or doesn't, isn't going
 to be able to make anything at all out of that.

Possibly we need to improve the wording of that error message then.
When it was written, we really assumed that it was a can't-happen case
and so didn't spend much effort on it.  Perhaps it should become a
translatable ereport phrased like WARNING: using stale statistics
instead of current ones because stats collector is not responding.

(I'm also wondering if it'd make sense to expose the stats timestamp
as a callable function, so that the case could be dealt with
programmatically as well.  But that's future-feature territory.)

 I'd be fine with changing the warning to LOG level rather than
 suppressing it entirely for the specific case of pgstat_vacuum_stat;
 but I do want to distinguish that case, wherein it's fair to conclude
 that obsolete stats aren't too worrisome, from other cases where no
 such conclusion is justified.

 But I can live with this compromise.

Is that OK with everybody?  Going once, going twice ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 19, 2015 at 7:09 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-01-18 21:33:25 -0500, Robert Haas wrote:
 I think this is too much of a good thing.  I don't see any reason why
 autovacuum's statistics need to be fresher than normal, but I also
 don't see any reason why they need to be less fresh.  I think
 suppressing the warning is a good idea, but why only suppress it for
 autovacuum?  How about just knocking the level down to, say, DEBUG1?

 +1 for just using LOG - which by default does not end up on client
 machines. In contrast to WARNING.

 Yeah, that doesn't seem like a bad idea, either.  The message seems
 much more likely to be of interest to the DBA than the user; the DBA
 can use the message to diagnose an overloaded I/O subsystem (which I
 think is the usual cause of this problem) whereas the only point of
 likely interest to the user is that their query ran slowly (which they
 know without the message).

If that were true I'd agree with you, but it's false on its face.
A user who is actually examining the statistics might well want to
know if they're significantly out of date.  A very relevant example
is that I've always wondered how come, when we see buildfarm failures
in the stats regression test, they always appear in the form of
output differences that indicate that the session did not see the
expected stats update --- but there's never a timeout warning printed,
which indicates that whatever the cause is, it ain't that.

I'd be fine with changing the warning to LOG level rather than
suppressing it entirely for the specific case of pgstat_vacuum_stat;
but I do want to distinguish that case, wherein it's fair to conclude
that obsolete stats aren't too worrisome, from other cases where no
such conclusion is justified.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-19 Thread Andres Freund
On 2015-01-18 21:33:25 -0500, Robert Haas wrote:
 On Sun, Jan 18, 2015 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  After looking at the code, the minimum-change alternative would be more or
  less as attached: first, get rid of the long-obsolete proposition that
  autovacuum workers need fresher-than-usual stats; second, allow
  pgstat_vacuum_stat to accept stats that are moderately stale (the number
  given below allows them to be up to 50 seconds old); and third, suppress
  wait-timeout warnings when the call is from pgstat_vacuum_stat.  The third
  point is what we need to avoid unnecessary buildfarm failures.  The second
  point addresses the idea that we don't need to stress the stats collector
  too much for this.
 
 I think this is too much of a good thing.  I don't see any reason why
 autovacuum's statistics need to be fresher than normal, but I also
 don't see any reason why they need to be less fresh.  I think
 suppressing the warning is a good idea, but why only suppress it for
 autovacuum?  How about just knocking the level down to, say, DEBUG1?

+1 for just using LOG - which by default does not end up on client
machines. In contrast to WARNING.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-19 Thread Andres Freund
On 2015-01-19 11:28:41 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2015-01-19 11:16:09 -0500, Tom Lane wrote:
  Possibly we need to improve the wording of that error message then.
  When it was written, we really assumed that it was a can't-happen case
  and so didn't spend much effort on it.  Perhaps it should become a
  translatable ereport phrased like WARNING: using stale statistics
  instead of current ones because stats collector is not responding.
 
  Yes, that seems like a good message improvement.
 
  It's not like making it LOG makes the message invisible...
 
 Uh, yes it does.  So far as the user is concerned anyway.

Sure, but the log isn't invisible. As mentioned one paragraph above, I
don't think it's likely to ever be noticed in the client code in the
cases where it happens in production.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-19 Thread Andres Freund
On 2015-01-19 11:16:09 -0500, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  I'd be fine with changing the warning to LOG level rather than
  suppressing it entirely for the specific case of pgstat_vacuum_stat;
  but I do want to distinguish that case, wherein it's fair to conclude
  that obsolete stats aren't too worrisome, from other cases where no
  such conclusion is justified.
 
  But I can live with this compromise.
 
 Is that OK with everybody?  Going once, going twice ...

I can live with the compromise as well, but I'd rather change it to
always be of LOG priority. LOG is more likely to end up in the log and
that's where it's actually likely to be noticed.  In most of the cases
WARNINGs going to the client won't be noticed as this error is much more
likely on servers with a high load caused by programs than during
interactive use.

  Sure, but nobody who is not a developer is going to care about that.
  A typical user who sees pgstat wait timeout, or doesn't, isn't going
  to be able to make anything at all out of that.
 
 Possibly we need to improve the wording of that error message then.
 When it was written, we really assumed that it was a can't-happen case
 and so didn't spend much effort on it.  Perhaps it should become a
 translatable ereport phrased like WARNING: using stale statistics
 instead of current ones because stats collector is not responding.

Yes, that seems like a good message improvement.

It's not like making it LOG makes the message invisible...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 19, 2015 at 11:30 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
 Sure, but the log isn't invisible. As mentioned one paragraph above, I
 don't think it's likely to ever be noticed in the client code in the
 cases where it happens in production.

 Yes, that is my feeling as well.

Meh.  I still don't like it, but I guess I'm outvoted.  Unless there are
further votes, what we have at this point is just:

-   elog(WARNING, pgstat wait timeout);
+   ereport(LOG, (errmsg(using stale statistics instead of current 
ones because stats collector is not responding)));

with no conditionality for pgstat_vacuum_stat vs. other callers.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-19 Thread Robert Haas
On Mon, Jan 19, 2015 at 11:16 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Sure, but nobody who is not a developer is going to care about that.
 A typical user who sees pgstat wait timeout, or doesn't, isn't going
 to be able to make anything at all out of that.

 Possibly we need to improve the wording of that error message then.
 When it was written, we really assumed that it was a can't-happen case
 and so didn't spend much effort on it.  Perhaps it should become a
 translatable ereport phrased like WARNING: using stale statistics
 instead of current ones because stats collector is not responding.

I'm still not completely convinced it deserves to be a WARNING, but I
definitely think turning it into a translatable error message is the
right call.  Calling this a can't happen case is clearly ridiculous
at this point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-19 Thread Robert Haas
On Mon, Jan 19, 2015 at 11:30 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-01-19 11:28:41 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2015-01-19 11:16:09 -0500, Tom Lane wrote:
  Possibly we need to improve the wording of that error message then.
  When it was written, we really assumed that it was a can't-happen case
  and so didn't spend much effort on it.  Perhaps it should become a
  translatable ereport phrased like WARNING: using stale statistics
  instead of current ones because stats collector is not responding.

  Yes, that seems like a good message improvement.

  It's not like making it LOG makes the message invisible...

 Uh, yes it does.  So far as the user is concerned anyway.

 Sure, but the log isn't invisible. As mentioned one paragraph above, I
 don't think it's likely to ever be noticed in the client code in the
 cases where it happens in production.

Yes, that is my feeling as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-19 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2015-01-19 11:16:09 -0500, Tom Lane wrote:
 Possibly we need to improve the wording of that error message then.
 When it was written, we really assumed that it was a can't-happen case
 and so didn't spend much effort on it.  Perhaps it should become a
 translatable ereport phrased like WARNING: using stale statistics
 instead of current ones because stats collector is not responding.

 Yes, that seems like a good message improvement.

 It's not like making it LOG makes the message invisible...

Uh, yes it does.  So far as the user is concerned anyway.  The entire
point of this discussion is to prevent the message from showing up in
buildfarm output, remember?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-18 Thread Robert Haas
On Sun, Jan 18, 2015 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 After looking at the code, the minimum-change alternative would be more or
 less as attached: first, get rid of the long-obsolete proposition that
 autovacuum workers need fresher-than-usual stats; second, allow
 pgstat_vacuum_stat to accept stats that are moderately stale (the number
 given below allows them to be up to 50 seconds old); and third, suppress
 wait-timeout warnings when the call is from pgstat_vacuum_stat.  The third
 point is what we need to avoid unnecessary buildfarm failures.  The second
 point addresses the idea that we don't need to stress the stats collector
 too much for this.

I think this is too much of a good thing.  I don't see any reason why
autovacuum's statistics need to be fresher than normal, but I also
don't see any reason why they need to be less fresh.  I think
suppressing the warning is a good idea, but why only suppress it for
autovacuum?  How about just knocking the level down to, say, DEBUG1?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-18 Thread Noah Misch
On Sun, Jan 18, 2015 at 07:02:54PM -0500, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  On Sun, Jan 18, 2015 at 04:09:29PM -0500, Tom Lane wrote:
  After looking at the code, the minimum-change alternative would be more or
  less as attached: first, get rid of the long-obsolete proposition that
  autovacuum workers need fresher-than-usual stats; second, allow
  pgstat_vacuum_stat to accept stats that are moderately stale (the number
  given below allows them to be up to 50 seconds old); and third, suppress
  wait-timeout warnings when the call is from pgstat_vacuum_stat.  The third
  point is what we need to avoid unnecessary buildfarm failures.  The second
  point addresses the idea that we don't need to stress the stats collector
  too much for this.
 
  Only #3 belongs in a minimum-change patch.  #1 and #2 solve and/or create
  different problems and operate independently of #3.  A patch covering #3 
  alone
  sounds attractive.
 
 [ raised eyebrow... ] In your previous message, you were advocating *for*
 a change that was more invasive than this one.  Why the change of heart?

I phrased that proposal based on a wrong belief that the collector writes the
stats file regularly in any case.  Vacuuming a 49s-old stats file without even
trying to get a fresh one might or might not improve PostgreSQL, but it's
dissimilar to what I had in mind.  I did not advocate for #1 at any point.

 In particular, I thought we'd already agreed to point #1.

Nope.  You and Alvaro ACKed it, then Heikki NACKed it.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-18 Thread Noah Misch
On Sat, Dec 27, 2014 at 07:55:05PM -0500, Tom Lane wrote:
 To get back to that original complaint about buildfarm runs failing,
 I notice that essentially all of those failures are coming from wait
 timeout warnings reported by manual VACUUM commands.  Now, VACUUM itself
 has no need to read the stats files.  What's actually causing these
 messages is failure to get a timely response in pgstat_vacuum_stat().
 So let me propose a drastic solution: let's dike out this bit in vacuum.c:
 
 /*
  * Send info about dead objects to the statistics collector, unless we are
  * in autovacuum --- autovacuum.c does this for itself.
  */
 if ((vacstmt-options  VACOPT_VACUUM)  !IsAutoVacuumWorkerProcess())
 pgstat_vacuum_stat();
 
 This would have the effect of transferring all responsibility for
 dead-stats-entry cleanup to autovacuum.  For ordinary users, I think
 that'd be just fine.  It might be less fine though for people who
 disable autovacuum, if there still are any.

Like Robert, I don't care for that.

 Or, much more simply, we could conclude that it's not that important
 if pgstat_vacuum_stat() is unable to get up-to-date stats, and rejigger
 the code so that we don't bleat when the file-reading request comes from
 that source.  This should be a back-patchable fix, whereas #2 above might
 be a bit too complicated for that.

This, however, feels like a clear improvement.  When every VACUUM does
pgstat_vacuum_stat(), it doesn't matter if any given VACUUM misses removing
entries that became obsolete in the preceding second or so.  In fact, rather
than merely not bleating when the wait times out, let's not wait at all.  I
don't favor VACUUM of a small table taking 12s because it spent 2s on the
table and 10s waiting to garbage collect a piping-hot stats file.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-18 Thread Noah Misch
On Sun, Jan 18, 2015 at 04:09:29PM -0500, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  On Sat, Dec 27, 2014 at 07:55:05PM -0500, Tom Lane wrote:
  Or, much more simply, we could conclude that it's not that important
  if pgstat_vacuum_stat() is unable to get up-to-date stats, and rejigger
  the code so that we don't bleat when the file-reading request comes from
  that source.  This should be a back-patchable fix, whereas #2 above might
  be a bit too complicated for that.
 
  This, however, feels like a clear improvement.  When every VACUUM does
  pgstat_vacuum_stat(), it doesn't matter if any given VACUUM misses removing
  entries that became obsolete in the preceding second or so.  In fact, rather
  than merely not bleating when the wait times out, let's not wait at all.  I
  don't favor VACUUM of a small table taking 12s because it spent 2s on the
  table and 10s waiting to garbage collect a piping-hot stats file.
 
 I think that would be going too far: if an autovac wakes up in the dead of
 night, it should not use an ancient stats file merely because no one else
 has asked for stats recently.  But if it never waits at all, it'll be
 at the mercy of whatever is already on disk.

Whoops; I underestimated the upper bound on time between stats file writes.

 After looking at the code, the minimum-change alternative would be more or
 less as attached: first, get rid of the long-obsolete proposition that
 autovacuum workers need fresher-than-usual stats; second, allow
 pgstat_vacuum_stat to accept stats that are moderately stale (the number
 given below allows them to be up to 50 seconds old); and third, suppress
 wait-timeout warnings when the call is from pgstat_vacuum_stat.  The third
 point is what we need to avoid unnecessary buildfarm failures.  The second
 point addresses the idea that we don't need to stress the stats collector
 too much for this.

Only #3 belongs in a minimum-change patch.  #1 and #2 solve and/or create
different problems and operate independently of #3.  A patch covering #3 alone
sounds attractive.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-18 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Sun, Jan 18, 2015 at 04:09:29PM -0500, Tom Lane wrote:
 After looking at the code, the minimum-change alternative would be more or
 less as attached: first, get rid of the long-obsolete proposition that
 autovacuum workers need fresher-than-usual stats; second, allow
 pgstat_vacuum_stat to accept stats that are moderately stale (the number
 given below allows them to be up to 50 seconds old); and third, suppress
 wait-timeout warnings when the call is from pgstat_vacuum_stat.  The third
 point is what we need to avoid unnecessary buildfarm failures.  The second
 point addresses the idea that we don't need to stress the stats collector
 too much for this.

 Only #3 belongs in a minimum-change patch.  #1 and #2 solve and/or create
 different problems and operate independently of #3.  A patch covering #3 alone
 sounds attractive.

[ raised eyebrow... ] In your previous message, you were advocating *for*
a change that was more invasive than this one.  Why the change of heart?

In particular, I thought we'd already agreed to point #1.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-18 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Sat, Dec 27, 2014 at 07:55:05PM -0500, Tom Lane wrote:
 To get back to that original complaint about buildfarm runs failing,
 I notice that essentially all of those failures are coming from wait
 timeout warnings reported by manual VACUUM commands.  Now, VACUUM itself
 has no need to read the stats files.  What's actually causing these
 messages is failure to get a timely response in pgstat_vacuum_stat().
 So let me propose a drastic solution: let's dike out this bit in vacuum.c:
 
 /*
 * Send info about dead objects to the statistics collector, unless we are
 * in autovacuum --- autovacuum.c does this for itself.
 */
 if ((vacstmt-options  VACOPT_VACUUM)  !IsAutoVacuumWorkerProcess())
 pgstat_vacuum_stat();
 
 This would have the effect of transferring all responsibility for
 dead-stats-entry cleanup to autovacuum.  For ordinary users, I think
 that'd be just fine.  It might be less fine though for people who
 disable autovacuum, if there still are any.

 Like Robert, I don't care for that.

I obviously failed to make it clear enough that that was meant as a straw
man, lets-think-outside-the-box proposal.

 Or, much more simply, we could conclude that it's not that important
 if pgstat_vacuum_stat() is unable to get up-to-date stats, and rejigger
 the code so that we don't bleat when the file-reading request comes from
 that source.  This should be a back-patchable fix, whereas #2 above might
 be a bit too complicated for that.

 This, however, feels like a clear improvement.  When every VACUUM does
 pgstat_vacuum_stat(), it doesn't matter if any given VACUUM misses removing
 entries that became obsolete in the preceding second or so.  In fact, rather
 than merely not bleating when the wait times out, let's not wait at all.  I
 don't favor VACUUM of a small table taking 12s because it spent 2s on the
 table and 10s waiting to garbage collect a piping-hot stats file.

I think that would be going too far: if an autovac wakes up in the dead of
night, it should not use an ancient stats file merely because no one else
has asked for stats recently.  But if it never waits at all, it'll be
at the mercy of whatever is already on disk.

After looking at the code, the minimum-change alternative would be more or
less as attached: first, get rid of the long-obsolete proposition that
autovacuum workers need fresher-than-usual stats; second, allow
pgstat_vacuum_stat to accept stats that are moderately stale (the number
given below allows them to be up to 50 seconds old); and third, suppress
wait-timeout warnings when the call is from pgstat_vacuum_stat.  The third
point is what we need to avoid unnecessary buildfarm failures.  The second
point addresses the idea that we don't need to stress the stats collector
too much for this.

regards, tom lane

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 0cf4988..b030e1c 100644
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*** static void pgstat_write_statsfiles(bool
*** 259,265 
  static void pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent);
  static HTAB *pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep);
  static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
! static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  
  static bool pgstat_write_statsfile_needed(void);
--- 259,265 
  static void pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent);
  static HTAB *pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep);
  static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
! static void backend_read_statsfile(bool for_vacuum_stat);
  static void pgstat_read_current_status(void);
  
  static bool pgstat_write_statsfile_needed(void);
*** pgstat_vacuum_stat(void)
*** 951,959 
  
  	/*
  	 * If not done for this transaction, read the statistics collector stats
! 	 * file into some hash tables.
  	 */
! 	backend_read_statsfile();
  
  	/*
  	 * Read pg_database and make a list of OIDs of all existing databases
--- 951,962 
  
  	/*
  	 * If not done for this transaction, read the statistics collector stats
! 	 * file into some hash tables.  We are willing to accept stats that are
! 	 * more stale than usual here.  (Since VACUUM never runs in a transaction
! 	 * block, this cannot result in accepting stale stats that would be used
! 	 * for other purposes later in the transaction.)
  	 */
! 	backend_read_statsfile(true);
  
  	/*
  	 * Read pg_database and make a list of OIDs of all existing databases
*** pgstat_fetch_stat_dbentry(Oid dbid)
*** 2182,2188 
  	 * If not done for this transaction, read the statistics collector stats
  	 * file into some hash tables.
  	 */
! 	backend_read_statsfile();
  
  	/*
  

[HACKERS] Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-27 Thread Noah Misch
On Thu, Dec 25, 2014 at 09:14:36PM +0100, Andres Freund wrote:
 My guess is that a checkpoint happened at that time. Maybe it'd be a
 good idea to make pg_regress start postgres with log_checkpoints
 enabled? My guess is that we'd find horrendous 'sync' times.

+1, independent of $SUBJECT.  How about log_autovacuum_min_duration=0, too?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers