Re: [GENERAL] pg_standby observation

2008-03-27 Thread Simon Riggs
On Sun, 2007-09-16 at 11:18 -0700, Jeff Davis wrote:
 On Sun, 2007-09-16 at 09:25 +0100, Simon Riggs wrote:
  Well, the definition of it working correctly is that a "restored log
  file..." message occurs. Even with archive_timeout set there could be
  various delays before that happens. We have two servers and a network
  involved, so the time might spike occasionally.
  
 
 The problem is, a "restored log file" message might appear in a
 different language or with a different prefix, depending on the
 settings. That makes it hard to come up with a general solution, so
 everyone has to use their own scripts that work with their logging
 configuration.
 
 In my particular case, I want to know if those logs aren't being
 replayed, regardless of whether it's a network problem or a postgres
 problem.

Currently pg_standby just sits there waiting. If you can specify the
events you wish to monitor and what action to take when that event
happens, I can make it do this.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_standby observation

2007-09-16 Thread Simon Riggs
On Thu, 2007-09-13 at 15:13 -0500, Erik Jones wrote:
 On Sep 13, 2007, at 3:02 PM, Jeff Davis wrote:
 
  On Thu, 2007-09-13 at 14:05 -0500, Erik Jones wrote:
   If you include the -d option pg_standby will emit logging info on
   stderr so you can tack on something like 2>> logpath/standby.log.
   What it is lacking, however, is timestamps in the output when it
   successfully recovers a WAL file.  Was there something more you were
   looking for?
 
  I don't think the timestamps will be a problem, I can always pipe it
  through something else.
 
  I think this will work, but it would be nice to have something that's a
  little more well-defined and standardized to determine whether some kind
  of error happens during replay.
 
 Right.  The problem there is that there really isn't anything  
 standardized about pg_standby, yet.  Or, if it is, it hasn't been  
 documented, yet.  Perhaps you could ask Simon about the possible  
 outputs on error conditions so that you'll have a definite list to  
 work with?

There are a few different kinds of errors pg_standby can generate, though
much of its behaviour depends upon the command line switches.

I wasn't planning on documenting all possible failure states. We don't
do that anywhere else in the docs.

Happy to consider any requests for change. 

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] pg_standby observation

2007-09-16 Thread Simon Riggs
On Thu, 2007-09-13 at 11:38 -0700, Jeff Davis wrote:
 I think it would be useful if pg_standby (in version 8.3 contrib) could
 be observed in some way.
 
 Right now I use my own standby script, because every time it runs, it
 touches a file in a known location. That allows me to monitor that file,
 and if it is too stale, I know something must have gone wrong (I have an
 archive_timeout set), and I can send an SNMP trap.
 
 Would it be useful to add something similar to pg_standby? Is there a
 better way to detect a problem with a standby system, or a more
 appropriate place?
 
 The postgres logs do report this also, but it requires more care to
 properly intercept the "restored log file ... from archive" messages.

Well, the definition of it working correctly is that a "restored log
file..." message occurs. Even with archive_timeout set there could be
various delays before that happens. We have two servers and a network
involved, so the time might spike occasionally.

Touching a file doesn't really prove it's working either.

Not sure what to suggest otherwise.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [GENERAL] pg_standby observation

2007-09-16 Thread Jeff Davis
On Sun, 2007-09-16 at 09:25 +0100, Simon Riggs wrote:
 Well, the definition of it working correctly is that a "restored log
 file..." message occurs. Even with archive_timeout set there could be
 various delays before that happens. We have two servers and a network
 involved, so the time might spike occasionally.
 

The problem is, a "restored log file" message might appear in a
different language or with a different prefix, depending on the
settings. That makes it hard to come up with a general solution, so
everyone has to use their own scripts that work with their logging
configuration.

In my particular case, I want to know if those logs aren't being
replayed, regardless of whether it's a network problem or a postgres
problem.

It would be nice if there was a more standardized way to see when
postgres replays a log successfully.
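
As a rough illustration of why scraping the log is fragile, here is a minimal Python sketch. It assumes the server logs in English and that the message has the form: restored log file "..." from archive. The function name restored_file is made up for this example. The search tolerates whatever log_line_prefix puts in front of the message, but it will silently match nothing if the messages are translated, which is exactly the problem described above.

```python
import re

# Assumption: English-language server messages. A translated message
# would not match this pattern at all.
RESTORED = re.compile(r'restored log file "([^"]+)" from archive')


def restored_file(line):
    """Return the restored file name found in a log line, or None."""
    match = RESTORED.search(line)
    return match.group(1) if match else None
```

A monitor built on this would have to be told the language and message format in use, so it is a per-site workaround rather than a general solution.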

 Touching a file doesn't really prove it's working either.
 

Right. It's the best I have now, however, and should detect most error
conditions.
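
For anyone wanting to try the same approach, the staleness check itself could look roughly like the Python sketch below. The function names and the idea of passing in a maximum age are assumptions for illustration, not part of pg_standby or of my actual script. The threshold would need to be comfortably larger than archive_timeout to allow for the occasional delays mentioned earlier in the thread.

```python
import os
import time


def seconds_since_touched(path, now=None):
    """Return how many seconds ago the file was last modified."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime


def is_stale(path, max_age_seconds, now=None):
    """True if the marker file is missing or older than max_age_seconds."""
    try:
        return seconds_since_touched(path, now) > max_age_seconds
    except FileNotFoundError:
        # A marker file that was never created is treated as a problem too.
        return True
```

A monitoring job would call is_stale() on the marker file periodically and raise its alert (an SNMP trap in my case) when it returns True. As noted, a fresh file only shows that the restore script ran, not that replay succeeded.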

Regards,
Jeff Davis




Re: [GENERAL] pg_standby observation

2007-09-13 Thread Erik Jones

On Sep 13, 2007, at 1:38 PM, Jeff Davis wrote:
 I think it would be useful if pg_standby (in version 8.3 contrib) could
 be observed in some way.
 
 Right now I use my own standby script, because every time it runs, it
 touches a file in a known location. That allows me to monitor that file,
 and if it is too stale, I know something must have gone wrong (I have an
 archive_timeout set), and I can send an SNMP trap.
 
 Would it be useful to add something similar to pg_standby? Is there a
 better way to detect a problem with a standby system, or a more
 appropriate place?
 
 The postgres logs do report this also, but it requires more care to
 properly intercept the "restored log file ... from archive" messages.
 
 Regards,
 Jeff Davis


If you include the -d option pg_standby will emit logging info on
stderr so you can tack on something like 2>> logpath/standby.log.
What it is lacking, however, is timestamps in the output when it
successfully recovers a WAL file.  Was there something more you were
looking for?


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com





Re: [GENERAL] pg_standby observation

2007-09-13 Thread Jeff Davis
On Thu, 2007-09-13 at 14:05 -0500, Erik Jones wrote:
 If you include the -d option pg_standby will emit logging info on  
 stderr so you can tack on something like 2>> logpath/standby.log.
 What it is lacking, however, is timestamps in the output when it
 successfully recovers a WAL file.  Was there something more you were
 looking for?

I don't think the timestamps will be a problem, I can always pipe it
through something else. 
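
The "something else" could be as small as the following Python filter, which is only a sketch: the function names and the timestamp format are my own choices for illustration, and nothing here is provided by pg_standby.

```python
import time


def stamp(line, now=None):
    """Prefix one line with a local timestamp such as '2007-09-13 15:02:00'."""
    when = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return when + " " + line.rstrip("\n")


def stamp_stream(src, dst, now=None):
    """Copy lines from src to dst, adding a timestamp to each one."""
    for line in src:
        dst.write(stamp(line, now) + "\n")
        dst.flush()  # keep the log current even when output is buffered
```

A small script would call stamp_stream(sys.stdin, sys.stdout) and sit at the end of a pipe fed by pg_standby's stderr. One caveat: the exit status of a shell pipeline is normally that of its last command, so some care is needed if this is wired into restore_command, which relies on that status.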

I think this will work, but it would be nice to have something that's a
little more well-defined and standardized to determine whether some kind
of error happens during replay.

Ultimately, what I'm trying to do is make it so that pgsnmpd can monitor
this, and trap if a problem occurs. In order for pgsnmpd to do this in a
way that works for a large number of people, it can't make too many
assumptions about logging options, etc.

Regards,
Jeff Davis




Re: [GENERAL] pg_standby observation

2007-09-13 Thread Erik Jones


On Sep 13, 2007, at 3:02 PM, Jeff Davis wrote:
 On Thu, 2007-09-13 at 14:05 -0500, Erik Jones wrote:
  If you include the -d option pg_standby will emit logging info on
  stderr so you can tack on something like 2>> logpath/standby.log.
  What it is lacking, however, is timestamps in the output when it
  successfully recovers a WAL file.  Was there something more you were
  looking for?
 
 I don't think the timestamps will be a problem, I can always pipe it
 through something else.
 
 I think this will work, but it would be nice to have something that's a
 little more well-defined and standardized to determine whether some kind
 of error happens during replay.

Right.  The problem there is that there really isn't anything
standardized about pg_standby, yet.  Or, if it is, it hasn't been
documented, yet.  Perhaps you could ask Simon about the possible
outputs on error conditions so that you'll have a definite list to
work with?

 Ultimately, what I'm trying to do is make it so that pgsnmpd can monitor
 this, and trap if a problem occurs. In order for pgsnmpd to do this in a
 way that works for a large number of people, it can't make too many
 assumptions about logging options, etc.



Erik Jones



