I wrote:
Tom Lane wrote:
"Andrew Dunstan" <[EMAIL PROTECTED]> writes:
It could certainly be done. In general, I have taken the view
that owners have the responsibility for monitoring their own machines.
Sure, but providing them tools to do that seems within buildfarm's
purview.

For some types of failure, the buildfarm script could make a local
notification without bothering the server --- but a timeout on the
server side would cover a wider variety of failures, including "this
machine is dead and ought to be removed from the farm".


Nothing gets removed. If a machine does not report on a branch for 30 days
it drops off the dashboard, but apart from that it is a retained historic
artifact. This buildup in history has been gradually slowing down the
dashboard, in fact, but Ian Barwick tells me that he has rewritten my
lousy SQL to make it fast again, so we'll soon get that working better.

Anyway, I think we can do something fairly simple for these alarms. We'll
just have a special stanza in the config file, and a cron job that checks,
say, once a day, to see if we have exceeded the alarm period on any
machine/branch combination.


OK, I have a gadget to do this in place.


It looks at the config of the last build registered on each branch for a stanza called 'alerts' that would look like this:

 alerts => {
   HEAD => { alert_after => 24, alert_every => 48 },
   REL8_1_STABLE => { alert_after => 168, alert_every => 48 },
 }

The settings are in hours, so this says that if we haven't seen a HEAD build in 1 day or a stable branch build in 1 week, alert the owner by email, and keep repeating the alert in each case every 2 days.
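
To make the timing rule concrete, here is a minimal sketch of the check the hourly cron job might perform. It is written in Python purely for illustration and is not the actual buildfarm server code; the function name alert_due, its arguments, and the rule that the repeat interval restarts after a fresh build are all assumptions, not something stated above.

```python
from datetime import datetime, timedelta

# Hypothetical copy of the 'alerts' stanza; all values are in hours.
ALERTS = {
    "HEAD": {"alert_after": 24, "alert_every": 48},
    "REL8_1_STABLE": {"alert_after": 168, "alert_every": 48},
}


def alert_due(branch, last_build, last_alert, now, alerts=ALERTS):
    """Return True if an alert email should be sent for this branch now.

    last_build: time of the last build seen on the branch.
    last_alert: time the last alert was sent, or None if never.
    """
    settings = alerts.get(branch)
    if settings is None:
        return False  # no stanza for this branch: never alert
    if now - last_build < timedelta(hours=settings["alert_after"]):
        return False  # a build was seen recently enough
    if last_alert is None or last_alert < last_build:
        return True  # first alert since the last build (assumed behaviour)
    # Otherwise repeat only once alert_every hours have passed.
    return now - last_alert >= timedelta(hours=settings["alert_every"])
```

With the settings above, a HEAD build last seen 25 hours ago with no earlier alert would trigger an email, while a REL8_1_STABLE build last seen 100 hours ago would not.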

If some intrepid buildfarm owner wants to test this out by using low settings that would trigger an alert, that would be good. The cron job runs every hour.

cheers

andrew

