On 05/15/2014 04:30 PM, Stefan Kaltenbrunner wrote:
On 05/15/2014 07:46 PM, Andrew Dunstan wrote:
On 05/15/2014 12:43 PM, Tomas Vondra wrote:
Hi all,

today I got a few of errors like these (this one is from last week,
though):

     Status Line: 493 snapshot too old: Wed May  7 04:36:57 2014 GMT
     Content:
     snapshot to old: Wed May  7 04:36:57 2014 GMT

on the new buildfarm animals. I believe it was my mistake (incorrectly
configured local git mirror), but it got me thinking about how this will
behave with the animals running CLOBBER_CACHE_RECURSIVELY.

If I understand the Perl code correctly, it does roughly this (a sketch
follows the list):

(1) update the repository
(2) run the tests
(3) check that the snapshot is not older than 24 hours (pgstatus.pl:188)
(4) fail if older
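
A minimal sketch of what I think step (3) amounts to - the variable names
are mine, not the actual pgstatus.pl code:

    use strict;
    use warnings;

    # Hypothetical inputs: the snapshot timestamp reported by the animal
    # and the reference timestamp it is compared against, both as Unix
    # epoch seconds.
    my ($snapshot_ts, $reference_ts) = @ARGV;

    my $max_age = 24 * 60 * 60;    # the 24-hour window from step (3)

    if ($reference_ts - $snapshot_ts > $max_age)
    {
        # Reject the submission, as in the "493 snapshot too old" response.
        die "snapshot too old: " . scalar(gmtime($snapshot_ts)) . " GMT\n";
    }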

Now, imagine that the test runs for days/weeks. This pretty much means
it's wasted, because the results will be thrown away anyway, no?


The 24 hours runs from the time of the latest commit on the branch in
question, not the current time, but basically yes.

We've never had machines with runs that long. The longest in recent
times has been friarbird, which runs CLOBBER_CACHE_ALWAYS and takes
around 4.5 hours. But we have had misconfigured machines reporting
unbelievable snapshot times.  I'll take a look and see if we can tighten
up the sanity check. It's worth noting that one thing friarbird does is
skip the install-check stage - it's almost certainly not going to tell us
much of interest there, given it has already run a plain "make check".
Well, I'm not sure about "misconfigured", but both my personal buildfarm
members and the pginfra-run ones (like gaibasaurus) have gotten "snapshot
too old" errors in the past for long-running tests, so I'm not sure it is
really a case of "we've never had machines with runs that long". So maybe
we should not reject those submissions at submission time, but rather mark
them clearly on the dashboard and leave the final interpretation to a
human...

That's a LOT harder and more work to arrange. Frankly, there are more important things to do.

I would like to know the circumstances of these very long runs. I drive some of my VMs pretty hard on pretty modest hardware, and they don't come close to running 24 hours.

The current behaviour goes back to this commit from December 2011:

   commit a8b5049e64f9cb08f8e165d0737139dab74e3bce
   Author: Andrew Dunstan <and...@dunslane.net>
   Date:   Wed Dec 14 14:38:44 2011 -0800

        Use git snapshot instead of fixed 10 day timeout.

        The sanity checks made sure that an animal wasn't submitting a
        snapshot that was too old. But sometimes an old branch doesn't
        get any changes for more than 10 days. So accept a snapshot that
        is not more than 1 day older than the last known snapshot. Per
        complaint from Stefan.
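
So the acceptance rule is roughly this - a sketch with made-up branch data
and names of my own, not the actual pgstatus.pl code:

    use strict;
    use warnings;

    # Made-up example state: the newest snapshot already known for each
    # branch, as Unix epoch seconds (not the real server data).
    my %latest_known_snapshot = (
        'HEAD'          => 1399955000,
        'REL9_3_STABLE' => 1398000000,
    );

    sub snapshot_acceptable
    {
        my ($branch, $snapshot_ts) = @_;
        my $latest = $latest_known_snapshot{$branch};
        return 1 unless defined $latest;
        # Accept anything no more than one day older than the newest
        # snapshot already seen for this branch.
        return ($latest - $snapshot_ts) <= 24 * 60 * 60;
    }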


I'm prepared to increase the sanity check time if there is a serious demand for it, but I'd like to know what to increase it to.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers