On 12:07 am, jans...@parc.com wrote:
exar...@twistedmatrix.com wrote:
On 08:31 pm, jans...@parc.com wrote:
>My Intel Snow Leopard 2 build slave has gone into outer-space again.
>
>When I look at it, I see buildslave taking up most of a CPU (80%), and
>nothing much else going on.  The twistd log says:
>
>[... much omitted ...]
>2011-04-04 08:35:47-0700 [-] sending app-level keepalive
>2011-04-04 08:45:47-0700 [-] sending app-level keepalive
>2011-04-04 08:55:47-0700 [-] sending app-level keepalive
>2011-04-04 09:03:15-0700 [Broker,client] lost remote
>2011-04-04 09:03:15-0700 [Broker,client] lost remote
>2011-04-04 09:03:15-0700 [Broker,client] lost remote
>2011-04-04 09:03:15-0700 [Broker,client] lost remote
>2011-04-04 09:03:15-0700 [Broker,client] lost remote
>2011-04-04 09:03:15-0700 [Broker,client] Lost connection to dinsdale.python.org:9020
>2011-04-04 09:03:15-0700 [Broker,client] <twisted.internet.tcp.Connector instance at 0x101629ab8> will retry in 3 seconds
>2011-04-04 09:03:15-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x1016299e0>
>2011-04-04 09:03:18-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x1016299e0>
>2011-04-04 09:03:18-0700 [-] Connecting to dinsdale.python.org:9020
>2011-04-04 09:03:18-0700 [Uninitialized] Connection to dinsdale.python.org:9020 failed: Connection Refused
>2011-04-04 09:03:18-0700 [Uninitialized] <twisted.internet.tcp.Connector instance at 0x101629ab8> will retry in 8 seconds
>2011-04-04 09:03:18-0700 [Uninitialized] Stopping factory <buildslave.bot.BotFactory instance at 0x1016299e0>
>2011-04-04 09:03:27-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x1016299e0>
>2011-04-04 09:03:27-0700 [-] Connecting to dinsdale.python.org:9020
>
>So it's been spinning its wheels for 3 days.

Does this mean that the "2011-04-04 09:03:27-0700 [-] Connecting to
dinsdale.python.org:9020" message in the logs is the last one you see
until you restart the slave?

Yes, that's the last line in the file.

Or does it mean that the logs go on and on for three days with these
"Connecting to dinsdale...." / "Connection Refused" / "... will retry
in N seconds" cycles, thousands and thousands of times?

Well, it's doing something, chewing up cycles, but there's only one
"Connecting" line at the end of the log file.
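
For anyone wanting to check the same thing on their own slave, here is a rough sketch that tells the two cases apart: a log that simply stops after one "Connecting to" line, versus one with thousands of connect/refused/retry cycles. The patterns are copied from the excerpt quoted above; the function name summarize and the event labels are made up for illustration, and it assumes the real twistd.log keeps each message on one line (the wrapping in the quoted excerpt comes from the mail client).

```python
import re
from collections import Counter

# Patterns taken from the log excerpt quoted in this thread.
# The dictionary keys are illustrative labels, not Twisted names.
EVENTS = {
    "connecting": re.compile(r"\] Connecting to "),
    "refused": re.compile(r"failed: Connection Refused"),
    "retry": re.compile(r"will retry in \d+ seconds"),
    "lost": re.compile(r"\] Lost connection to "),
}


def summarize(lines):
    """Count reconnect-related events and return them with the last non-blank line."""
    counts = Counter()
    last = None
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        last = line
        for name, pattern in EVENTS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts, last
```

Calling summarize(open("twistd.log")) on the slave's log should show a small "connecting" count and a last line starting with the 09:03:27 timestamp if the process really did stop logging, as described above.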

That's very interesting. It may be worth doing some gdb or dtrace
investigation next time it gets into this state.

What does the buildmaster's info page for this slave say when the
slave is in this state?  In particular, what does it say about
"connects/hour"?

Ah, good question.  Too bad I restarted the slave after I sent out my
info.  Is there some way to recover that from earlier?  If not, it will
undoubtedly fail again in a few days.

If the master logs are available, that would provide some information.
Otherwise, I think waiting for it to happen again is the thing to do.

Since there were no other messages in the log file, I expect the
connects/hour value will be low, perhaps 0.
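
If the master's page is not available, a slave-side approximation of connects/hour can be pulled from the slave's own log. This is only a sketch: it counts "Connecting to" lines per clock hour using the timestamp format seen in the excerpt above, and it is not how the buildmaster computes its figure. The function name connects_per_hour is made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Length of a timestamp such as "2011-04-04 09:03:18-0700".
STAMP_LEN = len("2011-04-04 09:03:18-0700")


def connects_per_hour(lines):
    """Map "YYYY-MM-DD HH:00" to the number of connection attempts logged in that hour."""
    per_hour = Counter()
    for line in lines:
        if "] Connecting to " not in line:
            continue
        try:
            stamp = datetime.strptime(line[:STAMP_LEN], "%Y-%m-%d %H:%M:%S%z")
        except ValueError:
            # Skip lines that do not start with a timestamp in this format.
            continue
        per_hour[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return per_hour
```

For the excerpt quoted in this thread, this gives two attempts in the 09:00 hour on 2011-04-04 and nothing afterwards, which fits the expectation of a value near 0.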

Jean-Paul
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev