Thanks and Happy New Year, Matthys! We really appreciate having your on-site
support!!!
Unfortunately, we're still not able to access this machine ("r2d020671", aka
"pf4") via the network. Here is what we see in the log file for that system:
> Dec 26 21:13:58 r2d020671 -- MARK --
> Dec 28 02:26:52 syslogd 1.5.0#6: restart.
> [...]
> Dec 28 03:06:52 r2d020671 -- MARK --
> Dec 30 21:24:08 syslogd 1.5.0#6: restart.
> [...]
> Dec 30 21:44:08 r2d020671 -- MARK --
> Jan 2 03:01:47 syslogd 1.5.0#6: restart.
> [...]
> Jan 2 03:41:47 r2d020671 -- MARK --
> Jan 5 11:14:52 syslogd 1.5.0#6: restart.
> [...]
> Jan 5 11:14:53 r2d020671 sshd[543]: Server listening on 0.0.0.0 port 22.
The "MARK" messages get logged after 20 minutes of logging inactivity and the
"syslogd ... restart" lines get logged when the machine reboots. The final
"sshd" line is the last line in the log file. The timestamps are SAST (UTC+2).
Since we didn't get the expected "MARK" line at 11:34, I can only assume that
connectivity was lost sometime between 11:14:53 and 11:34:53.
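For anyone who wants to repeat that arithmetic on other log files, here is a
minimal sketch of the reasoning in Python. It assumes the 20-minute MARK
interval described above, and it assumes the year is 2015, since the syslog
timestamps do not carry one. The function names are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption from the message above: syslogd writes "-- MARK --" after
# 20 minutes with no other log activity.
MARK_INTERVAL = timedelta(minutes=20)


def parse_syslog_time(line, year):
    """Parse the leading 'Mon D HH:MM:SS' timestamp of a syslog line.

    Syslog lines omit the year, so the caller has to supply it.
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}",
                             "%Y %b %d %H:%M:%S")


def outage_window(last_line, year, mark_interval=MARK_INTERVAL):
    """Return (earliest, latest) time at which logging could have stopped.

    If the host had stayed up, a MARK line would have appeared one
    interval after the last logged line, so logging stopped before then.
    """
    last_seen = parse_syslog_time(last_line, year)
    return last_seen, last_seen + mark_interval


last_line = ("Jan 5 11:14:53 r2d020671 sshd[543]: "
             "Server listening on 0.0.0.0 port 22.")
start, end = outage_window(last_line, year=2015)
print(start.time(), "to", end.time())  # 11:14:53 to 11:34:53
```

This only bounds when logging stopped; it cannot tell a power loss from a
network loss, which is why details of the site visit would still help.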
It would really help our understanding of the problem if you could please
provide some more details of your visit to the PAPER container (e.g. time of
day, duration of visit, actions taken, etc). I suspect it's either a power
problem, a network problem, or a system problem (e.g. bad RAM). The problem is
isolated to "pf4" (or its associated cables); all the other ROACH2s seem fine.
Thanks again,
Dave
On Jan 5, 2015, at 2:36 AM, Matthys Maree wrote:
> Hi
>
> Roach#4 back ON.
>
> Probably the power cable.
>
> Cooling still fine inside container.
>
> Groete
>
> Matthys Maree
> SKA South Africa – Carnarvon
>
> Tel: 021 506 7300 ext.#1035 (Carnarvon, Klerefontein)
> Web : www.ska.ac.za
>
> From: danny jacobs [mailto:[email protected]]
> Sent: 31 December 2014 06:57 AM
> To: David DeBoer; PAPER List; Matthys Maree; Matt Dexter
> Subject: Fwd: recalcitrant roach
>
> Hi Matthys (cc PAPER),
>
> One of our ROACHs has stopped responding. A power issue seems most likely.
> What with the heat cycling, it's possible that its power cable has loosened
> (or maybe even the ethernet). A failing power supply is also possible. Could
> you, or someone like you, double check that ROACH #4 is getting power and
> shows an ethernet light?
>
> Thanks,
>
> ~Danny
>
>
>
>
> ---------- Forwarded message ----------
> From: David MacMahon <[email protected]>
> Date: Tue, Dec 30, 2014 at 12:21 PM
> Subject: Re: recalcitrant roach
> To: danny jacobs <[email protected]>
> Cc: Matt Dexter <[email protected]>
>
>
> Hi, Danny,
>
> pf4 seems to be having problems. These problems seem to have started on
> December 19. The roach2s log a "syslog restart" line when they boot. I've
> extracted the December restart messages from the log files:
>
> pf1:2014 Dec 19 08:46:09 syslogd 1.5.0#6: restart.
> pf2:2014 Dec 19 08:46:15 syslogd 1.5.0#6: restart.
> pf3:2014 Dec 19 08:46:17 syslogd 1.5.0#6: restart.
> pf5:2014 Dec 19 08:46:15 syslogd 1.5.0#6: restart.
> pf6:2014 Dec 19 08:46:16 syslogd 1.5.0#6: restart.
> pf7:2014 Dec 19 08:46:16 syslogd 1.5.0#6: restart.
> pf8:2014 Dec 19 08:46:15 syslogd 1.5.0#6: restart.
>
> pf4:2014 Dec 19 09:45:58 syslogd 1.5.0#6: restart.
> pf4:2014 Dec 19 10:28:00 syslogd 1.5.0#6: restart.
> pf4:2014 Dec 19 11:36:52 syslogd 1.5.0#6: restart.
> pf4:2014 Dec 19 15:10:14 syslogd 1.5.0#6: restart.
> pf4:2014 Dec 19 16:28:23 syslogd 1.5.0#6: restart.
>
> pf4:2014 Dec 20 23:17:49 syslogd 1.5.0#6: restart.
> pf4:2014 Dec 23 23:55:54 syslogd 1.5.0#6: restart.
> pf4:2014 Dec 26 20:33:59 syslogd 1.5.0#6: restart.
> pf4:2014 Dec 28 02:26:52 syslogd 1.5.0#6: restart.
>
> pf1:2014 Dec 30 18:26:41 syslogd 1.5.0#6: restart.
> pf2:2014 Dec 30 18:26:36 syslogd 1.5.0#6: restart.
> pf3:2014 Dec 30 18:26:37 syslogd 1.5.0#6: restart.
> pf5:2014 Dec 30 18:26:37 syslogd 1.5.0#6: restart.
> pf6:2014 Dec 30 18:26:36 syslogd 1.5.0#6: restart.
> pf7:2014 Dec 30 18:26:37 syslogd 1.5.0#6: restart.
> pf8:2014 Dec 30 18:26:36 syslogd 1.5.0#6: restart.
>
> As you can see, pf4 did not restart on Dec 19 with the rest of the roach2s at
> 08:46. It restarted almost an hour later at 09:45. It then restarted several
> times throughout the day on the 19th. It has also restarted sporadically on a
> few days since then, the most recent being Dec 28 at 02:26. The last log
> message for pf4 was Dec 28 03:06. It went down sometime in the 20 minutes
> after that.
>
> I'm guessing it's a flaky power issue. Hopefully it's just a power cord that
> got loose at one end or the other during the shutdown. If it's not that, then
> I'd guess it's something internal to the power supply.
>
> I've CC'd Matt in case he has any other ideas.
>
> It would probably be a good idea to have someone check on the power cables.
>
> Thanks,
> Dave
>
> On Dec 30, 2014, at 8:32 AM, danny jacobs wrote:
>
> > Hi Dave,
> >
> > I thought I'd give PAPER a boot up and see if we could break the A/C but it
> > looks like we may have a dead roach. #4 doesn't respond to pings even
> > after power cycling. Just in case there was some mislabeling on the
> > roachpdu apc page I even rebooted all of them. All go down, all come
> > back... except for #4.
> >
> > Could you maybe take a look and confirm?
> >
> > Thanks,
> > ~Danny
> >
> >
> > --
> >
> > National Science Foundation Fellow
> > Arizona State University
> > School of Earth and Space Exploration
> > Low Frequency Cosmology
> > Phone: (505) 500 4521
> > Homepage: http://loco.lab.asu.edu/danny_jacobs/