Thanks, Matthys, that's very helpful!  The ROACH2s are configured to power on 
automatically when AC power is applied.  Since you had to push the power 
button to turn it on, I suspect something internal to that ROACH2 is 
unwell.  It sounds like the power cables are (and were) securely connected.

It would be great if you could please check in on it again tomorrow (Jan 7).  
The things that will be of most interest to us are:

1) Is pf4 currently powered on when you arrive at the container?  This will 
tell us whether it is a power problem or a communication problem.  Depending on 
the current "powered on" status, do either 2A or 2B...

2A) If pf4 is currently off, does pushing the power button turn it on?

2B) If pf4 is currently on, unplug its kettle plug, wait a few seconds, 
reconnect the kettle plug.  Does it turn on automatically when the power cable 
is reconnected?  If not, does pushing the power button turn it on?

Assuming that pf4 is powered up after doing 2A or 2B, please wait a few minutes 
for it to boot.  I'm not sure how you can tell that the boot has completed 
(maybe the network LEDs will stop their rapid blinking?), but I think 3 minutes 
should be adequate.

3) If it is not too difficult to access, you could try swapping the kettle 
plugs for pf4 and pf5.  That way if the symptom moves to pf5 we will know it is 
a problem in the PDU (or power cable?).  If the symptom stays with pf4 then 
we'll know it's not the PDU.  If it's easier, you could instead swap pf4's 
kettle plug with pf3's.  If you do this swap, please let us know which two you 
swapped.  This is an optional step.

4) If you could check that the RJ-45 network cable is securely attached to the 
back of pf4 that would be reassuring.  This is also an optional step.

5) So that we can correlate your actions with what we see in the log files, it 
would be great if you could record the times when things power on and when you 
leave the container.

6) Please also note anything else you observe that might be relevant to why 
pf4 is behaving differently from the other ROACH2s.

Thanks again for your assistance!!!

Cheers,
Dave

On Jan 6, 2015, at 4:57 AM, Matthys Maree wrote:

> Sorry, I only read this mail now that I am already back from site for today.
> 
> What I did yesterday 5 January, was around that time you mentioned.
> Unfortunately I did not check the exact time.
> I first tried the "kettle plug" directly on the ROACH#4 machine.  I tried to
> push it in properly (even though it was not out).  That did not succeed.
> I traced the cable down to where it gets its power from (for this I had to
> bend over and under some cables!  I could easily have pulled a cable slightly
> off something in the attempt).
> On the Power supply unit where all the kettle plugs get power from, I did
> the same by ensuring proper connection.
> Still not successful.
> I went back to Roach #4 power inlet, pushed again, and tried Power button on
> front of Roach.  Now it turned ON.
> So I assumed it was either on the bottom PDU unit or top connection.
> 
> I was probably in the container for about 20 minutes.
> 
> Please let me know if you need me to try something in there again.  I can
> have a look tomorrow (7 Jan).
> 
> 
> Groete
> 
> Matthys Maree
> SKA South Africa – Carnarvon
> 
> Tel:       021 506 7300 ext.#1035 (Carnarvon, Klerefontein)
> Web :  www.ska.ac.za
> 
> -----Original Message-----
> From: David MacMahon [mailto:[email protected]] On Behalf Of David
> MacMahon
> Sent: 06 January 2015 08:12 AM
> To: Matthys Maree
> Cc: 'danny jacobs'; 'David DeBoer'; 'PAPER List'; 'Matt Dexter'
> Subject: Re: recalcitrant roach
> 
> Thanks and Happy New Year, Matthys!  We really appreciate having your
> on-site support!!!
> 
> Unfortunately, we're still not able to access this machine ("r2d020671", aka
> "pf4") via the network.  Here is what we see in the log file for that
> system:
> 
>> Dec 26 21:13:58 r2d020671 -- MARK --
>> Dec 28 02:26:52 syslogd 1.5.0#6: restart.
>> [...]
>> Dec 28 03:06:52 r2d020671 -- MARK --
>> Dec 30 21:24:08 syslogd 1.5.0#6: restart.
>> [...]
>> Dec 30 21:44:08 r2d020671 -- MARK --
>> Jan  2 03:01:47 syslogd 1.5.0#6: restart.
>> [...]
>> Jan  2 03:41:47 r2d020671 -- MARK --
>> Jan  5 11:14:52 syslogd 1.5.0#6: restart.
>> [...]
>> Jan  5 11:14:53 r2d020671 sshd[543]: Server listening on 0.0.0.0 port 22.
> 
> The "MARK" messages get logged after 20 minutes of logging inactivity and
> the "syslogd ... restart" lines get logged when the machine reboots.  The
> final "sshd" line is the last line in the log file.  The timestamps are SAST
> (UTC+2).  Since we didn't get the expected "MARK" line at 11:34 I can only
> assume that connectivity was lost sometime between 11:14:53 and 11:34:53.
> 
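
The window quoted above follows directly from the last logged timestamp plus the 20-minute MARK interval.  A minimal sketch of that arithmetic, assuming classic syslog timestamps (which carry no year, so the year is passed in) and using a hypothetical helper name, outage_window:

```python
from datetime import datetime, timedelta

# From the thread: a "-- MARK --" line is logged after 20 minutes of inactivity.
MARK_INTERVAL = timedelta(minutes=20)

def outage_window(last_line, year=2015):
    """Return (earliest, latest) time at which logging could have stopped.

    The year is an assumption supplied by the caller, because the
    "Mon DD HH:MM:SS" syslog timestamp does not include one.
    """
    stamp = " ".join(last_line.split()[:3])  # e.g. "Jan 5 11:14:53"
    last_seen = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    # If the machine had stayed up and logging, a MARK line would have
    # appeared no later than one interval after the last message.
    return last_seen, last_seen + MARK_INTERVAL

last = "Jan  5 11:14:53 r2d020671 sshd[543]: Server listening on 0.0.0.0 port 22."
start, end = outage_window(last)
print(start.time(), "to", end.time())
```

This only bounds when logging stopped; it cannot by itself distinguish a power loss from a network or system fault.
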
> It would really help our understanding of the problem if you could please
> provide some more details of your visit to the PAPER container (e.g. time of
> day, duration of visit, actions taken, etc).  I suspect it's either a power
> problem, a network problem, or a system problem (e.g. bad RAM).  The problem
> is isolated to "pf4" (or its associated cables); all the other ROACH2s seem
> fine.
> 
> Thanks again,
> Dave
> 
> On Jan 5, 2015, at 2:36 AM, Matthys Maree wrote:
> 
>> Hi
>> 
>> Roach#4 back ON.
>> 
>> Probably the power cable.
>> 
>> Cooling still fine inside container.
>> 
>> Groete
>> 
>> Matthys Maree
>> SKA South Africa – Carnarvon
>> 
>> Tel:       021 506 7300 ext.#1035 (Carnarvon, Klerefontein)
>> Web :  www.ska.ac.za
>> 
>> From: danny jacobs [mailto:[email protected]]
>> Sent: 31 December 2014 06:57 AM
>> To: David DeBoer; PAPER List; Matthys Maree; Matt Dexter
>> Subject: Fwd: recalcitrant roach
>> 
>> Hi Matthys (cc PAPER),
>> 
>> One of our ROACHs has stopped responding.  A power issue seems most
>> likely.  What with the heat cycling, it's possible that its power cable has
>> loosened (or maybe even the ethernet).  A failing power supply is also
>> possible.  Could you, or someone like you, double check that ROACH #4 is
>> getting power and shows an ethernet light?
>> 
>> Thanks,
>> 
>> ~Danny
>> 
>> 
>> 
>> 
>> ---------- Forwarded message ----------
>> From: David MacMahon <[email protected]>
>> Date: Tue, Dec 30, 2014 at 12:21 PM
>> Subject: Re: recalcitrant roach
>> To: danny jacobs <[email protected]>
>> Cc: Matt Dexter <[email protected]>
>> 
>> 
>> Hi, Danny,
>> 
>> pf4 seems to be having problems.  These problems seem to have started on
>> December 19.  The roach2s log a "syslog restart" line when they boot.  I've
>> extracted the December restart messages from the log files:
>> 
>> pf1:2014 Dec 19 08:46:09 syslogd 1.5.0#6: restart.
>> pf2:2014 Dec 19 08:46:15 syslogd 1.5.0#6: restart.
>> pf3:2014 Dec 19 08:46:17 syslogd 1.5.0#6: restart.
>> pf5:2014 Dec 19 08:46:15 syslogd 1.5.0#6: restart.
>> pf6:2014 Dec 19 08:46:16 syslogd 1.5.0#6: restart.
>> pf7:2014 Dec 19 08:46:16 syslogd 1.5.0#6: restart.
>> pf8:2014 Dec 19 08:46:15 syslogd 1.5.0#6: restart.
>> 
>> pf4:2014 Dec 19 09:45:58 syslogd 1.5.0#6: restart.
>> pf4:2014 Dec 19 10:28:00 syslogd 1.5.0#6: restart.
>> pf4:2014 Dec 19 11:36:52 syslogd 1.5.0#6: restart.
>> pf4:2014 Dec 19 15:10:14 syslogd 1.5.0#6: restart.
>> pf4:2014 Dec 19 16:28:23 syslogd 1.5.0#6: restart.
>> 
>> pf4:2014 Dec 20 23:17:49 syslogd 1.5.0#6: restart.
>> pf4:2014 Dec 23 23:55:54 syslogd 1.5.0#6: restart.
>> pf4:2014 Dec 26 20:33:59 syslogd 1.5.0#6: restart.
>> pf4:2014 Dec 28 02:26:52 syslogd 1.5.0#6: restart.
>> 
>> pf1:2014 Dec 30 18:26:41 syslogd 1.5.0#6: restart.
>> pf2:2014 Dec 30 18:26:36 syslogd 1.5.0#6: restart.
>> pf3:2014 Dec 30 18:26:37 syslogd 1.5.0#6: restart.
>> pf5:2014 Dec 30 18:26:37 syslogd 1.5.0#6: restart.
>> pf6:2014 Dec 30 18:26:36 syslogd 1.5.0#6: restart.
>> pf7:2014 Dec 30 18:26:37 syslogd 1.5.0#6: restart.
>> pf8:2014 Dec 30 18:26:36 syslogd 1.5.0#6: restart.
>> 
>> As you can see, pf4 did not restart on Dec 19 with the rest of the roach2s
>> at 08:46.  It restarted almost an hour later at 09:45.  It then restarted
>> several times throughout the day on the 19th.  It has also restarted
>> sporadically on a few days since then, with the most recent being on Dec 28
>> at 02:26.  The last log message for pf4 was Dec 28 03:06.  It went down
>> sometime in the next 20 minutes after that.
>> 
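
The comparison above (restarts shared by all the roach2s versus restarts only pf4 shows) can be automated.  A rough sketch, assuming the "host:YYYY Mon DD HH:MM:SS ..." line layout shown in the list; the function names, the 60-second window and the 3-host threshold are illustrative assumptions, not anything the PAPER tools provide:

```python
from datetime import datetime, timedelta

def parse_restart(line):
    """Split 'pf4:2014 Dec 19 09:45:58 syslogd ...' into (host, datetime)."""
    host, rest = line.split(":", 1)
    stamp = " ".join(rest.split()[:4])  # "2014 Dec 19 09:45:58"
    return host, datetime.strptime(stamp, "%Y %b %d %H:%M:%S")

def isolated_restarts(lines, window=timedelta(seconds=60), min_hosts=3):
    """Return restarts that fewer than min_hosts distinct hosts share within window."""
    events = [parse_restart(l) for l in lines if l.strip().endswith("restart.")]
    isolated = []
    for host, when in events:
        # Distinct hosts (including this one) that restarted at about the same time.
        nearby = {h for h, t in events if abs(t - when) <= window}
        if len(nearby) < min_hosts:
            isolated.append((host, when))
    return isolated

sample = [
    "pf1:2014 Dec 19 08:46:09 syslogd 1.5.0#6: restart.",
    "pf2:2014 Dec 19 08:46:15 syslogd 1.5.0#6: restart.",
    "pf3:2014 Dec 19 08:46:17 syslogd 1.5.0#6: restart.",
    "pf4:2014 Dec 19 09:45:58 syslogd 1.5.0#6: restart.",
    "pf4:2014 Dec 19 10:28:00 syslogd 1.5.0#6: restart.",
]
result = isolated_restarts(sample)
for host, when in result:
    print(host, when)
```

On the sample lines, taken from the list above, only the two pf4 restarts are reported; the 08:46 restarts are treated as a common power cycle because three hosts share them.
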
>> I'm guessing it's a flaky power issue.  Hopefully just a power cord that got
>> loose at one end or the other during the shutdown.  If it's not that then
>> I'd guess it's something internal to the power supply?
>> 
>> I've CC'd Matt in case he has any other ideas.
>> 
>> It would probably be a good idea to have someone check on the power
>> cables.
>> 
>> Thanks,
>> Dave
>> 
>> On Dec 30, 2014, at 8:32 AM, danny jacobs wrote:
>> 
>>> Hi Dave,
>>> 
>>> I thought I'd give PAPER a boot up and see if we could break the A/C but
>>> it looks like we may have a dead roach.  #4 doesn't respond to pings even
>>> after power cycling.  Just in case there was some mislabeling on the roachpdu
>>> apc page I even rebooted all of them.  All go down, all come back... except
>>> for #4.
>>> 
>>> Could you maybe take a look and confirm?
>>> 
>>> Thanks,
>>> ~Danny
>>> 
>>> 
>>> --
>>> 
>>> National Science Foundation Fellow
>>> Arizona State University
>>> School of Earth and Space Exploration
>>> Low Frequency Cosmology
>>> Phone:           (505) 500 4521
>>> Homepage:     http://loco.lab.asu.edu/danny_jacobs/
>> 
>> 
>> 
>> 
>> --
>> 
>> National Science Foundation Fellow
>> Arizona State University
>> School of Earth and Space Exploration
>> Low Frequency Cosmology
>> Phone:           (505) 500 4521
>> Homepage:     http://loco.lab.asu.edu/danny_jacobs/
> 