OK

9:43:
- Enter PAPER Container.
- All internal lights OFF.  Wall sockets OFF.  This was strange, but I do
not think it is linked to the Roach4 problem.  The earth leakage had
tripped inside the DB for the lights and wall sockets.  (We had a dip in
the electricity supply from ESKOM yesterday afternoon for some reason, which
might have something to do with this.)
- All racks ON except for PF#4 which was OFF again.

9:54:
- Pushed kettle plug hard at back of unit, and it turned ON automatically.

10:00:
- Swopped kettle plugs (only the ends that connect into the ROACHes),
between units #3 and #4.
- #3 turned ON without problems.
- #4 needed some hard pushing on the kettle plug before turning ON, and
died again after some seconds.

10:05:
- Pulled kettle plug out again from #4, while in the OFF state.  Pushed back
in lightly.  Did not come ON auto, but did respond to the PWR button on
front.  Strange YES?

10:10:
- Fiddled with the kettle plug on #4 (while ON), to see if it might turn OFF
due to the fiddling.  This did not affect the status at all, and it remained
ON.
- Pulled out kettle plug again.

10:12:
- Pushed the kettle plug back into #4 normally (without much force, to see
if normal operation works fine).  Unit came ON immediately.

10:20:
- #4 Still ON.  
- Leave container.

11:00:
- Re-enter container.
- #4 OFF.

11:05:
- First tried PWR button, with no response.
- Then RESET button, and unit came ON with the FAULT light ON (RED).
- RESET button now seems to turn this unit OFF when pressed (maybe part of
the RESET cycle?), and the PWR button seems to get it ON now.  Now this unit
confuses me.....

11:08:
- Remove kettle plug from unit in attempt to HARD RESET it.

11:10:
- Re-connect kettle plug.
- Unit turned ON (like it should, without failure/fault).

11:15:
- #4 still ON without fault.
- Leave container.


My conclusion is that the strange behaviour lies with Roach #4 itself, and
not with the power supplies/kettle plugs.
It looks like it turns OFF by itself after a while, maybe because of
something heating up?
I would suggest swopping this unit out with a spare one, if there is a spare.
If you have difficulty, maybe you can try a power cycle on the PDU for this
unit in an attempt to get it back ON again.
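
As a rough illustration of the "turns OFF by itself after a while" point, the times noted in the log above can be used to bound how long #4 stayed ON after the 10:12 power-up. This is only a sketch: the names `OBSERVATIONS` and `uptime_bounds` are made up for the example, and only the three sightings from this visit are used.

```python
from datetime import datetime

# Sightings of Roach #4 transcribed from the visit log above (7 Jan 2015, SAST).
OBSERVATIONS = [
    ("10:12", "ON"),   # came ON when the kettle plug was pushed back in
    ("10:20", "ON"),   # still ON when leaving the container
    ("11:00", "OFF"),  # found OFF on re-entering
]


def minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    t = datetime.strptime(hhmm, "%H:%M")
    return t.hour * 60 + t.minute


def uptime_bounds(observations):
    """Return (shortest, longest) possible uptime in minutes.

    Shortest: first ON sighting to the last ON sighting before the unit
    was found OFF.  Longest: first ON sighting to the first OFF sighting.
    """
    on_times = [minutes(t) for t, state in observations if state == "ON"]
    off_times = [minutes(t) for t, state in observations if state == "OFF"]
    start = on_times[0]
    first_off = min(t for t in off_times if t > start)
    last_on = max(t for t in on_times if t < first_off)
    return last_on - start, first_off - start


low, high = uptime_bounds(OBSERVATIONS)
print(f"Roach #4 stayed ON for between {low} and {high} minutes")
```

The exact moment of the shutdown was not observed, so this gives only a window; the log files on the correlator side should narrow it down.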


(Remember the kettle plugs are still swopped between #3 and #4, only on the
Roach side.)

Let me know if I can assist further, maybe with a swop or so.



Jasper and I plan to add more gas to the cooling unit on Friday 9 Jan, in
an attempt to keep the cooling unit running until the fault/leak (or
whatever it is) is fixed, hopefully later this month.


Regards

Matthys Maree
SKA South Africa – Carnarvon

Tel:       021 506 7300 ext.#1035 (Carnarvon, Klerefontein)
Web :  www.ska.ac.za


-----Original Message-----
From: David MacMahon [mailto:[email protected]] On Behalf Of David
MacMahon
Sent: 06 January 2015 07:13 PM
To: Matthys Maree
Cc: 'danny jacobs'; 'David DeBoer'; 'PAPER List'; 'Matt Dexter'
Subject: Re: recalcitrant roach

Thanks, Matthys, that's very helpful!  The ROACH2s are configured to power
on automatically when VAC power is applied.  Since you had to push the power
button to turn it on then I suspect something internal to that ROACH2 is
unwell.  It sounds like the power cables are (and were) securely connected.

It would be great if you could please check in on it again tomorrow (Jan 7).
The things that will be of most interest to us are:

1) Is pf4 currently powered on when you arrive at the container?  This will
tell us whether it is a power problem or a communication problem.  Depending
on the current "powered on" status, do either 2A or 2B...

2A) If pf4 is currently off, does pushing the power button turn it on?

2B) If pf4 is currently on, unplug its kettle plug, wait a few seconds,
reconnect the kettle plug.  Does it turn on automatically when the power
cable is reconnected?  If not, does pushing the power button turn it on?

Assuming that pf4 is powered up after doing 2A or 2B, please wait a few
minutes for it to boot.  I'm not sure how you can tell that the boot has
completed (maybe the network LEDs will stop their rapid blinking?), but I
think 3 minutes should be adequate.

3) If it is not too difficult to access, you could try swapping the kettle
plugs for pf4 and pf5.  That way if the symptom moves to pf5 we will know it
is a problem in the PDU (or power cable?).  If the symptom stays with pf4
then we'll know it's not the PDU.  If it's easier, you could instead swap
pf4's kettle plug with pf3's.  If you do this swap, please let us know which
two you swapped.  This is an optional step.

4) If you could check that the RJ-45 network cable is securely attached to
the back of pf4 that would be reassuring.  This is also an optional step.

5) So that we can correlate your actions with what we see in the log files,
it would be great if you could record the times when things power on and
when you leave the container.

6) Anything else you observe that might be relevant to why pf4 is behaving
differently from the other ROACH2s.

Thanks again for your assistance!!!

Cheers,
Dave

On Jan 6, 2015, at 4:57 AM, Matthys Maree wrote:

> Sorry, I only read this mail now that I am already back from site for
> today.
> 
> What I did yesterday 5 January, was around that time you mentioned.
> Unfortunately I did not check the exact time.
> I first tried the "kettle plug" directly on the ROACH#4 machine.  
> Tried to push it in properly (even if it was not out).  I did not succeed.
> I traced it down to where it gets its power from (for this I had 
> to bend over and under some cables!  Could easily have pulled a cable 
> slightly off something in the attempt).
> On the Power supply unit where all the kettle plugs get power from, I 
> did the same by ensuring proper connection.
> Still not successful.
> I went back to Roach #4 power inlet, pushed again, and tried Power 
> button on front of Roach.  Now it turned ON.
> So I assumed it was either on the bottom PDU unit or top connection.
> 
> I was probably in the container for +/- 20 minutes.
> 
> Please let me know if you need me to try something in there again.  I 
> can have a look tomorrow(7 Jan).
> 
> 
> Regards
> 
> Matthys Maree
> SKA South Africa – Carnarvon
> 
> Tel:       021 506 7300 ext.#1035 (Carnarvon, Klerefontein)
> Web :  www.ska.ac.za
> 
> -----Original Message-----
> From: David MacMahon [mailto:[email protected]] On Behalf Of David 
> MacMahon
> Sent: 06 January 2015 08:12 AM
> To: Matthys Maree
> Cc: 'danny jacobs'; 'David DeBoer'; 'PAPER List'; 'Matt Dexter'
> Subject: Re: recalcitrant roach
> 
> Thanks and Happy New Year, Matthys!  We really appreciate having your 
> on-site support!!!
> 
> Unfortunately, we're still not able to access this machine 
> ("r2d020671", aka
> "pf4") via the network.  Here is what we see in the log file for that
> system:
> 
>> Dec 26 21:13:58 r2d020671 -- MARK --
>> Dec 28 02:26:52 syslogd 1.5.0#6: restart.
>> [...]
>> Dec 28 03:06:52 r2d020671 -- MARK --
>> Dec 30 21:24:08 syslogd 1.5.0#6: restart.
>> [...]
>> Dec 30 21:44:08 r2d020671 -- MARK --
>> Jan  2 03:01:47 syslogd 1.5.0#6: restart.
>> [...]
>> Jan  2 03:41:47 r2d020671 -- MARK --
>> Jan  5 11:14:52 syslogd 1.5.0#6: restart.
>> [...]
>> Jan  5 11:14:53 r2d020671 sshd[543]: Server listening on 0.0.0.0 port 22.
> 
> The "MARK" messages get logged after 20 minutes of logging inactivity 
> and the "syslogd ... restart" lines get logged when the machine 
> reboots.  The final "sshd" line is the last line in the log file.  The 
> timestamps are SAST (UTC+2).  Since we didn't get the expected "MARK" 
> line at 11:34 I can only assume that connectivity was lost sometime
> between 11:14:53 and 11:34:53.
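
The window quoted above follows directly from the 20-minute "MARK" interval. A minimal sketch of that arithmetic, assuming the year is 2015 (syslog timestamps do not carry it); `outage_window` and `MARK_INTERVAL` are illustrative names, not part of any logging tool:

```python
from datetime import datetime, timedelta

# "MARK" lines are logged after 20 minutes of logging inactivity (see above).
MARK_INTERVAL = timedelta(minutes=20)


def outage_window(last_line_time, mark_interval=MARK_INTERVAL):
    """Return (earliest, latest) time at which logging could have stopped.

    If the machine had stayed up and connected, a MARK line would have
    appeared one interval after the last logged line.
    """
    return last_line_time, last_line_time + mark_interval


# Last line in the pf4 log: the sshd message at Jan 5 11:14:53 SAST.
last_line = datetime(2015, 1, 5, 11, 14, 53)
start, end = outage_window(last_line)
print(start.strftime("%H:%M:%S"), "to", end.strftime("%H:%M:%S"))
```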
> 
> It would really help our understanding of the problem if you could 
> please provide some more details of your visit to the PAPER container 
> (e.g. time of day, duration of visit, actions taken, etc).  I suspect 
> it's either a power problem, a network problem, or a system problem 
> (e.g. bad RAM).  The problem is isolated to "pf4" (or its associated 
> cables); all the other ROACH2s seem fine.
> 
> Thanks again,
> Dave
> 
> On Jan 5, 2015, at 2:36 AM, Matthys Maree wrote:
> 
>> Hi
>> 
>> Roach#4 back ON.
>> 
>> Probably the power cable.
>> 
>> Cooling still fine inside container.
>> 
>> Regards
>> 
>> Matthys Maree
>> SKA South Africa – Carnarvon
>> 
>> Tel:       021 506 7300 ext.#1035 (Carnarvon, Klerefontein)
>> Web :  www.ska.ac.za
>> 
>> From: danny jacobs [mailto:[email protected]]
>> Sent: 31 December 2014 06:57 AM
>> To: David DeBoer; PAPER List; Matthys Maree; Matt Dexter
>> Subject: Fwd: recalcitrant roach
>> 
>> Hi Matthys (cc PAPER),
>> 
>> One of our ROACHs has stopped responding.  A power issue seems most
>> likely. What with the heat cycling, it's possible that its power cable 
>> has loosened (or maybe even the ethernet). A failing power supply is 
>> also possible.  Could you, or someone like you, double check that 
>> ROACH #4 is getting power and shows an ethernet light?
>> 
>> Thanks,
>> 
>> ~Danny
>> 
>> 
>> 
>> 
>> ---------- Forwarded message ----------
>> From: David MacMahon <[email protected]>
>> Date: Tue, Dec 30, 2014 at 12:21 PM
>> Subject: Re: recalcitrant roach
>> To: danny jacobs <[email protected]>
>> Cc: Matt Dexter <[email protected]>
>> 
>> 
>> Hi, Danny,
>> 
>> pf4 seems to be having problems.  These problems seem to have started 
>> on December 19.  The roach2s log a "syslog restart" line when they boot.  
>> I've extracted the December restart messages from the log files:
>> 
>> pf1:2014 Dec 19 08:46:09 syslogd 1.5.0#6: restart.
>> pf2:2014 Dec 19 08:46:15 syslogd 1.5.0#6: restart.
>> pf3:2014 Dec 19 08:46:17 syslogd 1.5.0#6: restart.
>> pf5:2014 Dec 19 08:46:15 syslogd 1.5.0#6: restart.
>> pf6:2014 Dec 19 08:46:16 syslogd 1.5.0#6: restart.
>> pf7:2014 Dec 19 08:46:16 syslogd 1.5.0#6: restart.
>> pf8:2014 Dec 19 08:46:15 syslogd 1.5.0#6: restart.
>> 
>> pf4:2014 Dec 19 09:45:58 syslogd 1.5.0#6: restart.
>> pf4:2014 Dec 19 10:28:00 syslogd 1.5.0#6: restart.
>> pf4:2014 Dec 19 11:36:52 syslogd 1.5.0#6: restart.
>> pf4:2014 Dec 19 15:10:14 syslogd 1.5.0#6: restart.
>> pf4:2014 Dec 19 16:28:23 syslogd 1.5.0#6: restart.
>> 
>> pf4:2014 Dec 20 23:17:49 syslogd 1.5.0#6: restart.
>> pf4:2014 Dec 23 23:55:54 syslogd 1.5.0#6: restart.
>> pf4:2014 Dec 26 20:33:59 syslogd 1.5.0#6: restart.
>> pf4:2014 Dec 28 02:26:52 syslogd 1.5.0#6: restart.
>> 
>> pf1:2014 Dec 30 18:26:41 syslogd 1.5.0#6: restart.
>> pf2:2014 Dec 30 18:26:36 syslogd 1.5.0#6: restart.
>> pf3:2014 Dec 30 18:26:37 syslogd 1.5.0#6: restart.
>> pf5:2014 Dec 30 18:26:37 syslogd 1.5.0#6: restart.
>> pf6:2014 Dec 30 18:26:36 syslogd 1.5.0#6: restart.
>> pf7:2014 Dec 30 18:26:37 syslogd 1.5.0#6: restart.
>> pf8:2014 Dec 30 18:26:36 syslogd 1.5.0#6: restart.
>> 
>> As you can see, pf4 did not restart on Dec 19 with the rest of the 
>> roach2s at 08:46.  It restarted almost an hour later at 9:45.  It then 
>> restarted several times throughout the day on the 19th.  It also 
>> restarted sporadically a few days since then with the most recent 
>> being on Dec 28 at 02:26.  The last log message for pf4 was Dec 28 
>> 03:06.  It went down sometime in the next 20 minutes after that.
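
The comparison described above (which restarts were shared by all the roach2s and which were pf4 alone) could be automated roughly as follows. This is a sketch under assumptions: the line format is taken from the extract above, only a few of those lines are reproduced as sample data, and the function names and the one-minute tolerance are illustrative choices.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Matches lines such as "pf4:2014 Dec 19 09:45:58 syslogd 1.5.0#6: restart."
LINE_RE = re.compile(
    r"^(pf\d+):(\d{4} \w{3} +\d+ \d\d:\d\d:\d\d) syslogd .*restart\.$"
)

# A few lines copied from the extract above, used as sample input.
SAMPLE = """\
pf1:2014 Dec 19 08:46:09 syslogd 1.5.0#6: restart.
pf3:2014 Dec 19 08:46:17 syslogd 1.5.0#6: restart.
pf5:2014 Dec 19 08:46:15 syslogd 1.5.0#6: restart.
pf4:2014 Dec 19 09:45:58 syslogd 1.5.0#6: restart.
pf4:2014 Dec 19 10:28:00 syslogd 1.5.0#6: restart.
"""


def parse_restarts(text):
    """Map each host name to the list of its restart times."""
    restarts = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            host, stamp = match.groups()
            restarts[host].append(
                datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
            )
    return restarts


def solo_restarts(restarts, host, tolerance=timedelta(minutes=1)):
    """Restarts of `host` that no other host shared within `tolerance`."""
    others = [t for h, times in restarts.items() if h != host for t in times]
    return [
        t for t in restarts[host]
        if not any(abs(t - other) <= tolerance for other in others)
    ]


restarts = parse_restarts(SAMPLE)
for t in solo_restarts(restarts, "pf4"):
    print("pf4 restarted alone at", t)
```

A restart that every unit shares points at the PDU or a site-wide power event; restarts that only one unit shows point at that unit or its cabling.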
>> 
>> I'm guessing it's a flaky power issue.  Hopefully just a power cord 
>> that got loose at one end or the other during the shutdown.  If it's 
>> not that then I'd guess it's something internal to the power supply?
>> 
>> I've CC'd Matt in case he has any other ideas.
>> 
>> It would probably be a good idea to have someone check on the power
>> cables.
>> 
>> Thanks,
>> Dave
>> 
>> On Dec 30, 2014, at 8:32 AM, danny jacobs wrote:
>> 
>>> Hi Dave,
>>> 
>>> I thought I'd give PAPER a boot up and see if we could break the A/C 
>>> but it looks like we may have a dead roach.  #4 doesn't respond to pings 
>>> even after power cycling. Just in case there was some mislabeling on 
>>> the roachpdu apc page I even rebooted all of them.  All go down, all 
>>> come back... except for #4.
>>> 
>>> Could you maybe take a look and confirm?
>>> 
>>> Thanks,
>>> ~Danny
>>> 
>>> 
>>> --
>>> 
>>> National Science Foundation Fellow
>>> Arizona State University
>>> School of Earth and Space Exploration Low Frequency Cosmology
>>> Phone:           (505) 500 4521
>>> Homepage:     http://loco.lab.asu.edu/danny_jacobs/
>> 
>> 
>> 
>> 
>> --
>> 
>> National Science Foundation Fellow
>> Arizona State University
>> School of Earth and Space Exploration Low Frequency Cosmology
>> Phone:           (505) 500 4521
>> Homepage:     http://loco.lab.asu.edu/danny_jacobs/
> 

