[j-nsp] EX4200 Notprsnt VC members after reboot (filesystem corruption)

2010-06-20 Thread Dale Shaw
Hi all,

Not much technical detail to provide at this stage, but I wanted to
know if anyone else has experienced the same or a similar problem:

We've just been through a programme of upgrading ~400 EX4200s (some
single switches, most multi-member VCs) from JUNOS 10.0R2 to 10.0S1.
This process, of course, involved lots of reboots.

In a small number of cases, after rebooting the VC, we have seen member
switches 'disappear': on the VC master they show up as 'NotPrsnt'
(not present). I haven't been involved first-hand with the recovery
process, but from what I understand, the console log entries indicate
that the member switch doesn't appear to have unmounted its
filesystems cleanly; on the next boot, filesystem corruption is
detected on da0s1a and badness ensues. The end result is a switch that
doesn't boot and doesn't join the VC. It's effectively orphaned and
unmanageable, and an on-site visit is required to recover it.
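Across ~400 switches, spotting NotPrsnt members on the master after each reboot is easy to script. Below is a minimal Python sketch, assuming you have already captured the text of `show virtual-chassis` from the VC master (how you collect it is left open). The sample output and the row regex only approximate the EX4200 layout. They are not taken from this thread, the serial numbers are made up, and the column layout may differ between JUNOS releases.

```python
import re
from typing import List, Tuple

# Illustrative only: an approximation of "show virtual-chassis" output on an
# EX4200 VC master. Serial numbers are fake; real spacing/fields may differ.
SAMPLE_OUTPUT = """\
Virtual Chassis ID: 0019.e257.3c80
                                           Mastership           Neighbor List
Member ID  Status   Serial No    Model     priority    Role     ID  Interface
0 (FPC 0)  Prsnt    AA0000000001 ex4200-48t      255  Master*   1  vcp-0
1 (FPC 1)  Prsnt    AA0000000002 ex4200-48t      255  Backup    0  vcp-0
2 (FPC 2)  NotPrsnt AA0000000003 ex4200-48t
"""

# Assumed row shape: "<member id> (FPC <n>)  <Status>  <Serial No> ..."
MEMBER_ROW = re.compile(r"^\s*(\d+)\s+\(FPC\s+\d+\)\s+(\S+)\s+(\S+)")


def not_present_members(output: str) -> List[Tuple[int, str]]:
    """Return (member_id, serial) for every member whose status is NotPrsnt."""
    missing = []
    for line in output.splitlines():
        match = MEMBER_ROW.match(line)
        # Header lines don't match the row pattern and are skipped.
        if match and match.group(2) == "NotPrsnt":
            missing.append((int(match.group(1)), match.group(3)))
    return missing


if __name__ == "__main__":
    for member_id, serial in not_present_members(SAMPLE_OUTPUT):
        print(f"member {member_id} ({serial}) is NotPrsnt")
```

Keeping the serial number alongside the member ID helps when the on-site person has to find the right physical box in a stack.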

Has anyone had problems with EX4200s and filesystem corruption
relating to ungraceful power-downs, routine reboots (e.g. for JUNOS
upgrades or whatever), or anything else? Does anyone know of any
tricks to access a switch in this state remotely? (Unfortunately, there
is no out-of-band management path available.)

A re-install of JUNOS (from a JUNOS tgz on a USB flash disk)
typically 'fixes' the problem. A power cycle is not enough.

Cheers,
Dale
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] EX4200 Notprsnt VC members after reboot (filesystem corruption)

2010-06-20 Thread Chuck Anderson
On Sun, Jun 20, 2010 at 04:35:40PM +1000, Dale Shaw wrote:
> Has anyone had problems with EX4200s and filesystem corruption
> relating to ungraceful power-downs, routine reboots (e.g. for JUNOS
> upgrades or whatever), or anything else? Does anyone know of any
> tricks to access a switch in this state remotely? (Unfortunately, there
> is no out-of-band management path available.)
>
> A re-install of JUNOS (from a JUNOS tgz on a USB flash disk)
> typically 'fixes' the problem. A power cycle is not enough.

Remote console access is needed to fix this. If you can at least
connect all the VC member management Ethernet ports to a separate OOB
switch, and connect a console server to all the VC member console
ports, you might be able to recover from this remotely, even if the
OOB network is trunked back to your location through an in-band VLAN.
When I was in this situation recently, the corrupted member switch was
able to boot far enough to FTP JUNOS over the OOB network from another
VC member switch.
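With a console server in place, a quick first check after each reboot is whether every member's console line is still reachable over the OOB path. Below is a minimal Python sketch. It assumes the console server exposes each serial line as its own TCP port. The port numbers (2001 to 2003) and the address 192.0.2.10 are hypothetical placeholders, not anything from this setup. A successful connect only proves the path to the console server is up, not that the switch behind it has booted.

```python
import socket
from typing import Dict


def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_console_ports(host: str, ports_by_member: Dict[int, int],
                        timeout: float = 3.0) -> Dict[int, bool]:
    """Map each VC member ID to whether its console-server port answers."""
    return {member: reachable(host, port, timeout)
            for member, port in sorted(ports_by_member.items())}


if __name__ == "__main__":
    # Hypothetical: console server address and one TCP port per VC member
    # console; substitute whatever your console server actually uses.
    CONSOLE_SERVER = "192.0.2.10"
    PORTS = {0: 2001, 1: 2002, 2: 2003}
    results = check_console_ports(CONSOLE_SERVER, PORTS, timeout=2.0)
    for member, ok in results.items():
        print(f"member {member}: console {'reachable' if ok else 'UNREACHABLE'}")
```

Using a plain TCP connect keeps the check independent of whichever protocol (telnet, raw TCP, SSH) the console server speaks on those ports.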