Hi All
Kindly find below the testing scenario for control link failover:
1. Both firewalls were up and reachable out-of-band (OoB), tested from a
host in the management subnet.
2. Configured the system backup-router statement “backup-router
11.11.11.254 destination 11.11.11.0/24;”.
3. Removed the control link between the two firewalls.
4. Got the failure message; the secondary firewall was still reachable
through OoB. Checked the ARP entries:
{disabled:node1}
Juniper123@FW2> show arp expiration-time
MAC Address       Address       Name          Interface  Flags      TTE
5c:26:0a:2b:6e:a2 11.11.11.254  11.11.11.254  fxp0.0     none       1062  <-- my IP, in the same LAN as the SRX
28:c0:da:8f:8e:30 30.17.0.2     30.17.0.2     fab0.0     permanent
28:c0:da:8f:97:30 30.18.0.1     30.18.0.1     fab1.0     permanent
00:22:83:14:0b:f0 172.16.0.1    172.16.0.1    reth0.0    none       273
00:22:83:14:0b:f1 192.168.0.1   192.168.0.1   reth1.0    none       613
5. Got the following message:
FW2 FW2 PFEMAN: Shutting down , Master routing engine did not recover; forwarding stopped
Message from syslogd@FW2 at Mar 26 23:04:01 ...
FW2 FW2 CMLC: Master RE did not recover, forwarding stopped
Message from syslogd@FW2 at Mar 26 23:04:01 ...
FW2 FW2 CMLC: committing suicide , Shutting down due to loss of communication with master RE
6. After the ARP timer expired, the entry for the source IP (my IP)
disappeared and I could no longer reach the firewall except through the
console, where I had to reboot it.
{disabled:node1}
Juniper123@FW2> show arp expiration-time
MAC Address       Address       Name          Interface  Flags      TTE
28:c0:da:8f:8e:30 30.17.0.2     30.17.0.2     fab0.0     permanent
28:c0:da:8f:97:30 30.18.0.1     30.18.0.1     fab1.0     permanent
00:22:83:14:0b:f0 172.16.0.1    172.16.0.1    reth0.0    none       1
00:22:83:14:0b:f1 192.168.0.1   192.168.0.1   reth1.0    none       149
Total entries: 4
Conclusion: After the cluster control link is removed, I can’t reach the
secondary firewall even from a host directly connected to the management
subnet; no ARP entry is even created for my IP. I have to connect to the
console and reboot the box. This test was done with the system backup-router
statement configured.
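In case anyone wants to repeat the comparison, here is a minimal Python
sketch that diffs the two "show arp expiration-time" outputs above and lists
the addresses that dropped out. The helper names (parse_arp, vanished) are my
own, and the parser simply assumes the six-column layout shown in this mail;
it is not a Juniper tool and has not been checked against other Junos
releases.

```python
import re
from typing import Dict, List, NamedTuple, Optional

# Matches a colon-separated MAC address such as 5c:26:0a:2b:6e:a2.
MAC_RE = re.compile(r"^(?:[0-9a-f]{2}:){5}[0-9a-f]{2}$", re.IGNORECASE)


class ArpEntry(NamedTuple):
    mac: str
    interface: str
    flags: str
    tte: Optional[int]  # seconds until expiry; None for permanent entries


def parse_arp(text: str) -> Dict[str, ArpEntry]:
    """Parse pasted 'show arp expiration-time' text into {address: ArpEntry}.

    Assumption: each entry is MAC, address, name, interface, flags and an
    optional numeric TTE. Working on whitespace-separated tokens means that
    entries wrapped over two lines by a mail client still parse.
    """
    tokens = text.split()
    starts = [i for i, tok in enumerate(tokens) if MAC_RE.match(tok)]
    entries: Dict[str, ArpEntry] = {}
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(tokens)
        chunk = tokens[start:end]
        if len(chunk) < 5:
            continue  # incomplete entry, skip it
        mac, address, _name, interface, flags = chunk[:5]
        tte = int(chunk[5]) if len(chunk) > 5 and chunk[5].isdigit() else None
        entries[address] = ArpEntry(mac, interface, flags, tte)
    return entries


def vanished(before: Dict[str, ArpEntry], after: Dict[str, ArpEntry]) -> List[str]:
    """Return the addresses present in 'before' but missing from 'after'."""
    return sorted(set(before) - set(after))


# The two outputs from this test, copied from steps 4 and 6.
BEFORE = """
MAC Address       Address       Name          Interface  Flags      TTE
5c:26:0a:2b:6e:a2 11.11.11.254  11.11.11.254  fxp0.0     none       1062
28:c0:da:8f:8e:30 30.17.0.2     30.17.0.2     fab0.0     permanent
28:c0:da:8f:97:30 30.18.0.1     30.18.0.1     fab1.0     permanent
00:22:83:14:0b:f0 172.16.0.1    172.16.0.1    reth0.0    none       273
00:22:83:14:0b:f1 192.168.0.1   192.168.0.1   reth1.0    none       613
"""

AFTER = """
MAC Address       Address       Name          Interface  Flags      TTE
28:c0:da:8f:8e:30 30.17.0.2     30.17.0.2     fab0.0     permanent
28:c0:da:8f:97:30 30.18.0.1     30.18.0.1     fab1.0     permanent
00:22:83:14:0b:f0 172.16.0.1    172.16.0.1    reth0.0    none       1
00:22:83:14:0b:f1 192.168.0.1   192.168.0.1   reth1.0    none       149
Total entries: 4
"""

if __name__ == "__main__":
    # Prints ['11.11.11.254'], i.e. only the fxp0.0 entry was lost.
    print(vanished(parse_arp(BEFORE), parse_arp(AFTER)))
```

With the outputs from this test, the only address that disappears is the
one learned on fxp0.0; the fab and reth entries are still present.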
Any suggestions?
BR,
From: Pavel Lunin [mailto:[email protected]]
Sent: Wednesday, March 23, 2011 8:05 PM
To: Chen Jiang
Cc: Walaa Abdel razzak; Michael Lee; juniper-nsp
Subject: Re: [j-nsp] SRX650 Failover Test Issue
2011/3/23 Chen Jiang <[email protected]>
This behavior is by design. When the control link or fabric link is
disconnected, the current RG0 master node remains master, but the current RG0
backup node disables itself to avoid a split-brain condition. "Disabled" means
the node takes all SPCs/NPCs and line cards offline, and only rebooting the
whole chassis can recover the node.
Right, but the question is slightly different: whether it's possible to reboot
it without access to its console.
I don't have a lab ready for testing right now, but AFAIR fxp0 is still active
on a disabled node even on branch devices (all the more so on high-end, since
it sits directly on the RE there). Moreover, "request routing-engine login
node X" should also be available.
I don't really remember the details by now, but there is absolutely no doubt
that it's possible to access a disabled node over the network, not only on the
console. About half a year ago we ran into a bug in 10.0R2 that caused
heartbeat loss on the control link, with consequent regular failovers and node
disabling. It was definitely possible to reboot the disabled node without
console access.
_______________________________________________
juniper-nsp mailing list [email protected]
https://puck.nether.net/mailman/listinfo/juniper-nsp