Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode
> --> so then in a 2node VC one node is Master one node is backup
> If they split the master will go down but the backup should survive as it
> is still half of the original cluster
>
> So this means you should make the part you want to survive to be the
> backup-RE and not the master-RE
>
> --- or did I miss something ?!

My philosophy is that the default use case of a two-node VC (and nearly the only use case of a VC at all) should be LAG-based redundancy, where two switches are racked next to each other, connected with two twinax cables, and should never split. Of course, technically they can, but a lot of other bloody things that are out of our control can happen to them too: a software bug, a misconfiguration, an uncontrolled hardware failure such as bit errors caused by an overheated SFP, a drunk worker with an angle grinder, etc.

All the exotic cases like geographically distributed VCs are, in my opinion, an exercise for the folks who can't figure out what routing protocols are made for. So this should not be the default use case, and the default software behavior should not be adapted to such scenarios.

Thus in a two-node VC, the *real* failure scenario is a switch failure. When split-detection is enabled, a two-node VC will only survive 50% of switch failure cases (if the backup RE dies, the VC dies; if the master RE dies, the VC survives). So it is no better than a single non-redundant switch. It is actually worse, as the added complexity and false expectations are a more expensive currency than a controlled service outage.

Cheers,
Pavel
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
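The 50% figure in the message above can be checked by enumerating the two single-switch failure cases. This is only a toy model in Python, not Junos behavior taken from any source code: the function name `vc_survives` is made up for illustration, and the outcomes are the ones described in this thread (a lone master goes into linecard mode with split-detection enabled, a lone backup takes over, and with no-split-detection the surviving switch keeps working).

```python
def vc_survives(failed_role: str, split_detection: bool) -> bool:
    """Toy model of a two-member VC after ONE switch dies.

    failed_role is "master" or "backup". Returns True if the remaining
    switch keeps forwarding as master. Outcomes follow the thread's
    description and are an assumption, not a statement about Junos internals.
    """
    if failed_role not in ("master", "backup"):
        raise ValueError("failed_role must be 'master' or 'backup'")
    survivor = "backup" if failed_role == "master" else "master"
    if not split_detection:
        # no-split-detection: the survivor carries on regardless of its role.
        return True
    # split-detection enabled: a lone master sees only half of the members
    # and drops to linecard mode; a lone backup sees half and takes over.
    return survivor == "backup"


for split_detection in (True, False):
    survived = sum(vc_survives(role, split_detection)
                   for role in ("master", "backup"))
    print(f"split_detection={split_detection}: "
          f"VC survives {survived} of 2 single-switch failures")
```

The model deliberately ignores the case where both switches stay up and only the VC links fail, which is the scenario split-detection is meant to protect against.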
Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode
Alexander Marhold wrote:
> Therefore if you want to put one node out of a 2 node VC you need to put the
> Master down not the backup
> Sounds strange but this is according to the rules stated below

Interesting twist :-)

--
Victor Sudakov, VAS4-RIPE, VAS47-RIPN
AS43859
Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode
Tobias Heister wrote:
> > Yes, no-split-detection did help.
>
> I would like to add to that. My point of view is that you do not
> always disable split-detection in a two member VC. You can do so if
> you know what that implies.
>
> The reasoning for the remaining node going into LC mode is that only
> the portion of the VC having the majority of nodes stays up and
> operational. In a two member VC, if for whatever reason one of the
> nodes loses connection to the other, we cannot have a majority so
> both sides go down. Even if it is the only node remaining.
>
> But imagine an error scenario where the second node does not crash,
> but for whatever reason both sides stay up, but the connection
> between them gets lost. With split-detection configured, both sides
> will go down and you have a controlled service outage. When no
> split-detection is configured both sides remain up and you might
> have interesting effects happening in your network with two switches
> with the same configuration and same "identity" being up and
> forwarding. I have seen that happening in DC scenarios doing STP to
> other devices and it is not pretty!

Thank you for the explanation. However, in my case I would rather risk an active/active configuration than have two unresponsive switches which can only be revived through manual intervention. This is mainly because:

1. The stacks are in remote locations; you have to ride an all-terrain vehicle, or sometimes even a helicopter, to reach them for manual intervention.

2. The stack members are located together in one rack, so the most likely scenarios requiring failover will be a) a hardware failure of one switch or b) a failure of one of the UPSes or inverters.

3. An active/active situation can be easily mitigated remotely by shutting down a port on the uplink.

> So always check the implications of what the commands are doing. If
> in your case an active/active split scenario (for worst case) works
> out better than a completely offline VC, that is perfectly fine. I
> have seen lots of scenarios where it would not be the expected or
> wanted behavior. But in my world a VC is no real redundancy method,
> it is just stacking-NG for additional ports under one MGMT, so I
> would have two VCs if I really need redundancy in most setups. But
> that is just me ;)

I guess, in my case a completely offline VC is unacceptable.

--
Victor Sudakov, VAS4-RIPE, VAS47-RIPN
AS43859
Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode
Therefore, if you want to take one node out of a 2-node VC, you need to take the master down, not the backup.
It sounds strange, but this follows from the rules stated below.

Regards
Alexander
Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode
Hi,

According to the documentation, the following behavior should apply with split-detection enabled.

In case of a complete split:
If the Master-RE sees MORE THAN HALF of the devices, it survives; otherwise it disables that part of the cluster.
If the Backup-RE sees HALF of the devices, the backup RE will survive and take over as master.

--> So in a 2-node VC, one node is master and one node is backup.
If they split, the master will go down but the backup should survive, as it is still half of the original cluster.

So this means you should make the part you want to survive the backup-RE and not the master-RE.

--- or did I miss something?!

Regards
Alexander
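The rule as paraphrased in this message can be written down as a small predicate. This is a sketch under stated assumptions, not the documented Junos algorithm: the function name `partition_survives` and its parameters are invented for illustration, and "sees HALF" is read here as "sees at least half".

```python
def partition_survives(has_master: bool, has_backup: bool,
                       visible: int, total: int) -> bool:
    """Does one side of a split VC stay active with split-detection enabled?

    has_master / has_backup: whether this side holds the master / backup RE.
    visible: number of members this side can still see (itself included).
    total:   number of members in the VC before the split.
    Encodes the rule as paraphrased in the thread (an assumption):
      - the side with the master survives if it sees MORE than half;
      - the side with the backup survives if it sees AT LEAST half.
    """
    if has_master and 2 * visible > total:
        return True
    if has_backup and 2 * visible >= total:
        return True
    return False


# Two-member VC, complete split into 1 + 1:
print(partition_survives(True, False, 1, 2))   # lone master -> False
print(partition_survives(False, True, 1, 2))   # lone backup -> True

# Three-member VC, split into (master + linecard) and (backup alone):
print(partition_survives(True, False, 2, 3))   # -> True
print(partition_survives(False, True, 1, 3))   # -> False
```

Under this reading, powering off the backup in a two-member VC leaves a lone master that sees exactly half of the members, which matches the linecard-mode behavior reported at the start of the thread.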
Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode
Hi,

On 26.07.2018 09:06, Victor Sudakov wrote:
>>> I don't like to explain what others say but I think yes. It's been known
>>> behavior since always: in a two-member VC always disable split-detection.
>>> You can google for other threads on this in this list.
>>>
>>> It's always been kind of poorly documented. Last time I checked the docs,
>>> instead of just writing clearly that it must be disabled in two-members
>>> mode, they "don't recommend" it with some kind of hand-waving explanation
>>> that if you estimate that the backup RE failure probability is higher than
>>> a split-brain condition blah-blah-blah... Just disable split-detection,
>>> that's it :)
>>
>> Tomorrow we are planning a lab with and without split-detection. I
>> hope this solves the issue for us, and if it does, I'm sure to make a
>> note in my engineering journal.
>
> Yes, no-split-detection did help.

I would like to add to that. My point of view is that you do not always disable split-detection in a two member VC. You can do so if you know what that implies.

The reasoning for the remaining node going into LC mode is that only the portion of the VC having the majority of nodes stays up and operational. In a two member VC, if for whatever reason one of the nodes loses connection to the other, we cannot have a majority, so both sides go down. Even if it is the only node remaining.

But imagine an error scenario where the second node does not crash, but for whatever reason both sides stay up while the connection between them gets lost. With split-detection configured, both sides will go down and you have a controlled service outage. When no split-detection is configured, both sides remain up and you might have interesting effects happening in your network, with two switches with the same configuration and same "identity" being up and forwarding. I have seen that happening in DC scenarios doing STP to other devices and it is not pretty!

So always check the implications of what the commands are doing. If in your case an active/active split scenario (for worst case) works out better than a completely offline VC, that is perfectly fine. I have seen lots of scenarios where it would not be the expected or wanted behavior. But in my world a VC is no real redundancy method, it is just stacking-NG for additional ports under one MGMT, so I would have two VCs if I really need redundancy in most setups. But that is just me ;)

--
Kind Regards
Tobias Heister
Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode
Victor Sudakov wrote:
> Pavel Lunin wrote:
> > > > in a virtual chassis you could add:
> > > >
> > > > set virtual-chassis no-split-detection
> > > >
> > > > This will ensure that if both VC ports go down, the master routing
> > > > engine carries on working.
> > >
> > > Are you referring to "Scenario B" in
> > > https://kb.juniper.net/InfoCenter/index?page=content=KB13879 ?
> > > or a different case?
> >
> > I don't like to explain what others say but I think yes. It's been known
> > behavior since always: in a two-member VC always disable split-detection.
> > You can google for other threads on this in this list.
> >
> > It's always been kind of poorly documented. Last time I checked the docs,
> > instead of just writing clearly that it must be disabled in two-members
> > mode, they "don't recommend" it with some kind of hand-waving explanation
> > that if you estimate that the backup RE failure probability is higher than
> > a split-brain condition blah-blah-blah... Just disable split-detection,
> > that's it :)
>
> Tomorrow we are planning a lab with and without split-detection. I
> hope this solves the issue for us, and if it does, I'm sure to make a
> note in my engineering journal.

Yes, no-split-detection did help. Thank you Catalin and Pavel very much again.

--
Victor Sudakov, VAS4-RIPE, VAS47-RIPN
AS43859
Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode
Pavel Lunin wrote:
> > > in a virtual chassis you could add:
> > >
> > > set virtual-chassis no-split-detection
> > >
> > > This will ensure that if both VC ports go down, the master routing
> > > engine carries on working.
> >
> > Are you referring to "Scenario B" in
> > https://kb.juniper.net/InfoCenter/index?page=content=KB13879 ?
> > or a different case?
>
> I don't like to explain what others say but I think yes. It's been known
> behavior since always: in a two-member VC always disable split-detection.
> You can google for other threads on this in this list.
>
> It's always been kind of poorly documented. Last time I checked the docs,
> instead of just writing clearly that it must be disabled in two-members
> mode, they "don't recommend" it with some kind of hand-waving explanation
> that if you estimate that the backup RE failure probability is higher than
> a split-brain condition blah-blah-blah... Just disable split-detection,
> that's it :)

Tomorrow we are planning a lab with and without split-detection. I hope this solves the issue for us, and if it does, I'm sure to make a note in my engineering journal.

Thank you Catalin and Pavel for your input.

--
Victor Sudakov, VAS4-RIPE, VAS47-RIPN
AS43859
Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode
> > in a virtual chassis you could add:
> >
> > set virtual-chassis no-split-detection
> >
> > This will ensure that if both VC ports go down, the master routing
> > engine carries on working.
>
> Are you referring to "Scenario B" in
> https://kb.juniper.net/InfoCenter/index?page=content=KB13879 ?
> or a different case?

I don't like to explain what others say, but I think yes. It has always been known behavior: in a two-member VC, always disable split-detection. You can google for other threads on this in this list.

It has always been kind of poorly documented. Last time I checked the docs, instead of just stating clearly that it must be disabled in two-member mode, they "don't recommend" it, with some kind of hand-waving explanation that if you estimate that the backup RE failure probability is higher than that of a split-brain condition blah-blah-blah... Just disable split-detection, that's it :)
Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode
Catalin Dominte wrote:
> > I've encountered an odd problem with adding EX4200s (running 12.3R6.6)
> > to Virtual Chassis with a nonprovisioned configuration file.

[dd]

> > Looks fine, doesn't it? However, if I later poweroff the Backup switch
> > (BM0217040019), the current Master switch (BM0213317561) goes into linecard
> > (sic!)
> > mode! And I lose ssh access to it. I can undo the disaster only from the
> > serial console.
>
> If you only have 2 members

Yes, I do.

> in a virtual chassis you could add:
>
> set virtual-chassis no-split-detection
>
> This will ensure that if both VC ports go down, the master routing engine
> carries on working.

Are you referring to "Scenario B" in
https://kb.juniper.net/InfoCenter/index?page=content=KB13879 ?
or a different case?

--
Victor Sudakov, VAS4-RIPE, VAS47-RIPN
AS43859
Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode
If you only have 2 members in a virtual chassis you could add:

set virtual-chassis no-split-detection

This will ensure that if both VC ports go down, the master routing engine carries on working.

Catalin Dominte | Senior Network Consultant
Nocsult Ltd

On 25 Jul 2018, 08:27 +0100, Victor Sudakov wrote:
> Dear Colleagues,
>
> I've encountered an odd problem with adding EX4200s (running 12.3R6.6)
> to Virtual Chassis with a nonprovisioned configuration file.
>
> According to documentation, I zeroize the second switch, power it off,
> connect to the running switch and power it on. Some magic happens, and
> voila:
>
> > show virtual-chassis
>
> Virtual Chassis ID: 915f.3bb3.ff60
> Virtual Chassis Mode: Enabled
>                                              Mstr          Mixed  Neighbor List
> Member ID  Status  Serial No     Model       prio  Role    Mode   ID  Interface
> 0 (FPC 0)  Prsnt   BM0213317561  ex4200-24t  128   Master* N      1   vcp-0
> 1 (FPC 1)  Prsnt   BM0217040019  ex4200-24t  128   Backup  N      0   vcp-0
>
> Member ID for next new member: 2 (FPC 2)
>
> Looks fine, doesn't it? However, if I later poweroff the Backup switch
> (BM0217040019), the current Master switch (BM0213317561) goes into linecard
> (sic!)
> mode! And I lose ssh access to it. I can undo the disaster only from the
> serial console.
>
> What am I doing wrong? Or if it's a bug, is there a workaround?
>
> --
> Victor Sudakov, VAS4-RIPE, VAS47-RIPN
> AS43859