Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode

2018-07-26 Thread Pavel Lunin
> --> so then in a 2-node VC one node is Master, one node is backup.
> If they split, the master will go down but the backup should survive as it
> is still half of the original cluster.
>
> So this means you should make the part you want to survive the
> backup-RE and not the master-RE.
>
> --- or did I miss something ?!
>
>

My philosophy is that the default use case of a two-node VC (and nearly the
only use case of a VC at all) should be LAG-based redundancy, where two
switches are racked next to each other, connected with two twinax cables,
and should never split. Of course, technically they can, but a lot of other
bloody things, which are out of our control, can happen to them: a software
bug, a misconfiguration, an uncontrolled hardware failure like bit errors
caused by an overheated SFP, a drunk worker with an angle grinder, etc.

All the exotic cases, like geographically distributed VCs, are, in my
opinion, an exercise for the folks who can't figure out what routing
protocols are made for. So this should not be the default use case, and the
default software behavior should not be adapted to such scenarios.

Thus in a two-node VC, the *real* failure scenario is a switch failure. When
split-detection is enabled, a two-node VC will only survive 50% of switch
failure cases (if the backup RE dies, the VC dies; if the master RE dies, the
VC survives). So in fact it's no better than a single non-redundant switch.
It's actually worse, as the added complexity and false expectations are a
more expensive currency than a controlled service outage.
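The 50% figure can be checked with a short Python sketch that enumerates the two possible single-switch failures. It assumes the split-detection rule as Alexander Marhold summarizes the documentation elsewhere in this thread (the master's side needs more than half of the members; the backup's side needs half, read here as "at least half"). `part_stays_active` and `survival_rate` are hypothetical names, not Junos internals.

```python
from fractions import Fraction


def part_stays_active(part_size: int, total: int, role: str) -> bool:
    """Assumed split-detection rule, as summarized in this thread."""
    if role == "master":
        return 2 * part_size > total   # master side needs MORE than half
    if role == "backup":
        return 2 * part_size >= total  # backup side needs at least half
    return False  # in this sketch, any other part is treated as inactive


def survival_rate(split_detection: bool) -> Fraction:
    """Fraction of single-switch failures a two-member VC survives."""
    total = 2
    survived = 0
    for failed in ("master", "backup"):
        remaining = "backup" if failed == "master" else "master"
        if not split_detection:
            survived += 1  # assumed: the remaining switch keeps running
        elif part_stays_active(1, total, remaining):
            survived += 1
    return Fraction(survived, 2)


print("with split-detection:", survival_rate(True))
print("without split-detection:", survival_rate(False))
```

Run as-is, it prints 1/2 with split-detection and 1 without: under this model only the master failure is survived while split-detection is on.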

Cheers,
Pavel
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode

2018-07-26 Thread Victor Sudakov
Alexander Marhold wrote:
> Therefore, if you want to take one node out of a 2-node VC, you need to take
> the Master down, not the backup.
> Sounds strange, but this is according to the rules stated below.

Interesting twist :-)


-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
AS43859


Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode

2018-07-26 Thread Victor Sudakov
Tobias Heister wrote:
> > 
> > Yes, no-split-detection did help.
> 
> I would like to add to that. My point of view is that you do not
> always disable split-detection in a two-member VC. You can do so if
> you know what that implies.
> 
> The reasoning for the remaining node going into LC mode is that only
> the portions of the VC having the majority of nodes stay up and
> operational. In a two-member VC, if for whatever reason one of the
> nodes loses connection to the other, we cannot have a majority, so
> both sides go down, even if it is the only node remaining.
> 
> But imagine an error scenario where the second node does not crash,
> but for whatever reason both sides stay up while the connection
> between them gets lost. With split-detection configured, both sides
> will go down and you have a controlled service outage. When no
> split-detection is configured, both sides remain up and you might
> have interesting effects happening in your network with two switches
> with the same configuration and same "identity" being up and
> forwarding. I have seen that happening in DC scenarios doing STP to
> other devices and it is not pretty!

Thank you for the explanation. However, in my case I would rather risk
an active/active configuration than have two unresponsive switches
which can only be revived through manual intervention. This is mainly
because:

1. The stacks are in remote locations, you have to ride an all-terrain
vehicle to reach them for manual intervention, or sometimes even a
helicopter.

2. The stack members are located together in one rack, so the most
likely scenarios requiring failover will be a) a hardware failure of one
switch or b) a failure of one of the UPSes or inverters.

3. An active/active situation can be easily mitigated remotely by
shutting down a port on the uplink.

> 
> So always check the implications of what the commands are doing. If
> in your case an active/active split scenario (for worst case) works
> out better than a completely offline VC, that is perfectly fine. I
> have seen lots of scenarios where it would not be the expected or
> wanted behavior. But in my world a VC is no real redundancy method;
> it is just stacking-NG for additional ports under one MGMT, so I
> would have two VCs if I really need redundancy in most setups. But
> that is just me ;)

I guess, in my case a completely offline VC is unacceptable.

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
AS43859


Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode

2018-07-26 Thread Alexander Marhold
Therefore, if you want to take one node out of a 2-node VC, you need to take
the Master down, not the backup.
Sounds strange, but this is according to the rules stated below.

Regards

alexander

-----Original Message-----
From: juniper-nsp [mailto:juniper-nsp-boun...@puck.nether.net] On Behalf Of
Alexander Marhold
Sent: Thursday, 26 July 2018 09:52
To: 'Tobias Heister'; 'Victor Sudakov'; 'Pavel Lunin'
Cc: 'juniper-nsp'
Subject: Re: [j-nsp] EX4200 virtual chassis problem, master going into
linecard mode

Hi 

According to the documentation, there should be the following behavior with
split-detection enabled:
In case of a complete split:
If the Master-RE sees MORE THAN HALF of the devices, it survives; otherwise it
disables that part of the cluster.
If the Backup-RE sees HALF of the devices, the backup RE will survive and
play the master.

--> so then in a 2-node VC one node is Master, one node is backup.
If they split, the master will go down but the backup should survive as it is
still half of the original cluster.

So this means you should make the part you want to survive the
backup-RE and not the master-RE.

--- or did I miss something ?!

Regards

Alexander

-----Original Message-----
From: juniper-nsp [mailto:juniper-nsp-boun...@puck.nether.net] On Behalf Of
Tobias Heister
Sent: Thursday, 26 July 2018 09:26
To: Victor Sudakov; Pavel Lunin
Cc: juniper-nsp
Subject: Re: [j-nsp] EX4200 virtual chassis problem, master going into
linecard mode

Hi,

On 26.07.2018 09:06, Victor Sudakov wrote:
>>> I don't like to explain what others say but I think yes. It's been known
>>> behavior since always: in a two-member VC always disable split-detection.
>>> You can google for other threads on this in this list.
>>>
>>> It's always been kind of poorly documented. Last time I checked the docs,
>>> instead of just writing clearly that it must be disabled in two-member
>>> mode, they "don't recommend" it with some kind of hand-waving explanation
>>> that if you estimate that the backup RE failure probability is higher than
>>> a split-brain condition blah-blah-blah... Just disable split-detection,
>>> that's it :)
>>
>> Tomorrow we are planning a lab with and without split-detection. I
>> hope this solves the issue for us, and if it does, I'm sure to make a
>> note in my engineering journal.
> 
> Yes, no-split-detection did help.

I would like to add to that. My point of view is that you do not always
disable split-detection in a two-member VC.
You can do so if you know what that implies.

The reasoning for the remaining node going into LC mode is that only the
portions of the VC having the majority of nodes stay up and operational. In
a two-member VC, if for whatever reason one of the nodes loses connection to
the other, we cannot have a majority, so both sides go down, even if it is
the only node remaining.

But imagine an error scenario where the second node does not crash, but for
whatever reason both sides stay up while the connection between them gets
lost. With split-detection configured, both sides will go down and you have
a controlled service outage. When no split-detection is configured, both
sides remain up and you might have interesting effects happening in your
network with two switches with the same configuration and same "identity"
being up and forwarding. I have seen that happening in DC scenarios doing
STP to other devices and it is not pretty!

So always check the implications of what the commands are doing. If in your
case an active/active split scenario (for worst case) works out better than
a completely offline VC, that is perfectly fine. I have seen lots of
scenarios where it would not be the expected or wanted behavior. But in my
world a VC is no real redundancy method; it is just stacking-NG for
additional ports under one MGMT, so I would have two VCs if I really need
redundancy in most setups. But that is just me ;)

-- 
Kind Regards
Tobias Heister


Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode

2018-07-26 Thread Alexander Marhold
Hi 

According to the documentation, there should be the following behavior with
split-detection enabled:
In case of a complete split:
If the Master-RE sees MORE THAN HALF of the devices, it survives; otherwise it
disables that part of the cluster.
If the Backup-RE sees HALF of the devices, the backup RE will survive and
play the master.

--> so then in a 2-node VC one node is Master, one node is backup.
If they split, the master will go down but the backup should survive as it is
still half of the original cluster.

So this means you should make the part you want to survive the
backup-RE and not the master-RE.
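The rule quoted above can be sketched for any member count. This minimal Python model assumes a complete split into exactly two parts, with the pre-split master and backup on opposite sides, and reads "sees HALF" for the backup as "at least half". `Part`, `stays_active` and `split_outcome` are illustrative names, not anything taken from Junos.

```python
from dataclasses import dataclass


@dataclass
class Part:
    size: int          # members on this side of the split
    has_master: bool   # holds the pre-split master RE
    has_backup: bool   # holds the pre-split backup RE


def stays_active(part: Part, total: int) -> bool:
    """Split-detection rule as summarized in this message (a sketch)."""
    if part.has_master and 2 * part.size > total:
        return True   # master side needs more than half of the members
    if part.has_backup and 2 * part.size >= total:
        return True   # backup side may take over with (at least) half
    return False


def split_outcome(master_side: int, backup_side: int) -> tuple[bool, bool]:
    """Return (master side active, backup side active) after a split."""
    total = master_side + backup_side
    return (stays_active(Part(master_side, True, False), total),
            stays_active(Part(backup_side, False, True), total))


print(split_outcome(1, 1))   # (False, True): two-member VC
print(split_outcome(3, 1))   # (True, False): four members split 3/1
print(split_outcome(2, 2))   # (False, True): four members split 2/2
```

Under this model the master's side of a 1/1 split goes inactive, which matches what Victor saw when he powered off the backup, and the backup's side survives.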

--- or did I miss something ?!

Regards

Alexander



Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode

2018-07-26 Thread Tobias Heister

Hi,

On 26.07.2018 09:06, Victor Sudakov wrote:
>>> I don't like to explain what others say but I think yes. It's been known
>>> behavior since always: in a two-member VC always disable split-detection.
>>> You can google for other threads on this in this list.
>>>
>>> It's always been kind of poorly documented. Last time I checked the docs,
>>> instead of just writing clearly that it must be disabled in two-member
>>> mode, they "don't recommend" it with some kind of hand-waving explanation
>>> that if you estimate that the backup RE failure probability is higher than
>>> a split-brain condition blah-blah-blah... Just disable split-detection,
>>> that's it :)
>>
>> Tomorrow we are planning a lab with and without split-detection. I
>> hope this solves the issue for us, and if it does, I'm sure to make a
>> note in my engineering journal.
>
> Yes, no-split-detection did help.

I would like to add to that. My point of view is that you do not always disable
split-detection in a two-member VC.
You can do so if you know what that implies.

The reasoning for the remaining node going into LC mode is that only the
portions of the VC having the majority of nodes stay up and operational. In a
two-member VC, if for whatever reason one of the nodes loses connection to the
other, we cannot have a majority, so both sides go down, even if it is the only
node remaining.
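The explanation above is a strict-majority (quorum) argument. A short Python sketch, assuming a part keeps running only if it holds strictly more than half of the configured members; this is a simplification, since the documented rule Alexander quotes elsewhere in the thread also lets the backup's side survive with exactly half. `has_quorum` and `surviving_parts` are hypothetical helpers.

```python
def has_quorum(part_size: int, total: int) -> bool:
    # Strict majority: more than half of the configured members.
    return 2 * part_size > total


def surviving_parts(sizes: list[int]) -> list[int]:
    """Return the sizes of the parts that keep running after a split."""
    total = sum(sizes)
    return [s for s in sizes if has_quorum(s, total)]


print(surviving_parts([1, 1]))  # [] : a 2-member VC has no majority side
print(surviving_parts([2, 1]))  # [2] : a 3-member VC keeps the larger side
```

With two members every split leaves one member on each side, so under this model neither side has a majority, matching "both sides go down"; a three-member VC is the smallest size where a split can leave one side with quorum.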

But imagine an error scenario where the second node does not crash, but for whatever
reason both sides stay up while the connection between them gets lost. With
split-detection configured, both sides will go down and you have a controlled service
outage. When no split-detection is configured, both sides remain up and you might have
interesting effects happening in your network with two switches with the same
configuration and same "identity" being up and forwarding. I have seen that
happening in DC scenarios doing STP to other devices and it is not pretty!

So always check the implications of what the commands are doing. If in your case
an active/active split scenario (for worst case) works out better than a
completely offline VC, that is perfectly fine. I have seen lots of scenarios
where it would not be the expected or wanted behavior. But in my world a VC is
no real redundancy method; it is just stacking-NG for additional ports under one
MGMT, so I would have two VCs if I really need redundancy in most setups. But
that is just me ;)

--
Kind Regards
Tobias Heister


Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode

2018-07-26 Thread Victor Sudakov
Victor Sudakov wrote:
> Pavel Lunin wrote:
> > > > in a virtual chassis you could add:
> > > >
> > > > set virtual-chassis no-split-detection
> > > >
> > > > This will ensure that if both VC ports go down, the master routing
> > > > engine carries on working.
> > >
> > > Are you referring to "Scenario B" in
> > > https://kb.juniper.net/InfoCenter/index?page=content=KB13879 ?
> > > or a different case?
> > >
> > 
> > 
> > I don't like to explain what others say but I think yes. It's been known
> > behavior since always: in a two-member VC always disable split-detection.
> > You can google for other threads on this in this list.
> > 
> > It's always been kind of poorly documented. Last time I checked the docs,
> > instead of just writing clearly that it must be disabled in two-member
> > mode, they "don't recommend" it with some kind of hand-waving explanation
> > that if you estimate that the backup RE failure probability is higher than
> > a split-brain condition blah-blah-blah... Just disable split-detection,
> > that's it :)
> 
> Tomorrow we are planning a lab with and without split-detection. I
> hope this solves the issue for us, and if it does, I'm sure to make a
> note in my engineering journal.

Yes, no-split-detection did help. 

Thank you very much again, Catalin and Pavel.

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
AS43859


Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode

2018-07-25 Thread Victor Sudakov
Pavel Lunin wrote:
> > > in a virtual chassis you could add:
> > >
> > > set virtual-chassis no-split-detection
> > >
> > > This will ensure that if both VC ports go down, the master routing
> > > engine carries on working.
> >
> > Are you referring to "Scenario B" in
> > https://kb.juniper.net/InfoCenter/index?page=content=KB13879 ?
> > or a different case?
> >
> 
> 
> I don't like to explain what others say but I think yes. It's been known
> behavior since always: in a two-member VC always disable split-detection.
> You can google for other threads on this in this list.
> 
> It's always been kind of poorly documented. Last time I checked the docs,
> instead of just writing clearly that it must be disabled in two-member
> mode, they "don't recommend" it with some kind of hand-waving explanation
> that if you estimate that the backup RE failure probability is higher than
> a split-brain condition blah-blah-blah... Just disable split-detection,
> that's it :)

Tomorrow we are planning a lab with and without split-detection. I
hope this solves the issue for us, and if it does, I'm sure to make a
note in my engineering journal.

Thank you Catalin and Pavel for your input.

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
AS43859


Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode

2018-07-25 Thread Pavel Lunin
> > in a virtual chassis you could add:
> >
> > set virtual-chassis no-split-detection
> >
> > This will ensure that if both VC ports go down, the master routing
> > engine carries on working.
>
> Are you referring to "Scenario B" in
> https://kb.juniper.net/InfoCenter/index?page=content=KB13879 ?
> or a different case?
>


I don't like to explain what others say but I think yes. It's been known
behavior since always: in a two-member VC always disable split-detection.
You can google for other threads on this in this list.

It's always been kind of poorly documented. Last time I checked the docs,
instead of just writing clearly that it must be disabled in two-member
mode, they "don't recommend" it with some kind of hand-waving explanation
that if you estimate that the backup RE failure probability is higher than
a split-brain condition blah-blah-blah... Just disable split-detection,
that's it :)


Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode

2018-07-25 Thread Victor Sudakov
Catalin Dominte wrote:
> 
> 
> >
> > I've encountered an odd problem with adding EX4200s (running 12.3R6.6)
> > to Virtual Chassis with a nonprovisioned configuration file.

[dd]

> >
> > Looks fine, doesn't it? However, if I later power off the Backup switch
> > (BM0217040019), the current Master switch (BM0213317561) goes into
> > linecard (sic!) mode! And I lose ssh access to it. I can undo the
> > disaster only from the serial console.
> >

> If you only have 2 members 

Yes, I do.

> in a virtual chassis you could add:
> 
> set virtual-chassis no-split-detection
> 
> This will ensure that if both VC ports go down, the master routing engine 
> carries on working.

Are you referring to "Scenario B" in
https://kb.juniper.net/InfoCenter/index?page=content=KB13879 ?
or a different case?

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
AS43859


Re: [j-nsp] EX4200 virtual chassis problem, master going into linecard mode

2018-07-25 Thread Catalin Dominte
If you only have 2 members in a virtual chassis you could add:

set virtual-chassis no-split-detection

This will ensure that if both VC ports go down, the master routing engine 
carries on working.


Catalin Dominte | Senior Network Consultant
Nocsult Ltd | 11 Castle Hill | Maidenhead | Berkshire | SL6 4AA | Phone: +44 
(0)1628 302 007
VAT registration number: GB 180957674 | Company registration number: 08886349
On 25 Jul 2018, 08:27 +0100, Victor Sudakov wrote:
> Dear Colleagues,
>
> I've encountered an odd problem with adding EX4200s (running 12.3R6.6)
> to Virtual Chassis with a nonprovisioned configuration file.
>
> According to the documentation, I zeroize the second switch, power it off,
> connect it to the running switch and power it on. Some magic happens, and
> voila:
>
> > show virtual-chassis
>
> Virtual Chassis ID: 915f.3bb3.ff60
> Virtual Chassis Mode: Enabled
>                                                 Mstr            Mixed Neighbor List
> Member ID  Status   Serial No    Model          prio  Role      Mode  ID  Interface
> 0 (FPC 0)  Prsnt    BM0213317561 ex4200-24t     128   Master*   N     1   vcp-0
> 1 (FPC 1)  Prsnt    BM0217040019 ex4200-24t     128   Backup    N     0   vcp-0
>
> Member ID for next new member: 2 (FPC 2)
>
> Looks fine, doesn't it? However, if I later power off the Backup switch
> (BM0217040019), the current Master switch (BM0213317561) goes into
> linecard (sic!) mode! And I lose ssh access to it. I can undo the
> disaster only from the serial console.
>
> What am I doing wrong? Or if it's a bug, is there a workaround?
>
> --
> Victor Sudakov, VAS4-RIPE, VAS47-RIPN
> AS43859