Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]

2018-11-20 Thread Aaron Gould
OK good, I just read this.  

https://forums.juniper.net/jnet/attachments/jnet/Day1Books/360/1/DO_EVPNSforDCI.pdf
 

Day One: Using Ethernet VPNs for Data Center Interconnect 

page 11, last sentence on that page...

"EVPN also has mechanisms that prevent the looping of BUM traffic in an 
all-active multi-homed topology."
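
For concreteness, a minimal (and entirely hypothetical) Junos sketch of the
all-active multi-homing piece that sentence refers to: the same ESI is configured
on both leafs' LAG towards the server, and it is EVPN's designated-forwarder
election and per-ESI split-horizon rules that keep BUM traffic from looping back
into the multi-homed site.  Interface names, ESI and LACP system-id values below
are made up for illustration.

  ## configured identically on both leafs of the pair (values are examples only)
  set interfaces ae0 esi 00:11:11:11:11:11:11:11:11:01
  set interfaces ae0 esi all-active
  set interfaces ae0 aggregated-ether-options lacp active
  set interfaces ae0 aggregated-ether-options lacp system-id 00:00:11:11:11:01
  set interfaces ae0 unit 0 family ethernet-switching vlan members v100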

-Aaron



Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]

2018-11-18 Thread Hugo Slabbert

Thanks Hugo, what about leaf to leaf connection?  Is that good?


It Depends(tm).  I would start by asking why you want to interconnect 
your leafs.  The same question applies when scaling out to >2, as well as just 
what you're trying to accomplish with those links.  A use case could be 
something like MLAG/VPC/whatever to bring L2 redundancy down to the node 
attachment.  Personally I'm trying to kill the need for that (well, more 
just run L3 straight down to the host and skip all the layers of 
protocols and headers needed just to stretch L2 everywhere), but one battle 
at a time.


--
Hugo Slabbert   | email, xmpp/jabber: h...@slabnet.com
pgp key: B178313E   | also on Signal

On Thu 2018-Nov-15 07:31:30 -0600, Aaron1  wrote:


Thanks Hugo, what about leaf to leaf connection?  Is that good?

What about Layer 2 loop prevention?

Aaron




Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]

2018-11-16 Thread Giuliano C. Medalha
We are testing the new QFX5120 in our lab right now.

We are only waiting for the official software release ...

The box is already here.

The specs only show l2circuit ... but we are waiting to see flexible ethernet 
encapsulation (no VPLS, that we already know) to use VLAN and MPLS on the same 
interface.

But the main idea is to use it with an EVPN/VXLAN configuration and try Q-in-Q 
on the VTEP.

After that we can post the results here.

Regards,

Giuliano C. Medalha
WZTECH NETWORKS
+55 (17) 98112-5394
giuli...@wztech.com.br


From: juniper-nsp  on behalf of Aaron1 

Sent: Friday, November 16, 2018 13:14
To: adamv0...@netconsultings.com
Cc: rmcgov...@juniper.net; Juniper List
Subject: Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: 
Opinions on fusion provider edge ]

Geez, sounds horrible, thanks Adam

We are buying QFX-5120s for our new DC build. How good is the MPLS services 
capability of the QFX-5120?

Aaron

On Nov 16, 2018, at 5:12 AM,  
 wrote:

> [...]


Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]

2018-11-16 Thread Gert Doering
Hi,

On Fri, Nov 16, 2018 at 09:13:37AM -0600, Aaron1 wrote:
> Geez, sounds horrible, thanks Adam
> 
> We are buying QFX-5120s for our new DC build.  How good is the MPLS 
> services capability of the QFX-5120?

Are they shipping already?  Any success or horror stories?

25G looks promising for "10G is not enough, 40G is such a hassle", but
it's the usual "new chip, new product, has it matured enough?" discussion.

gert
-- 
"If was one thing all people took for granted, was conviction that if you 
 feed honest figures into a computer, honest figures come out. Never doubted 
 it myself till I met a computer with a sense of humor."
 Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany g...@greenie.muc.de




Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]

2018-11-16 Thread Aaron1
Geez, sounds horrible, thanks Adam

We are buying QFX-5120s for our new DC build.  How good is the MPLS services 
capability of the QFX-5120?

Aaron

On Nov 16, 2018, at 5:12 AM,  
 wrote:

> [...]


Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]

2018-11-16 Thread adamv0025
> Of Aaron1
> Sent: Thursday, November 15, 2018 4:23 PM
> 
> Well, I’m a data center rookie, so I appreciate your patience
> 
> I do understand that layer 2 emulation is needed between data centers, if I
> do it with traditional mechanisms like VPLS or l2circuit martini, I’m just 
> afraid
> if I make too many connections between spine and leaves that I might create
> a loop
> 
> However, I’m beginning to think that EVPN may take care of all that stuff,
> again, still learning some of the stuff that data centers do
> 
> 
Hey Aaron,

My advice would be: if you're building a new DC, build it as part of your MPLS 
network (yes, no boundaries).

Rant//
The whole networking industry got it very wrong with the VXLAN technology; that 
was one of the industry's biggest blunders. 
The VXLAN project of the DC folks is a good example of short-sighted goals and a 
desire to reinvent the wheel (SP folks had had VPLS around for years when VXLAN 
came to be).
SP folks then came up with EVPN as a replacement for VPLS, and DC folks then 
shoehorned it on top of VXLAN.
Then the micro-segmentation buzzword came along and DC folks quickly realized that 
there's no field in the VXLAN header to indicate a common access group, nor the 
ability to stack VXLAN headers on top of each other (though some tried with custom 
VXLAN spin-offs), so DC folks came up with a brilliant idea: let's maintain 
access lists! Like it's the '90s again. As an SP guy I'm just shaking my head, 
thinking: have these guys ever heard of L2-VPNs, which have been around since the 
inception of MPLS? (So yes, not telling people about MAC addresses they should not 
be talking to is better than telling everyone and then maintaining ACLs.) In the 
SP sector we learned that in the '90s. 
Oh, and then there's the traffic-engineering requirement to route mice flows 
around elephant flows in the DC, not to mention the ability to seamlessly steer 
traffic flows right from the VMs, across the DC and the MPLS core, which is 
impossible with VXLAN islands in the form of DCs hanging off of the MPLS core. 
Rant\\



adam

netconsultings.com
::carrier-class solutions for the telecommunications industry::



Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]

2018-11-16 Thread adamv0025
> Of Pavel Lunin
> Sent: Friday, November 16, 2018 12:10 AM
> 
> Gert Doering wrote:
> 
> >
> > EVPN is, basically, just putting a proper control-plane on top of MPLS
> > or VXLAN for "L2 routing" - put your MAC addresses into BGP, and it
> > will scale like hell.
> >
> 
> "Like hell" is the right name for it.
> 
> Not that I don't like EVPN but... a) EVPN is not necessarily L2 b) Ethernet is
> still Ethernet, even over EVPN. In order to announce the MAC over BGP, you
> first need to learn it. With all the consequences and prerequisites.
> And, of course, mapping dynamically learned stuff to BGP announcements comes
> at the cost of making BGP routes as stable as learned MACs.
> 
> Magic doesn't exist.
>
It does, and it's called PBB-EVPN.
No, just kidding :)

PBB on top of EVPN just brings back the conversational MAC learning aspect
of it and solves the scalability issues of pure EVPN (it makes BGP independent
of the customer MAC change rate or MAC scale).
But as you rightly pointed out, it's still Ethernet with all its problems. 
Though I guess this "simulated" Ethernet is somewhat better than vanilla
Ethernet, since you have all these clever features like split-horizon groups,
designated forwarders, multicast-style distribution of BUM traffic, etc. ...
which, depending on who's driving, might prevent one from shooting oneself in
the foot or provide enough rope to hang with...

adam




Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]

2018-11-15 Thread Pavel Lunin
Gert Doering wrote:

>
> EVPN is, basically, just putting a proper control-plane on top of MPLS
> or VXLAN for "L2 routing" - put your MAC addresses into BGP, and it will
> scale like hell.
>

"Like hell" is the right name for it.

Not that I don't like EVPN but... a) EVPN is not necessarily L2 b) Ethernet
is still Ethernet, even over EVPN. In order to announce the MAC over BGP,
you first need to learn it. With all the consequences and prerequisites.
And, of course, mapping dynamically learned stuff to BGP announcements comes at
the cost of making BGP routes as stable as learned MACs.

Magic doesn't exist.


Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]

2018-11-15 Thread Gert Doering
Hi,

On Thu, Nov 15, 2018 at 10:22:51AM -0600, Aaron1 wrote:
> Well, I'm a data center rookie, so I appreciate your patience
> 
> I do understand that layer 2 emulation is needed between data centers, if I 
> do it with traditional mechanisms like VPLS or l2circuit martini, I'm just 
> afraid if I make too many connections between spine and leaves that I might 
> create a loop

Since these connections are all *routed*, the routing protocol takes care
of loops.  There is no redundant L2 anything (unless you do LACP links,
but then LACP takes care of it) that could loop.
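
(As a rough sketch of what "all routed" can look like on a QFX-style leaf - 
purely illustrative, with made-up interfaces, addresses and AS numbers: /31 
point-to-point links to each spine, eBGP per device, loopbacks exported, and 
multipath for the ECMP.)

  set interfaces xe-0/0/48 unit 0 family inet address 10.1.1.1/31
  set interfaces xe-0/0/49 unit 0 family inet address 10.1.2.1/31
  set routing-options router-id 10.0.0.11
  set routing-options autonomous-system 65011
  set policy-options policy-statement EXPORT-LO0 term 1 from interface lo0.0
  set policy-options policy-statement EXPORT-LO0 term 1 then accept
  set protocols bgp group underlay type external
  set protocols bgp group underlay export EXPORT-LO0
  set protocols bgp group underlay multipath multiple-as
  set protocols bgp group underlay neighbor 10.1.1.0 peer-as 65001
  set protocols bgp group underlay neighbor 10.1.2.0 peer-as 65002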

The "user-visible layer2 network" stuff emulated via VXLAN, MPLS, ...
might form loop, so how you attach downstream L2 "infrastructure" will pose 
some challenges - but this is totally independent from the leaf/spine
infra.

> However, I'm beginning to think that EVPN may take care of all that stuff, 
> again, still learning some of the stuff that data centers do

EVPN is, basically, just putting a proper control-plane on top of MPLS
or VXLAN for "L2 routing" - put your MAC addresses into BGP, and it will
scale like hell.
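
(A rough idea of what "MAC addresses into BGP" looks like in Junos terms on an 
EVPN-VXLAN leaf - a hypothetical sketch only, all names, addresses and numbers 
invented: the overlay BGP session carries the EVPN address family, and the 
VLAN-to-VNI mapping makes MACs learned in v100 get advertised as EVPN Type-2 
routes.)

  set protocols bgp group overlay type internal
  set protocols bgp group overlay local-address 10.0.0.11
  set protocols bgp group overlay family evpn signaling
  set protocols bgp group overlay neighbor 10.0.0.1
  set protocols evpn encapsulation vxlan
  set protocols evpn extended-vni-list 10100
  set switch-options vtep-source-interface lo0.0
  set switch-options route-distinguisher 10.0.0.11:1
  set switch-options vrf-target target:65000:1
  set vlans v100 vlan-id 100
  set vlans v100 vxlan vni 10100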

ISPs I've talked to like EVPN, because "this is BGP, I understand BGP".

Enterprise folks find EVPN scary, because "this is BGP, nobody here knows
about BGP"... :-)  (and indeed, if BGP is news to you, there are way too
many things that can be designed poorly, and half the "this is how you do
a DC with EVPN" documents design their BGP in ways that I wouldn't do...)

gert

-- 
"If was one thing all people took for granted, was conviction that if you 
 feed honest figures into a computer, honest figures come out. Never doubted 
 it myself till I met a computer with a sense of humor."
 Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany g...@greenie.muc.de




Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]

2018-11-15 Thread Aaron1
Well, I’m a data center rookie, so I appreciate your patience

I do understand that layer 2 emulation is needed between data centers, if I do 
it with traditional mechanisms like VPLS or l2circuit martini, I’m just afraid 
if I make too many connections between spine and leaves that I might create a 
loop

However, I’m beginning to think that EVPN may take care of all that stuff, 
again, still learning some of the stuff that data centers do



Aaron

> On Nov 15, 2018, at 7:33 AM, Gert Doering  wrote:
> 
> Hi,
> 
>> On Thu, Nov 15, 2018 at 07:31:30AM -0600, Aaron1 wrote:
>> What about Layer 2 loop prevention?
> 
> What is this "Layer 2 loop" thing?
> 
> gert
> -- 
> "If was one thing all people took for granted, was conviction that if you 
> feed honest figures into a computer, honest figures come out. Never doubted 
> it myself till I met a computer with a sense of humor."
> Robert A. Heinlein, The Moon is a Harsh Mistress
> 
> Gert Doering - Munich, Germany g...@greenie.muc.de



Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]

2018-11-15 Thread Gert Doering
Hi,

On Thu, Nov 15, 2018 at 07:31:30AM -0600, Aaron1 wrote:
> What about Layer 2 loop prevention?

What is this "Layer 2 loop" thing?

gert
-- 
"If was one thing all people took for granted, was conviction that if you 
 feed honest figures into a computer, honest figures come out. Never doubted 
 it myself till I met a computer with a sense of humor."
 Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany g...@greenie.muc.de




Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]

2018-11-15 Thread Aaron1
Thanks Hugo, what about leaf to leaf connection?  Is that good?

What about Layer 2 loop prevention?

Aaron

On Nov 14, 2018, at 10:51 PM, Hugo Slabbert  wrote:

> [...]


Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]

2018-11-14 Thread Nikos Leontsinis
CoS will not work on the SD ports.

On 15 Nov 2018, at 04:51, Hugo Slabbert  wrote:

> [...]

[j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]

2018-11-14 Thread Hugo Slabbert
This was all while talking about a data center redesign that we are 
working on currently.  Replacing ToR VC EX4550’s connected LAG to ASR9K 
with new dual QFX5120 leaf to single MX960, dual MPC7E-MRATE


I think we will connect each QFX to each MPC7E card.  Is it best practice to 
not interconnect directly between the two QFX’s?  If so, why not.


Glib answer: because then it's not spine & leaf anymore ;)

Less glib answer:

1. it's not needed and is suboptimal

Going with a basic 3-stage (2 layer) spine & leaf, each leaf is connected 
to each spine.  Connectivity between any two leafs is via any spine to 
which they are both connected.  Suppose you have 2 spines, spine1 and 
spine2, and, say, 10 leaf switches. If a given leaf loses its connection to 
spine1, it would then just reach all other leafs via spine2.


If you add a connection between two spines, you do create an alternate 
path, but it's also not an equal cost or optimal path.  If we're going 
simple least hops / shortest path, provided leaf1's connection to spine1 is 
lost, in theory leaf2 could reach leaf1 via:


leaf2 -> spine1 -> spine2 -> leaf1

...but that would be a longer path than just going via the remaining:

leaf2 -> spine2 -> leaf1

...path.  You could force it through the longer path, but why?

2. What's your oversub?

The pitch on spine & leaf networks is generally their high bandwidth, high 
availability (lots of links), and low oversubscription ratios.  For the 
sake of illustration let's go away from chassis gear for spines to a 
simpler option like, say, 32x100G Tomahawk spines.  The spines there have 
capacity to connect 32x leaf switches at line rate.  Whatever connections 
the leaf switches have to the spines do not have any further oversub 
imposed within the spine layer.


Now you interconnect your spines.  How many of those 32x 100G ports are you 
going to dedicate to spine interconnect?  2 links?  If so, you've now 
dropped the capacity for 2x more leafs in your fabric (and however many 
compute nodes they were going to connect), and you're also only providing 
200G interconnect between spines for 3 Tbps of leaf connection capacity.  
Even if you ignore the less optimal path thing from above and try to 
intentionally force a fallback path on spine:leaf link failure to traverse 
your spine xconnect, you can impose up to 15:1 oversub in that scenario.
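
To put numbers on that worst case (same hypothetical 32x100G spines, 2 ports 
each used for the interconnect):

  30 leaf-facing ports x 100G  = 3 Tbps of leaf capacity per spine
   2 interconnect ports x 100G = 200 Gbps between the spines
   3 Tbps / 200 Gbps           = 15:1 potential oversub across the inter-spine link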


Or you could kill the oversub and carve out 16x of your 32x spine ports for 
spine interconnects.  But now you've shrunk your fabric significantly (can 
only support 16 leaf switches)...and you've done so unnecessarily because 
the redundancy model is for leafs to use their uplinks through spines 
directly rather than using inter-spine links.


3. >2 spines

What if leaf1 loses its connection to spine2 and leafx loses its 
connection to spine1?  Have we not created a reachability problem?


spine1      spine2
  |            |
  |            |
leaf1        leafx

Why, yes we have.  The design solution here is either >1 links between each 
leaf & spine (cheating; blergh) or a greater number of spines.  What's your 
redundancy factor?  Augment the above to 4x spines and you've significantly 
shrunk your risk of creating connectivity islands.


But if you've designed for interconnecting your spines, what do you do for 
interconnecting 4x spines?  What about if you reach 6x spines?  Again: the 
model is that resilience is achieved at the leaf:spine interconnectivity 
rather than at the "top of the tree" as you would have in a standard 
hierarchical, 3-tier-type setup.


--
Hugo Slabbert   | email, xmpp/jabber: h...@slabnet.com
pgp key: B178313E   | also on Signal

On Tue 2018-Nov-06 12:38:22 -0600, Aaron1  wrote:


This is a timely topic for me as I just got off a con-call yesterday with my 
Juniper SE and an SP specialist...

They also recommended EVPN as the way ahead in place of things like fusion.  
They even somewhat shy away from MC-lag

This was all while talking about a data center redesign that we are working on 
currently.  Replacing ToR VC EX4550’s connected LAG to ASR9K with new dual 
QFX5120 leaf to single MX960, dual MPC7E-MRATE

I think we will connect each QFX to each MPC7E card.  Is it best practice to 
not interconnect directly between the two QFX’s?  If so, why not.

(please forgive, don’t mean to hijack thread, just some good topics going on 
here)

Aaron

