Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]
OK good, I just read this:

https://forums.juniper.net/jnet/attachments/jnet/Day1Books/360/1/DO_EVPNSforDCI.pdf

Day One: Using Ethernet VPNs for Data Center Interconnect, page 11, last sentence on that page: "EVPN also has mechanisms that prevent the looping of BUM traffic in an all-active multi-homed topology."

-Aaron
___
juniper-nsp mailing list
juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
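The mechanism that Day One sentence refers to is, at its core, split horizon toward the core: a VTEP floods BUM traffic it receives from another VTEP only to its local access ports, never back into the core. A toy sketch of that rule (interface and peer names are made up for illustration, not from the book):

```python
# Toy model of split-horizon BUM flooding in an EVPN fabric:
# BUM frames that arrive from a core-facing peer (another VTEP) are
# flooded only to local access ports, never back into the core.
# That asymmetry is what breaks the flooding loop.

def flood(ingress, access_ports, core_peers):
    """Return the set of interfaces a BUM frame is flooded to."""
    out = {p for p in access_ports if p != ingress}
    if ingress not in core_peers:       # frame came from a local host...
        out |= core_peers               # ...so flood into the core once
    return out                          # core ingress -> access only

access = {"xe-0/0/1", "xe-0/0/2"}       # hypothetical local ports
core = {"vtep-leaf2", "vtep-leaf3"}     # hypothetical remote VTEPs

# From a local host: flooded to the other access port and all VTEPs.
print(flood("xe-0/0/1", access, core))
# From another VTEP: flooded to access ports only, never re-cored.
print(flood("vtep-leaf2", access, core))
```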
Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]
> Thanks Hugo, what about leaf to leaf connection? Is that good?

It Depends(tm). I would start with asking why you want to interconnect your leafs. Same question again about scaling out >2, as well as just what you're trying to accomplish with those links. A use case could be something like MLAG/VPC/whatever to bring L2 redundancy down to the node attachment. Personally I'm trying to kill the need for that (well, more just run L3 straight down to the host and be done with all the layers of protocols and headers just to stretch L2 everywhere), but one battle at a time.

--
Hugo Slabbert | email, xmpp/jabber: h...@slabnet.com
pgp key: B178313E | also on Signal

On Thu 2018-Nov-15 07:31:30 -0600, Aaron1 wrote:
> Thanks Hugo, what about leaf to leaf connection? Is that good?
>
> What about Layer 2 loop prevention?
>
> Aaron
Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]
We are testing the new QFX5120 in our lab right now. We are only waiting for the official software release... the box is here already.

The specs only show l2circuit... but we are waiting to see flexible ethernet encapsulation (no VPLS, that we already know) so we can use VLAN and MPLS on the same interface.

But the main idea is to use it with an EVPN/VXLAN configuration and try QinQ in the VTEP.

After that we can post the results here.

Regards,

Giuliano C. Medalha
WZTECH NETWORKS
+55 (17) 98112-5394
giuli...@wztech.com.br

From: juniper-nsp on behalf of Aaron1
Sent: Friday, November 16, 2018 13:14
To: adamv0...@netconsultings.com
Cc: rmcgov...@juniper.net; Juniper List
Subject: Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]

Geez, sounds horrible, thanks Adam.

We are buying QFX-5120s for our new DC build. How good is the MPLS services capability of the QFX-5120?

Aaron

On Nov 16, 2018, at 5:12 AM, wrote:

>> Of Aaron1
>> Sent: Thursday, November 15, 2018 4:23 PM
>>
>> Well, I'm a data center rookie, so I appreciate your patience.
>>
>> I do understand that layer 2 emulation is needed between data centers. If I
>> do it with traditional mechanisms like VPLS or l2circuit martini, I'm just
>> afraid that if I make too many connections between spines and leaves I
>> might create a loop.
>>
>> However, I'm beginning to think that EVPN may take care of all that stuff;
>> again, still learning some of the stuff that data centers do.
>
> Hey Aaron,
>
> My advice would be: if you're building a new DC, build it as part of your
> MPLS network (yes, no boundaries).
>
> Rant//
> The whole networking industry got it very wrong with the VXLAN technology;
> that was one of the industry's biggest blunders.
> The VXLAN project of the DC folks is a good example of short-sighted goals
> and the desire to reinvent the wheel (SP folks had had VPLS around for years
> when VXLAN came to be).
> SP folks then came up with EVPN as a replacement for VPLS, and DC folks then
> shoehorned it on top of VXLAN.
> Then the micro-segmentation buzzword came along, and DC folks quickly
> realized that there's no field in the VXLAN header to indicate a common
> access group, nor any ability to stack VXLAN headers on top of each other
> (though some tried with custom VXLAN spin-offs), so DC folks came up with a
> brilliant idea: let's maintain access lists! Like it's the '90s again. As an
> SP guy I'm just shaking my head, thinking: have these guys never heard of
> L2-VPNs, which have been around since the inception of MPLS? (So yes, not
> telling people about MAC addresses they should not be talking to is better
> than telling everyone and then maintaining ACLs.) In the SP sector we
> learned that in the '90s.
> Oh, and then there's the traffic-engineering requirement to route mice flows
> around elephant flows in the DC, not to mention the ability to seamlessly
> steer traffic flows right from VMs, across the DC, and over the MPLS core,
> which is impossible with VXLAN islands in the form of DCs hanging off of an
> MPLS core.
> Rant\\
>
> adam
>
> netconsultings.com
> ::carrier-class solutions for the telecommunications industry::
Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]
Hi,

On Fri, Nov 16, 2018 at 09:13:37AM -0600, Aaron1 wrote:
> Geez, sounds horrible, thanks Adam
>
> We are buying QFX-5120s for our new DC build. How good is the MPLS
> services capability of the QFX-5120?

Are they shipping already? Any success or horror stories?

25G looks promising for "10G is not enough, 40G is such a hassle", but it's the usual "new chip, new product, has it matured enough?" discussion.

gert
--
"If was one thing all people took for granted, was conviction that if you feed honest figures into a computer, honest figures come out. Never doubted it myself till I met a computer with a sense of humor." Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany g...@greenie.muc.de
Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]
Geez, sounds horrible, thanks Adam.

We are buying QFX-5120s for our new DC build. How good is the MPLS services capability of the QFX-5120?

Aaron

On Nov 16, 2018, at 5:12 AM, wrote:

>> Of Aaron1
>> Sent: Thursday, November 15, 2018 4:23 PM
>>
>> Well, I'm a data center rookie, so I appreciate your patience.
>>
>> I do understand that layer 2 emulation is needed between data centers. If I
>> do it with traditional mechanisms like VPLS or l2circuit martini, I'm just
>> afraid that if I make too many connections between spines and leaves I
>> might create a loop.
>>
>> However, I'm beginning to think that EVPN may take care of all that stuff;
>> again, still learning some of the stuff that data centers do.
>
> Hey Aaron,
>
> My advice would be: if you're building a new DC, build it as part of your
> MPLS network (yes, no boundaries).
>
> Rant//
> The whole networking industry got it very wrong with the VXLAN technology;
> that was one of the industry's biggest blunders.
> The VXLAN project of the DC folks is a good example of short-sighted goals
> and the desire to reinvent the wheel (SP folks had had VPLS around for years
> when VXLAN came to be).
> SP folks then came up with EVPN as a replacement for VPLS, and DC folks then
> shoehorned it on top of VXLAN.
> Then the micro-segmentation buzzword came along, and DC folks quickly
> realized that there's no field in the VXLAN header to indicate a common
> access group, nor any ability to stack VXLAN headers on top of each other
> (though some tried with custom VXLAN spin-offs), so DC folks came up with a
> brilliant idea: let's maintain access lists! Like it's the '90s again. As an
> SP guy I'm just shaking my head, thinking: have these guys never heard of
> L2-VPNs, which have been around since the inception of MPLS? (So yes, not
> telling people about MAC addresses they should not be talking to is better
> than telling everyone and then maintaining ACLs.) In the SP sector we
> learned that in the '90s.
> Oh, and then there's the traffic-engineering requirement to route mice flows
> around elephant flows in the DC, not to mention the ability to seamlessly
> steer traffic flows right from VMs, across the DC, and over the MPLS core,
> which is impossible with VXLAN islands in the form of DCs hanging off of an
> MPLS core.
> Rant\\
>
> adam
>
> netconsultings.com
> ::carrier-class solutions for the telecommunications industry::
Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]
> Of Aaron1
> Sent: Thursday, November 15, 2018 4:23 PM
>
> Well, I'm a data center rookie, so I appreciate your patience.
>
> I do understand that layer 2 emulation is needed between data centers. If I
> do it with traditional mechanisms like VPLS or l2circuit martini, I'm just
> afraid that if I make too many connections between spines and leaves I
> might create a loop.
>
> However, I'm beginning to think that EVPN may take care of all that stuff;
> again, still learning some of the stuff that data centers do.

Hey Aaron,

My advice would be: if you're building a new DC, build it as part of your MPLS network (yes, no boundaries).

Rant//

The whole networking industry got it very wrong with the VXLAN technology; that was one of the industry's biggest blunders.

The VXLAN project of the DC folks is a good example of short-sighted goals and the desire to reinvent the wheel (SP folks had had VPLS around for years when VXLAN came to be). SP folks then came up with EVPN as a replacement for VPLS, and DC folks then shoehorned it on top of VXLAN.

Then the micro-segmentation buzzword came along, and DC folks quickly realized that there's no field in the VXLAN header to indicate a common access group, nor any ability to stack VXLAN headers on top of each other (though some tried with custom VXLAN spin-offs), so DC folks came up with a brilliant idea: let's maintain access lists! Like it's the '90s again. As an SP guy I'm just shaking my head, thinking: have these guys never heard of L2-VPNs, which have been around since the inception of MPLS? (So yes, not telling people about MAC addresses they should not be talking to is better than telling everyone and then maintaining ACLs.) In the SP sector we learned that in the '90s.

Oh, and then there's the traffic-engineering requirement to route mice flows around elephant flows in the DC, not to mention the ability to seamlessly steer traffic flows right from VMs, across the DC, and over the MPLS core, which is impossible with VXLAN islands in the form of DCs hanging off of an MPLS core.

Rant\\

adam

netconsultings.com
::carrier-class solutions for the telecommunications industry::
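Adam's point about the header is easy to check against RFC 7348: the VXLAN header is 8 bytes, with only one defined flag (I) and the 24-bit VNI; everything else is reserved, leaving no defined field for group/policy information and no provision for stacking. A small sketch:

```python
import struct

# VXLAN header per RFC 7348: 8 bits of flags (only bit 3, "I", is
# defined), 24 reserved bits, a 24-bit VNI, then 8 more reserved bits.
# There is no defined field for policy/group data, and the header is
# not designed to stack.

def vxlan_header(vni: int) -> bytes:
    assert 0 <= vni < 2**24              # VNI is only 24 bits wide
    flags = 0x08                         # I flag set: VNI is valid
    return struct.pack("!II", flags << 24, vni << 8)

hdr = vxlan_header(10100)
print(len(hdr))   # 8 -- that's the whole header; nothing left over
```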
Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]
> Of Pavel Lunin
> Sent: Friday, November 16, 2018 12:10 AM
>
> Gert Doering wrote:
> >
> > EVPN is, basically, just putting a proper control-plane on top of MPLS
> > or VXLAN for "L2 routing" - put your MAC addresses into BGP, and it
> > will scale like hell.
>
> "Like hell" is the right name for it.
>
> Not that I don't like EVPN but... a) EVPN is not necessarily L2; b) Ethernet
> is still Ethernet, even over EVPN. In order to announce the MAC over BGP,
> you first need to learn it. With all the consequences and prerequisites.
> And, of course, mapping dynamically learned stuff to BGP announcements
> comes at the cost of making BGP routes only as stable as the learned MACs.
>
> Magic doesn't exist.

It does, and it's called PBB-EVPN.

No, just kidding :)

PBB on top of EVPN just brings back the conversational MAC learning aspect of it and solves the scalability issues of pure EVPN (it makes BGP independent of the customer MAC change rate or MAC scale). But as you rightly pointed out, it's still Ethernet with all its problems.

Though I guess this "simulated" Ethernet is somewhat better than vanilla Ethernet, since you have all these clever features like split-horizon groups, designated forwarders, multicast-style distribution of BUM traffic, etc., which, depending on who's driving, might prevent one from shooting himself in the foot, or provide enough rope to hang with...

adam
Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]
Gert Doering wrote:
>
> EVPN is, basically, just putting a proper control-plane on top of MPLS
> or VXLAN for "L2 routing" - put your MAC addresses into BGP, and it will
> scale like hell.

"Like hell" is the right name for it.

Not that I don't like EVPN but... a) EVPN is not necessarily L2; b) Ethernet is still Ethernet, even over EVPN. In order to announce the MAC over BGP, you first need to learn it. With all the consequences and prerequisites.

And, of course, mapping dynamically learned stuff to BGP announcements comes at the cost of making BGP routes only as stable as the learned MACs.

Magic doesn't exist.
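Pavel's stability point can be sketched as a toy model (the class and all names are hypothetical, purely illustrative): each local MAC learn maps to an EVPN advertisement and each age-out to a withdrawal, so BGP update churn tracks MAC churn one-for-one.

```python
# Toy model: an EVPN PE turning local MAC learning events into BGP
# advertisements and withdrawals. A flapping host therefore flaps BGP.

class EvpnPe:
    def __init__(self):
        self.mac_table = {}      # mac -> interface it was learned on
        self.bgp_updates = []    # log of (action, mac) "announcements"

    def learn(self, mac, interface):
        if self.mac_table.get(mac) != interface:   # new or moved MAC
            self.mac_table[mac] = interface
            self.bgp_updates.append(("advertise", mac))

    def age_out(self, mac):
        if self.mac_table.pop(mac, None) is not None:
            self.bgp_updates.append(("withdraw", mac))

pe = EvpnPe()
for _ in range(3):                          # a host that flaps 3 times...
    pe.learn("00:11:22:33:44:55", "xe-0/0/1")
    pe.age_out("00:11:22:33:44:55")
print(len(pe.bgp_updates))                  # 6 -- BGP churn mirrors MAC churn
```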
Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]
Hi,

On Thu, Nov 15, 2018 at 10:22:51AM -0600, Aaron1 wrote:
> Well, I'm a data center rookie, so I appreciate your patience
>
> I do understand that layer 2 emulation is needed between data centers, if I
> do it with traditional mechanisms like VPLS or l2circuit martini, I'm just
> afraid if I make too many connections between spine and leaves that I might
> create a loop

Since these connections are all *routed*, the routing protocol takes care of loops. There is no redundant L2 anything (unless you do LACP links, but then LACP takes care of it) that could loop.

The "user-visible layer 2 network" stuff emulated via VXLAN, MPLS, ... might form loops, so how you attach downstream L2 "infrastructure" will pose some challenges - but this is totally independent from the leaf/spine infra.

> However, I'm beginning to think that EVPN may take care of all that stuff,
> again, still learning some of the stuff that data centers do

EVPN is, basically, just putting a proper control-plane on top of MPLS or VXLAN for "L2 routing" - put your MAC addresses into BGP, and it will scale like hell.

ISPs I've talked to like EVPN, because "this is BGP, I understand BGP". Enterprise folks find EVPN scary, because "this is BGP, nobody here knows about BGP"... :-)

(And indeed, if BGP is news to you, there are way too many things that can be designed poorly, and half the "this is how you do a DC with EVPN" documents design their BGP in ways that I wouldn't do...)

gert
--
"If was one thing all people took for granted, was conviction that if you feed honest figures into a computer, honest figures come out. Never doubted it myself till I met a computer with a sense of humor." Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany g...@greenie.muc.de
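For concreteness, a minimal sketch of what "MAC addresses into BGP" looks like on a QFX-style leaf: an iBGP EVPN overlay session plus a VLAN-to-VNI mapping. All addresses, the AS-scoped route target, and the VNI numbers here are made up for illustration; this is a sketch, not a complete or vetted configuration.

```
set protocols bgp group overlay type internal
set protocols bgp group overlay local-address 10.0.0.11
set protocols bgp group overlay family evpn signaling
set protocols bgp group overlay neighbor 10.0.0.1
set protocols evpn encapsulation vxlan
set protocols evpn extended-vni-list all
set switch-options vtep-source-interface lo0.0
set switch-options route-distinguisher 10.0.0.11:1
set switch-options vrf-target target:65000:1
set vlans v100 vlan-id 100
set vlans v100 vxlan vni 10100
```

With this in place, locally learned MACs in v100 are advertised to the overlay neighbor as EVPN routes rather than flooded-and-learned, which is the "proper control-plane" Gert describes.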
Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]
Well, I'm a data center rookie, so I appreciate your patience.

I do understand that layer 2 emulation is needed between data centers. If I do it with traditional mechanisms like VPLS or l2circuit martini, I'm just afraid that if I make too many connections between spines and leaves I might create a loop.

However, I'm beginning to think that EVPN may take care of all that stuff; again, still learning some of the stuff that data centers do.

Aaron

> On Nov 15, 2018, at 7:33 AM, Gert Doering wrote:
>
> Hi,
>
>> On Thu, Nov 15, 2018 at 07:31:30AM -0600, Aaron1 wrote:
>> What about Layer 2 loop prevention?
>
> What is this "Layer 2 loop" thing?
>
> gert
> --
> "If was one thing all people took for granted, was conviction that if you
> feed honest figures into a computer, honest figures come out. Never doubted
> it myself till I met a computer with a sense of humor."
> Robert A. Heinlein, The Moon is a Harsh Mistress
>
> Gert Doering - Munich, Germany g...@greenie.muc.de
Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]
Hi,

On Thu, Nov 15, 2018 at 07:31:30AM -0600, Aaron1 wrote:
> What about Layer 2 loop prevention?

What is this "Layer 2 loop" thing?

gert
--
"If was one thing all people took for granted, was conviction that if you feed honest figures into a computer, honest figures come out. Never doubted it myself till I met a computer with a sense of humor." Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany g...@greenie.muc.de
Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]
Thanks Hugo, what about leaf to leaf connection? Is that good?

What about Layer 2 loop prevention?

Aaron

On Nov 14, 2018, at 10:51 PM, Hugo Slabbert wrote:

>> This was all while talking about a data center redesign that we are working
>> on currently. Replacing ToR VC EX4550s connected via LAG to an ASR9K with
>> new dual QFX5120 leafs to a single MX960, dual MPC7E-MRATE.
>>
>> I think we will connect each QFX to each MPC7E card. Is it best practice to
>> not interconnect directly between the two QFXs? If so, why not?
>
> Glib answer: because then it's not spine & leaf anymore ;)
>
> Less glib answer:
>
> 1. It's not needed and is suboptimal.
>
> Going with a basic 3-stage (2-layer) spine & leaf, each leaf is connected to
> each spine. Connectivity between any two leafs is via any spine to which
> they are both connected. Suppose you have 2 spines, spine1 and spine2, and,
> say, 10 leaf switches. If a given leaf loses its connection to spine1, it
> would then just reach all other leafs via spine2.
>
> If you add a connection between the two spines, you do create an alternate
> path, but it's not an equal-cost or optimal path. If we're going simple
> least hops / shortest path, then when leaf1's connection to spine1 is lost,
> in theory leaf2 could reach leaf1 via:
>
> leaf2 -> spine1 -> spine2 -> leaf1
>
> ...but that would be a longer path than just going via the remaining:
>
> leaf2 -> spine2 -> leaf1
>
> ...path. You could force it through the longer path, but why?
>
> 2. What's your oversub?
>
> The pitch on spine & leaf networks is generally their high bandwidth, high
> availability (lots of links), and low oversubscription ratios. For the sake
> of illustration let's go away from chassis gear for spines to a simpler
> option like, say, 32x100G Tomahawk spines. The spines there have the
> capacity to connect 32x leaf switches at line rate. Whatever connections the
> leaf switches have to the spines do not have any further oversub imposed
> within the spine layer.
>
> Now you interconnect your spines. How many of those 32x 100G ports are you
> going to dedicate to spine interconnect? 2 links? If so, you've now dropped
> the capacity for 2x more leafs in your fabric (and however many compute
> nodes they were going to connect), and you're also only providing 200G of
> interconnect between spines for 3 Tbps of leaf connection capacity. Even if
> you ignore the less-optimal-path issue from above and try to intentionally
> force the fallback path on a spine:leaf link failure to traverse your spine
> xconnect, you can impose up to 15:1 oversub in that scenario.
>
> Or you could kill the oversub and carve out 16x of your 32x spine ports for
> spine interconnects. But now you've shrunk your fabric significantly (it can
> only support 16 leaf switches)... and you've done so unnecessarily, because
> the redundancy model is for leafs to use their uplinks through the spines
> directly rather than via inter-spine links.
>
> 3. >2 spines
>
> What if leaf1 loses its connection to spine2 and leafx loses its connection
> to spine1? Have we not created a reachability problem?
>
>   spine1    spine2
>     |          |
>   leaf1      leafx
>
> Why, yes we have. The design solution here is either >1 links between each
> leaf & spine (cheating; blergh) or a greater number of spines. What's your
> redundancy factor? Augment the above to 4x spines and you've significantly
> shrunk your risk of creating connectivity islands.
>
> But if you've designed for interconnecting your spines, what do you do for
> interconnecting 4x spines? What about when you reach 6x spines? Again: the
> model is that resilience is achieved at the leaf:spine interconnectivity
> rather than at the "top of the tree" as you would have in a standard
> hierarchical, 3-tier-type setup.
>
> --
> Hugo Slabbert | email, xmpp/jabber: h...@slabnet.com
> pgp key: B178313E | also on Signal
>
>> On Tue 2018-Nov-06 12:38:22 -0600, Aaron1 wrote:
>>
>> This is a timely topic for me as I just got off a con-call yesterday with
>> my Juniper SE and an SP specialist...
>>
>> They also recommended EVPN as the way ahead in place of things like fusion.
>> They even somewhat shy away from MC-LAG.
>>
>> This was all while talking about a data center redesign that we are working
>> on currently. Replacing ToR VC EX4550s connected via LAG to an ASR9K with
>> new dual QFX5120 leafs to a single MX960, dual MPC7E-MRATE.
>>
>> I think we will connect each QFX to each MPC7E card. Is it best practice to
>> not interconnect directly between the two QFXs? If so, why not?
>>
>> (please forgive, don't mean to hijack the thread, just some good topics
>> going on here)
>>
>> Aaron
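Hugo's 15:1 figure is straightforward to verify; a quick calculation using the hypothetical 32x100G spine from his example:

```python
# Oversubscription check for the 32x100G spine example: burn 2 ports
# per spine on a spine-to-spine interconnect and compare interconnect
# capacity with the leaf-facing capacity it would have to back up.

spine_ports, port_gbps = 32, 100
interconnect_ports = 2

leaf_ports = spine_ports - interconnect_ports            # 30 leaf-facing ports
leaf_capacity = leaf_ports * port_gbps                   # 3000G = 3 Tbps
interconnect_capacity = interconnect_ports * port_gbps   # 200G

print(leaf_capacity, interconnect_capacity,
      leaf_capacity // interconnect_capacity)            # 3000 200 15 -> 15:1
```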
Re: [j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]
CoS will not work on the SD ports.

On 15 Nov 2018, at 04:51, Hugo Slabbert wrote:

>> This was all while talking about a data center redesign that we are working
>> on currently. Replacing ToR VC EX4550s connected via LAG to an ASR9K with
>> new dual QFX5120 leafs to a single MX960, dual MPC7E-MRATE.
>>
>> I think we will connect each QFX to each MPC7E card. Is it best practice to
>> not interconnect directly between the two QFXs? If so, why not?
>
> Glib answer: because then it's not spine & leaf anymore ;)
>
> Less glib answer:
>
> 1. It's not needed and is suboptimal.
>
> Going with a basic 3-stage (2-layer) spine & leaf, each leaf is connected to
> each spine. Connectivity between any two leafs is via any spine to which
> they are both connected. Suppose you have 2 spines, spine1 and spine2, and,
> say, 10 leaf switches. If a given leaf loses its connection to spine1, it
> would then just reach all other leafs via spine2.
>
> If you add a connection between the two spines, you do create an alternate
> path, but it's not an equal-cost or optimal path. If we're going simple
> least hops / shortest path, then when leaf1's connection to spine1 is lost,
> in theory leaf2 could reach leaf1 via:
>
> leaf2 -> spine1 -> spine2 -> leaf1
>
> ...but that would be a longer path than just going via the remaining:
>
> leaf2 -> spine2 -> leaf1
>
> ...path. You could force it through the longer path, but why?
>
> 2. What's your oversub?
>
> The pitch on spine & leaf networks is generally their high bandwidth, high
> availability (lots of links), and low oversubscription ratios. For the sake
> of illustration let's go away from chassis gear for spines to a simpler
> option like, say, 32x100G Tomahawk spines. The spines there have the
> capacity to connect 32x leaf switches at line rate. Whatever connections the
> leaf switches have to the spines do not have any further oversub imposed
> within the spine layer.
>
> Now you interconnect your spines. How many of those 32x 100G ports are you
> going to dedicate to spine interconnect? 2 links? If so, you've now dropped
> the capacity for 2x more leafs in your fabric (and however many compute
> nodes they were going to connect), and you're also only providing 200G of
> interconnect between spines for 3 Tbps of leaf connection capacity. Even if
> you ignore the less-optimal-path issue from above and try to intentionally
> force the fallback path on a spine:leaf link failure to traverse your spine
> xconnect, you can impose up to 15:1 oversub in that scenario.
>
> Or you could kill the oversub and carve out 16x of your 32x spine ports for
> spine interconnects. But now you've shrunk your fabric significantly (it can
> only support 16 leaf switches)... and you've done so unnecessarily, because
> the redundancy model is for leafs to use their uplinks through the spines
> directly rather than via inter-spine links.
>
> 3. >2 spines
>
> What if leaf1 loses its connection to spine2 and leafx loses its connection
> to spine1? Have we not created a reachability problem?
>
>   spine1    spine2
>     |          |
>   leaf1      leafx
>
> Why, yes we have. The design solution here is either >1 links between each
> leaf & spine (cheating; blergh) or a greater number of spines. What's your
> redundancy factor? Augment the above to 4x spines and you've significantly
> shrunk your risk of creating connectivity islands.
>
> But if you've designed for interconnecting your spines, what do you do for
> interconnecting 4x spines? What about when you reach 6x spines? Again: the
> model is that resilience is achieved at the leaf:spine interconnectivity
> rather than at the "top of the tree" as you would have in a standard
> hierarchical, 3-tier-type setup.
>
> --
> Hugo Slabbert | email, xmpp/jabber: h...@slabnet.com
> pgp key: B178313E | also on Signal
>
>> On Tue 2018-Nov-06 12:38:22 -0600, Aaron1 wrote:
>>
>> This is a timely topic for me as I just got off a con-call yesterday with
>> my Juniper SE and an SP specialist...
>>
>> They also recommended EVPN as the way ahead in place of things like fusion.
>> They even somewhat shy away from MC-LAG.
>>
>> This was all while talking about a data center redesign that we are working
>> on currently. Replacing ToR VC EX4550s connected via LAG to an ASR9K with
>> new dual QFX5120 leafs to a single MX960, dual MPC7E-MRATE.
>>
>> I think we will connect each QFX to each MPC7E card. Is it best practice to
>> not interconnect directly between the two QFXs? If so, why not?
>>
>> (please forgive, don't mean to hijack the thread, just some good topics
>> going on here)
>>
>> Aaron
[j-nsp] Interconnecting spines in spine & leaf networks [ was Re: Opinions on fusion provider edge ]
> This was all while talking about a data center redesign that we are working
> on currently. Replacing ToR VC EX4550s connected via LAG to an ASR9K with
> new dual QFX5120 leafs to a single MX960, dual MPC7E-MRATE.
>
> I think we will connect each QFX to each MPC7E card. Is it best practice to
> not interconnect directly between the two QFXs? If so, why not?

Glib answer: because then it's not spine & leaf anymore ;)

Less glib answer:

1. It's not needed and is suboptimal.

Going with a basic 3-stage (2-layer) spine & leaf, each leaf is connected to each spine. Connectivity between any two leafs is via any spine to which they are both connected. Suppose you have 2 spines, spine1 and spine2, and, say, 10 leaf switches. If a given leaf loses its connection to spine1, it would then just reach all other leafs via spine2.

If you add a connection between the two spines, you do create an alternate path, but it's not an equal-cost or optimal path. If we're going simple least hops / shortest path, then when leaf1's connection to spine1 is lost, in theory leaf2 could reach leaf1 via:

leaf2 -> spine1 -> spine2 -> leaf1

...but that would be a longer path than just going via the remaining:

leaf2 -> spine2 -> leaf1

...path. You could force it through the longer path, but why?

2. What's your oversub?

The pitch on spine & leaf networks is generally their high bandwidth, high availability (lots of links), and low oversubscription ratios. For the sake of illustration let's go away from chassis gear for spines to a simpler option like, say, 32x100G Tomahawk spines. The spines there have the capacity to connect 32x leaf switches at line rate. Whatever connections the leaf switches have to the spines do not have any further oversub imposed within the spine layer.

Now you interconnect your spines. How many of those 32x 100G ports are you going to dedicate to spine interconnect? 2 links? If so, you've now dropped the capacity for 2x more leafs in your fabric (and however many compute nodes they were going to connect), and you're also only providing 200G of interconnect between spines for 3 Tbps of leaf connection capacity. Even if you ignore the less-optimal-path issue from above and try to intentionally force the fallback path on a spine:leaf link failure to traverse your spine xconnect, you can impose up to 15:1 oversub in that scenario.

Or you could kill the oversub and carve out 16x of your 32x spine ports for spine interconnects. But now you've shrunk your fabric significantly (it can only support 16 leaf switches)... and you've done so unnecessarily, because the redundancy model is for leafs to use their uplinks through the spines directly rather than via inter-spine links.

3. >2 spines

What if leaf1 loses its connection to spine2 and leafx loses its connection to spine1? Have we not created a reachability problem?

  spine1    spine2
    |          |
  leaf1      leafx

Why, yes we have. The design solution here is either >1 links between each leaf & spine (cheating; blergh) or a greater number of spines. What's your redundancy factor? Augment the above to 4x spines and you've significantly shrunk your risk of creating connectivity islands.

But if you've designed for interconnecting your spines, what do you do for interconnecting 4x spines? What about when you reach 6x spines? Again: the model is that resilience is achieved at the leaf:spine interconnectivity rather than at the "top of the tree" as you would have in a standard hierarchical, 3-tier-type setup.

--
Hugo Slabbert | email, xmpp/jabber: h...@slabnet.com
pgp key: B178313E | also on Signal

On Tue 2018-Nov-06 12:38:22 -0600, Aaron1 wrote:
> This is a timely topic for me as I just got off a con-call yesterday with my
> Juniper SE and an SP specialist...
>
> They also recommended EVPN as the way ahead in place of things like fusion.
> They even somewhat shy away from MC-LAG.
>
> This was all while talking about a data center redesign that we are working
> on currently. Replacing ToR VC EX4550s connected via LAG to an ASR9K with
> new dual QFX5120 leafs to a single MX960, dual MPC7E-MRATE.
>
> I think we will connect each QFX to each MPC7E card. Is it best practice to
> not interconnect directly between the two QFXs? If so, why not?
>
> (please forgive, don't mean to hijack the thread, just some good topics
> going on here)
>
> Aaron