Gyan,

Yes, the GW redundancy in the dci draft is based on an “Interconnect” Ethernet 
Segment (I-ES), which uses the same DF Election, split-horizon, mass withdraw 
and aliasing/backup procedures as any Ethernet Segment.
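
As a rough illustration of the DF Election part, here is a minimal sketch of 
the default service-carving procedure from RFC 7432 (order the candidates by IP 
address, then pick ordinal V mod N). The function name and the example 
addresses are hypothetical, and which identifier plays the role of V for an 
I-ES is an assumption here, not something stated in this thread.

```python
import ipaddress


def elect_df(gateway_ips, vlan_id):
    """Default service carving: sort candidates numerically by IP address
    and pick the one at index (vlan_id mod number_of_candidates)."""
    ordered = sorted(gateway_ips, key=ipaddress.ip_address)
    return ordered[vlan_id % len(ordered)]


# Two hypothetical GWs attached to the same I-ES.
gws = ["192.0.2.2", "192.0.2.1"]
for vlan in (100, 101):
    print(vlan, elect_df(gws, vlan))
```

With two GWs, even and odd identifiers land on different DFs, which is how the 
BUM forwarding role gets spread across the redundant GWs.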

Thanks.
Jorge

From: Gyan Mishra <hayabusa...@gmail.com>
Date: Monday, April 27, 2020 at 1:50 AM
To: "Rabadan, Jorge (Nokia - US/Mountain View)" <jorge.raba...@nokia.com>
Cc: BESS <bess@ietf.org>, Jeff Tantsura <jefftant.i...@gmail.com>, "Lukas 
Krattiger (lkrattig)" <lkrat...@cisco.com>, "saja...@cisco.com" 
<saja...@cisco.com>
Subject: Re: [bess] VXLAN BGP EVPN Question


Jorge

In the BGP EVPN NVO RFC 8365 there are controls built in for MAC flooding 
within a pod with all-active multi-homed hosts. With any multi-homing failure, 
mass MAC withdrawal makes all NVEs reconverge to the new next hop when the ES 
of the failed gateway is withdrawn. Aliasing and backup path also apply to 
all-active multi-homed hosts, so that remote NVEs can load balance. Split 
horizon filtering for BUM traffic prevents it from looping back through a 
different gateway on the ES connected to the host.


So with the DCI overlay draft, those same EVPN procedures that help intra-pod 
NVEs with convergence and flooding are now applied to the inter-pod stitched 
NVE, via the UMR route for BUM traffic.

So the new UMR route type prevents re-flooding when the routes are all known, 
via aliasing to the redundant gateway, similar to the backup path aliasing for 
load balancing intra-site.

Kind regards

Gyan

On Sun, Apr 26, 2020 at 10:02 AM Rabadan, Jorge (Nokia - US/Mountain View) 
<jorge.raba...@nokia.com> wrote:
Hi Gyan,

Actually we started with the evpn dci draft in 2013 :-)

The way I see the unknown MAC route, it saves flooding if all the MACs in the 
POD/DC are known beforehand. The unknown unicast traffic can be aliased to the 
GWs. In case of failure in one of the GWs, the AD per-ES route for the I-ES 
will be withdrawn (mass withdraw for all EVIs) and the unknown traffic can be 
sent to the redundant GWs. So this failure won’t generate any extra flooding.
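
A minimal sketch of that behaviour on a leaf, assuming the UMR is carried as a 
MAC route for MAC 0/48 with the ESI set to the I-ESI (as described further down 
this thread). The `Leaf` class, its method names and the GW/ES labels are all 
hypothetical; only one route per MAC is modelled, and this is not any 
implementation's actual data structure.

```python
UNKNOWN_MAC = "00:00:00:00:00:00"  # MAC 0/48, standing in for the UMR


class Leaf:
    """Hypothetical leaf model: MAC routes plus AD per-ES routes for aliasing."""

    def __init__(self):
        self.mac_routes = {}  # MAC -> (ESI or None, advertising next hop)
        self.ad_per_es = {}   # ESI -> set of GWs with a live AD per-ES route

    def receive_ad_per_es(self, esi, gw):
        self.ad_per_es.setdefault(esi, set()).add(gw)

    def withdraw_ad_per_es(self, esi, gw):
        # Mass withdraw: one withdrawal removes the GW for every MAC on the ES.
        self.ad_per_es.get(esi, set()).discard(gw)

    def receive_mac(self, mac, esi, next_hop):
        self.mac_routes[mac] = (esi, next_hop)

    def next_hops(self, dst_mac):
        # Fall back to the UMR when there is no specific route for the MAC.
        route = self.mac_routes.get(dst_mac) or self.mac_routes.get(UNKNOWN_MAC)
        if route is None:
            return set()  # no specific route and no UMR: the frame is flooded
        esi, next_hop = route
        if esi is None:
            return {next_hop}  # single-homed MAC, no aliasing
        return set(self.ad_per_es.get(esi, ()))


leaf = Leaf()
leaf.receive_ad_per_es("I-ES-1", "GW1")
leaf.receive_ad_per_es("I-ES-1", "GW2")
leaf.receive_mac(UNKNOWN_MAC, "I-ES-1", "GW1")
leaf.receive_mac("aa:aa:aa:aa:aa:01", None, "LEAF2")

print(leaf.next_hops("aa:aa:aa:aa:aa:01"))          # known local MAC
print(sorted(leaf.next_hops("bb:bb:bb:bb:bb:02")))  # unknown MAC, aliased to GWs
leaf.withdraw_ad_per_es("I-ES-1", "GW1")            # GW1 fails
print(sorted(leaf.next_hops("bb:bb:bb:bb:bb:02")))  # only the surviving GW
```

After the withdrawal the unknown destination still resolves to a GW, so the 
leaf has no reason to fall back to flooding.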

Thanks.
Jorge

From: Gyan Mishra <hayabusa...@gmail.com>
Date: Saturday, April 25, 2020 at 8:45 AM
To: "Lukas Krattiger (lkrattig)" <lkrat...@cisco.com>, "saja...@cisco.com" 
<saja...@cisco.com>
Cc: BESS <bess@ietf.org>, Jeff Tantsura <jefftant.i...@gmail.com>, "Rabadan, 
Jorge (Nokia - US/Mountain View)" <jorge.raba...@nokia.com>
Subject: Re: [bess] VXLAN BGP EVPN Question


+ Ali

Lukas

I noticed that Ali was on the multi site draft, which expired in 2017 
around the same time the DCI overlay draft was submitted. I went through the 
logs but did not go through the mail archives to see what happened to the multi 
site draft. My guess is these were two competing drafts and multi site was 
geared solely to EVPN procedures for vxlan encapsulation and thus did not 
achieve WG adoption, whereas your DCI overlay draft accounts for every 
encapsulation type using EVPN procedures and is a more comprehensive approach 
to DCI, providing an improved solution to multi site vxlan overlay stitching.

I like the idea of re-originating the VNI and RD using local context on the 
gateway as an additional control mechanism, which prevents Type 2 MAC-IP routes 
from being flooded between pods where they should not be, without flood 
filters. With the multi site feature there is no such control and all mobility 
routes are flooded, unfortunately, active or not.

With this draft, is it possible to add a feature for conversational learning of 
only active flows? When the type 1 BGP A-D route is sent for the initial BUM 
advertisement for ARP or ND, there could be a snooping mechanism, similar to 
IGMP snooping, that discovers the active flow and thus creates the control 
plane level type 2 MAC-IP state, followed by the traffic being flooded in the 
data plane NVE tunnel overlay. I think this concept could apply intra-site, 
fabric leaf to leaf, but I think it would be extremely beneficial for inter-pod 
or inter-site.

This could be a separate feature or an option to the selective advertisement.

So the selective advertisement works in conjunction with re-origination of the 
RD and a locally significant VNI.

So what I would envision with the conversational learning active flow detection 
feature is that you would use a global VNI, and now only the active type-2 
MAC-IP routes would be propagated inter-pod or inter-site.

This feature would be a tremendous benefit to operators and help with mac scale.

In our Cisco multi site implementations we do use the recommended multi site 
specific BUM traffic suppression applied on the BGW. So that definitely helps 
with the BUM suppression.

In section 3.5.1 UMR: so the route type is like a default MAC route 0/48 with 
the ESI set to the DCI gateway I-ESI for all-active multi-homing, and so 
instead of flooding all MACs and having to rely on mass MAC withdrawals during 
a failure, now only the UMR is withdrawn. Is that correct?

That’s a huge savings on resources.

Kind regards

Gyan

On Fri, Apr 24, 2020 at 3:25 PM Lukas Krattiger (lkrattig) 
<lkrat...@cisco.com> wrote:
Thanks Jorge and Jeff for guiding all the way thru the features and functions 
we have around, in DCI-overlay and Multi-Site.

Gyan,

Specific to the VNI distribution, BUM handling and the re-origination in 
Multi-Site:
With re-origination, the RDs are changed on the GW node. With this in mind, the 
VNI could be globally or locally significant. In the case of local 
significance, we can stitch VNIs together (i.e. VNI1 - GW - VNI2 - GW - VNI3).
Further, MAC- or IP-VRFs that are not supposed to be extended to remote Sites 
will not advertise any MAC or IP routes beyond the local GW. This way you will 
keep the control-plane clean and avoid unnecessary creation of flood lists. 
This is what we call selective advertisement, which is different from 
conversational learning. Conversational learning could be a complement to 
selective advertisement. The unknown MAC approach that Jorge mentioned is a 
different approach for similar optimizations.
In addition to ARP suppression, in the specific Cisco implementation of 
Multi-Site, we provide a BUM traffic policer to rate limit between Sites. This 
policer is located on the GW and acts in the egress direction.
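
A minimal sketch of re-origination with VNI stitching and selective 
advertisement at a GW. The `MacRoute` fields, the `reoriginate` function and 
the per-GW `vni_map` are hypothetical names used only for illustration, not the 
NX-OS data model; the assumption is that a VNI missing from the map is one that 
is not supposed to be extended beyond the local GW.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MacRoute:
    rd: str
    vni: int
    next_hop: str
    mac: str


def reoriginate(route, gw_rd, gw_ip, vni_map):
    """Return the route as re-originated by the GW, or None when the VNI is
    not extended (selective advertisement keeps the route local)."""
    if route.vni not in vni_map:
        return None
    # The GW changes RD and next hop and translates the VNI (stitching).
    return replace(route, rd=gw_rd, vni=vni_map[route.vni], next_hop=gw_ip)


vni_map = {10001: 20001}  # local VNI1 stitched to VNI2; other VNIs stay local
local = MacRoute("10.0.0.1:1", 10001, "10.0.0.1", "aa:aa:aa:aa:aa:01")
private = MacRoute("10.0.0.1:2", 10002, "10.0.0.1", "aa:aa:aa:aa:aa:02")

print(reoriginate(local, "10.0.0.100:1", "10.0.0.100", vni_map))
print(reoriginate(private, "10.0.0.100:2", "10.0.0.100", vni_map))
```

The second route never leaves the Site, so remote leafs build no state or 
flood-list entry for that VNI.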

So with the DCI EVPN VNI translation, does that end up netting the desired 
effect of control plane segregation from the data plane, and providing that 
reduced size MAC VRF showing only active interesting traffic type 2 MAC-IP 
routes intra pod within the DC?

In a certain way, yes

Kind Regards
-Lukas


On Apr 24, 2020, at 7:21 AM, Rabadan, Jorge (Nokia - US/Mountain View) 
<jorge.raba...@nokia.com> wrote:

Hi Gyan,

The dci evpn overlay draft indeed provides that segmentation. EVPN routes are 
readvertised at the GWs with a change in RD/VNI/Nhop, and this certainly 
optimizes the BUM replication from end leaf nodes. The draft also introduces 
the use of an unknown MAC route that the GWs can advertise to their local POD, 
as opposed to readvertising all the received MAC routes. This can be used under 
the assumption that if a MAC is unknown for a leaf, it must be somewhere beyond 
the GW. Finally, the draft also allows you to use an I-ES for multihoming and 
have all-active to two or more GWs.

Note that this draft has multiple implementations, and the only reason why it 
is not an RFC yet is a normative reference that must be cleared first.

Thanks.
Jorge

From: Gyan Mishra <hayabusa...@gmail.com>
Date: Friday, April 24, 2020 at 3:54 PM
To: "Rabadan, Jorge (Nokia - US/Mountain View)" <jorge.raba...@nokia.com>
Cc: BESS <bess@ietf.org>, Jeff Tantsura <jefftant.i...@gmail.com>
Subject: Re: [bess] VXLAN BGP EVPN Question


Hi Jorge

I read through the draft and it sounds like this vxlan segmentation is similar 
to the multi site segmented multi part LSP used for DCI. How does this option 
compare or contrast with the multi site draft below?

With the DCI evpn overlay you mentioned, the VNIs on the ASBRs are translated 
and not global. Interesting.

With multi site the VNIs are globally significant inter or intra site, and an 
RT rewrite happens for the BGW to BGW middle segment so that the NVE tunnel can 
be stitched.

So with the DCI EVPN VNI translation, does that end up netting the desired 
effect of control plane segregation from the data plane, and providing that 
reduced size MAC VRF showing only active interesting traffic type 2 MAC-IP 
routes intra pod within the DC?

Multi site DCI
https://datatracker.ietf.org/doc/draft-sharma-multi-site-evpn/


Kind regards

Gyan

On Fri, Apr 24, 2020 at 3:07 AM Rabadan, Jorge (Nokia - US/Mountain View) 
<jorge.raba...@nokia.com> wrote:
Hi Gyan,

If I may, note that:
https://tools.ietf.org/html/draft-ietf-bess-dci-evpn-overlay-10#section-4.6

Also provides vxlan segmentation, and while the description is based on DCI, 
you can perfectly use it for inter-pod connectivity.

Thanks.
Jorge

From: BESS <bess-boun...@ietf.org> on behalf of Gyan Mishra 
<hayabusa...@gmail.com>
Date: Friday, April 24, 2020 at 5:21 AM
To: Jeff Tantsura <jefftant.i...@gmail.com>
Cc: BESS <bess@ietf.org>
Subject: Re: [bess] VXLAN BGP EVPN Question


Hi Jeff

Yes - Cisco has a draft for multi site, for use cases needing an inter-pod or 
inter-site segmented path between disparate POD fabrics intra DC, or as a DCI 
option inter DC without MPLS. The segmentation localizes BUM traffic and has 
border gateway DF election for BUM traffic that is segmented and stitched 
between PODs, as I mentioned, similar to inter-AS L3 VPN option B. There is an 
extra load, as you said, on the BGW (border gateway) performing the network 
VTEP decap from the leaf and then the encap again towards the egress border 
gateway. Due to that extra load on the border gateway it’s not recommended to 
have the spine function on the BGW, thus an extra layer is needed for multi 
site to be scalable. It definitely requires a proprietary ASIC and not merchant 
silicon or a white box solution. The BUM traffic is much reduced, as you 
stated, compared to a multi fabric connected super spine or a single fabric 
spine that contains all leafs. That decoupling sounds like an incongruent 
control and data plane with MAC-only Type 2 routes, which would result in more 
BUM traffic, but it sounds like that may be the trade-off of conversational 
learning of only active flows versus the entire data center wide MAC VRF being 
learned everywhere. I wonder if there is an option to have that real decoupling 
of the EVPN control plane and the vxlan data plane overlay that does not impact 
convergence but adds stability, with only active flow Type 2 MACs learned 
across the fabric.

https://datatracker.ietf.org/doc/draft-sharma-multi-site-evpn/

Kind regards

Gyan

On Thu, Apr 23, 2020 at 6:04 PM Jeff Tantsura <jefftant.i...@gmail.com> wrote:
Gyan,

“Multi site” is not really IETF terminology; this is a solution implemented by 
NX-OS, there’s a draft though. Its main functionality is to localize VxLAN 
tunnels and provide segmented path vs end2end full mesh of VxLAN tunnels 
(participating in the same EVI). We are talking HER here.
The feature is heavily HW dependent as it requires BUM re-encapsulation at the 
boundaries (leaf1->BGW1-BGW2->leaf2..n). So good luck seeing it soon on low end 
silicon.
It doesn’t eliminate BUM traffic but significantly reduces the span of 
“broadcast domain” and reduces the need for large flood domains (modern HW 
gives you ~512 large flood groups, obviously depending on HW)
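
To put rough numbers on the flood-list point, here is a back-of-the-envelope 
sketch for HER. The function names and the 50-leaf, 4-site figures are made up 
for illustration, and the segmented case assumes a leaf sends one copy towards 
its local BGWs (the `copies_to_local_gws` parameter), which is an assumption 
rather than a statement about any product.

```python
def flood_list_full_mesh(leaves_per_site, sites):
    """Copies an ingress leaf replicates with an end2end full mesh:
    one per every other leaf in the EVI, across all sites."""
    return leaves_per_site * sites - 1


def flood_list_segmented(leaves_per_site, copies_to_local_gws=1):
    """Copies an ingress leaf replicates with a segmented path:
    its local peers plus the copies sent towards the local BGWs."""
    return (leaves_per_site - 1) + copies_to_local_gws


print(flood_list_full_mesh(50, 4))  # 199
print(flood_list_segmented(50))     # 50
```

The BUM traffic is still there, but each leaf's replication list stops growing 
with the number of sites.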

Wrt your question about MAC conversational learning - this is an implementation 
issue, nothing in the EVPN specifications precludes you from doing so, moreover 
in the implementation I was designing (in my previous life) we indeed decoupled 
data plane learning from control plane advertisement so the control plane was 
aware of “Active” flows. Needless to say - this creates an additional layer of 
complexity and all kinds of funky states in the system ;-).
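
A toy sketch of what such a decoupling could look like, purely as an 
illustration and not the implementation mentioned above: the data plane records 
when each MAC was last seen, and a separate reconcile step decides which MACs 
are “Active” enough to advertise or withdraw. The class name, the idle timeout 
and the use of plain numbers for time are all assumptions.

```python
class ConversationalMacTable:
    """Hypothetical table: learn every MAC locally, advertise only active ones."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.last_seen = {}      # MAC -> time of the last data-plane hit
        self.advertised = set()  # MACs currently advertised in the control plane

    def on_frame(self, src_mac, now):
        # Data-plane learning only refreshes the timestamp.
        self.last_seen[src_mac] = now

    def reconcile(self, now):
        """Return (MACs to advertise, MACs to withdraw) and update state."""
        active = {mac for mac, seen in self.last_seen.items()
                  if now - seen < self.idle_timeout}
        to_advertise = active - self.advertised
        to_withdraw = self.advertised - active
        self.advertised = active
        return to_advertise, to_withdraw


table = ConversationalMacTable(idle_timeout=10)
table.on_frame("aa:aa:aa:aa:aa:01", now=0)
print(table.reconcile(now=1))
table.on_frame("bb:bb:bb:bb:bb:02", now=5)
print(table.reconcile(now=12))
```

Even in this toy version there are two states per MAC (learned versus 
advertised) that can disagree between reconcile runs, which hints at where the 
funky states come from.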

Hope this helps

Cheers,
Jeff
On Apr 23, 2020, 8:30 AM -0700, Gyan Mishra <hayabusa...@gmail.com>, wrote:


Slight clarification on the ARP traffic. What I meant was: with broadcast 
traffic translated into BUM traffic in the EVPN architecture, is there any way 
to reduce the amount of BUM traffic, given a data center design requirement 
with vlan anywhere sprawl and 1000s of type 2 MAC mobility routes being learned 
between all the leaf VTEPs?

The elimination of broadcast is a tremendous gain, and broadcast suppression of 
multicast does help, but it would be nice to not have such massive MAC tables 
and type 2 route churn chatter, with conversational learning where only active 
flows are in the type 2 RIB.

Kind regards

Gyan

On Wed, Apr 22, 2020 at 6:47 PM Gyan Mishra <hayabusa...@gmail.com> wrote:

The description of the vxlan BGP evpn scenario has a typo: the multi site 
feature's segmented LSP inter pod with the RT auto rewrite is similar to MPLS 
inter-AS option B, not A.

Kind regards

Gyan

On Wed, Apr 22, 2020 at 5:57 PM Gyan Mishra <hayabusa...@gmail.com> wrote:

All

Had a question related to the vxlan BGP EVPN architecture specifications 
defined in the BGP EVPN NVO3 overlay RFC 8365 and the VXLAN data plane RFC 7348.

Consider a Data Center environment where you have multiple PODs, with 
individual fabrics per POD connected via a super spine extension, using a Multi 
site feature doing auto rewrite of RTs to stitch the NVE tunnel between pods, 
similar to inter-AS option A.

So in this scenario you have vlan sprawl everywhere, with L2 and L3 VNIs 
everywhere as if it were a single L2 domain. The topology is a typical vxlan 
spine leaf topology where the L3 leafs are the TOR switches, so a very small 
physical L2 fault domain. So I was wondering if with the vxlan architecture the 
feature below is possible, or if there is a way to do so in the current 
specification.

Cisco used to have a DC product called “fabric path” which was based on 
conversational learning.

Is there any way with the existing vxlan BGP evpn specification, or maybe a 
future enhancement, to have a MAC conversational learning capability so that 
only the active MACs that are part of a conversation flow are the MACs that are 
flooded throughout the vxlan fabric? That would really help tremendously with 
ARP storms, so if new ARP entries are generated locally on a leaf they are not 
flooded through the fabric unless there are active flows between leafs.

Also, is there a way to filter type 2 MAC mobility routes between leaf switches 
at the control plane level based on remote VTEP or maybe other parameters? 
That would also reduce ARP storm BUM traffic.

Kind regards

Gyan
--
Gyan  Mishra
Network Engineering & Technology
Verizon
Silver Spring, MD 20904
Phone: 301 502-1347
Email: gyan.s.mis...@verizon.com


_______________________________________________
BESS mailing list
BESS@ietf.org
https://www.ietf.org/mailman/listinfo/bess