Good day everyone.
For those of you who are using EVPN-MPLS (although this likely applies equally to VXLAN-based transport), I have a question based on your observations in your production networks.
I have a basic configuration as follows:
* Two PE routers that provide connectivity to a single downstream device via an all-active multi-homed LAG connection.
* The downstream device could be a single aggregation switch, or something like an OLT. Either way, the downstream device is configured with an uplink port to each PE, and the ports in question use LACP.
* The two PE routers do not have any physical connections between each other, but instead have redundant connections to a pair of core routers. These uplinks carry MPLS traffic.
* The upstream core routers are acting as route reflectors.
* MRAI is set to 0 for route exchange between the core and PE routers.
What I am curious to get feedback on is BUM traffic forwarding in the brief window between the start and conclusion of a DF election, and the risk of BUM packets being forwarded back into the same ES they came from.
Scenario of concern:
* PE1 and PE2 are both in a steady state in the network; full routing tables have already propagated.
* The port between PE1 and the CE device is active, with LACP negotiated, and PE1 has announced the relevant EVPN routes, specifically including the type 1 and type 4 routes; the port from PE2 to the CE is not currently active.
* When the link between the CE device and PE2 becomes active, with LACP negotiated, PE2 should announce the relevant type 1 and type 4 routes, and the DF election should commence.
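For reference, the default election procedure in RFC 7432 section 8.5 (service carving for VLAN-based service: order the candidate PEs by IP address, then pick ordinal V mod N) can be sketched as below. This is a minimal sketch, not any vendor's implementation; the function name `elect_df` and the documentation addresses 192.0.2.1 (PE1) and 192.0.2.2 (PE2) are hypothetical, and it assumes all candidates use the same IP address family.

```python
import ipaddress
from typing import Sequence


def elect_df(pe_addresses: Sequence[str], vlan_id: int) -> str:
    """Default (modulo) DF election for one <ES, VLAN>.

    pe_addresses: originating-router IPs of the ES (type 4) routes known for
    this ES, including the local PE's own address (hypothetical input shape).
    """
    if not pe_addresses:
        raise ValueError("no candidate PEs for this ES")
    # Order candidates by increasing numeric address, not by string value.
    ordered = sorted(pe_addresses, key=ipaddress.ip_address)
    # The PE at ordinal (V mod N), counting from zero, is the DF.
    return ordered[vlan_id % len(ordered)]


# Before PE2's ES route is processed, only PE1 is a candidate.
print(elect_df(["192.0.2.1"], 101))               # 192.0.2.1
# Once both PEs are candidates, odd VLANs carve to PE2.
print(elect_df(["192.0.2.1", "192.0.2.2"], 100))  # 192.0.2.1
print(elect_df(["192.0.2.1", "192.0.2.2"], 101))  # 192.0.2.2
```

Sorting on the numeric address matters: as plain strings, "192.0.2.10" would sort ahead of "192.0.2.9" and the two PEs could disagree with a correct implementation about who the DF is.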
Before the conclusion of the DF election, we expect the following to happen, but we only have a single vendor's implementation as a reference, and we know that differences in RFC interpretation can lead to differences between implementations.
* There will likely be a very brief moment in which LACP is up and the port on PE2 can send and receive traffic, but PE2's EVPN routes have not yet propagated to PE1. Until they do, PE1 cannot include PE2's ESI label in the BUM traffic from this ES that it floods to PE2 (which PE2 may need to deliver to other ports in the same EVI that are unrelated to this ES). As a result, PE2 could forward those BUM packets back to the CE device. I'm not clear that this is avoidable, but I expect the propagation and processing period here to be very short.
* Conversely, if PE2 received a BUM packet from the newly activated ES interface that was destined for PE1, it would include PE1's ESI label in the stack. As PE2 in this scenario already knows this label value before the ES port is activated, I suspect there was never really a risk of PE1 receiving a BUM packet from PE2, from the same ES, that it then forwarded back into the ES.
* In this moment, we believe that PE2 should not assume the DF role, for no other reason than that it had clearly received routes for this ES indicating that another PE was already active. On my reading, RFC7432 does not seem 110% explicit in this regard, but I don't know why PE2 would assume anything other than a non-DF role prior to the conclusion of the election.
* As soon as the type 1 and type 4 routes from PE2 reach PE1 and are processed, all future BUM packets sent to PE2 should carry PE2's ESI label on the stack, at which point PE2 should not forward BUM traffic into the ES, as long as it didn't assume the DF role prior to the conclusion of the election.
* After the election timer has concluded:
  - the DF role may stay with PE1, at which point nothing really changes other than the shared knowledge of that. All is good.
  - the DF role could move to PE2, but PE1 and PE2 already have each other's ESI labels, and it's really just the rest of the network adjusting where it sends BUM packets relative to this ES. I guess there's a chance that a packet for this ES was already in flight to PE1, and PE1 may not forward it into the ES; I'm not clear on this, but it isn't an area of concern right now.
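The two mechanisms this scenario relies on, DF filtering and ESI-label split horizon (RFC 7432 section 8.3), can be modelled from PE2's side as below. This is a simplified sketch assuming ingress replication, where the ingress PE pushes the ESI label advertised by the egress PE; the class `EgressPE` and the label value 1001 are hypothetical, and real forwarding planes are considerably more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EgressPE:
    """Hypothetical model of the decision a PE makes for a BUM frame that
    arrives from the MPLS core, for a local multi-homed ES."""

    local_esi_label: int  # ESI label this PE advertised for the ES (hypothetical)
    is_df: bool           # current DF state for this <ES, VLAN>

    def forward_into_es(self, esi_label: Optional[int]) -> bool:
        """Return True if the frame would be sent into the local ES.

        esi_label: the ESI label carried in the frame's label stack, or None
        if the ingress PE did not push one.
        """
        if not self.is_df:
            # DF filtering: a non-DF never sends BUM traffic into the ES.
            return False
        if esi_label == self.local_esi_label:
            # Split horizon: our own ESI label means the frame entered the
            # network on this same ES, so it must not be sent back into it.
            return False
        return True


# Window before PE1 has processed PE2's type 1 route: no ESI label is pushed.
pe2_non_df = EgressPE(local_esi_label=1001, is_df=False)
pe2_premature_df = EgressPE(local_esi_label=1001, is_df=True)
print(pe2_non_df.forward_into_es(None))        # False: dropped by DF filtering
print(pe2_premature_df.forward_into_es(None))  # True: looped back to the CE
# After PE1 has processed the route, it pushes PE2's ESI label.
print(pe2_premature_df.forward_into_es(1001))  # False: dropped by split horizon
```

In this model the only combination that sends a frame back into its own ES is a PE acting as DF before the remote PE has learned its ESI label, which is why PE2's DF state during that window is the crux of the question.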
Other scenarios:
I'm frankly not worried about other scenarios, as I suspect most platforms have a hold-down timer that can be used to suppress forwarding of BUM packets into an ES until routes have had a chance to propagate and the DF election has concluded.
What I'm concerned about in asking for this feedback is largely the interpretation of section 8.5 (Designated Forwarder Election) of RFC7432, which, for instance, doesn't explicitly say that the new PE should assume a non-DF role ahead of step 4, and what operators see in their production networks. Do the major manufacturers of network gear and network operating systems all do the same thing? Are there systemic problems related to ES-looped BUM traffic prior to the conclusion of the DF election that we just need to accept?
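The reading proposed above, that a newly active PE stays non-DF until the DF timer expires, can be expressed as a small state model. This is a sketch under that assumption, not a statement of what RFC 7432 mandates or of what any platform does. The class `EsDfState`, its methods, and the addresses are hypothetical, time is passed in explicitly so the behaviour is easy to test, and the 3-second value is the default timer given in section 8.5.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional, Set

# Default DF wait timer from RFC 7432 section 8.5, step 2.
DF_TIMER_SECONDS = 3.0


@dataclass
class EsDfState:
    """Per-ES DF state on one PE (hypothetical model).

    Assumption: the PE treats itself as non-DF while the DF timer is running,
    i.e. until the election result (step 4) can take effect.
    """

    local_ip: str
    peer_ips: Set[str] = field(default_factory=set)  # originators of type 4 routes
    timer_expires_at: Optional[float] = None         # None while the ES is down

    def es_up(self, now: float) -> None:
        # Steps 1-2: advertise the ES route (not modelled) and start the timer.
        self.timer_expires_at = now + DF_TIMER_SECONDS

    def receive_es_route(self, originator_ip: str) -> None:
        self.peer_ips.add(originator_ip)

    def is_df(self, vlan_id: int, now: float) -> bool:
        if self.timer_expires_at is None or now < self.timer_expires_at:
            # ES down or timer still running: behave as non-DF.
            return False
        # Step 3: ordered candidate list by numeric address, then V mod N.
        candidates = sorted(self.peer_ips | {self.local_ip}, key=ipaddress.ip_address)
        return candidates[vlan_id % len(candidates)] == self.local_ip


pe2 = EsDfState(local_ip="192.0.2.2")
pe2.receive_es_route("192.0.2.1")  # PE1's route, known before PE2's link came up
pe2.es_up(now=0.0)
print(pe2.is_df(101, now=1.0))  # False: timer still running
print(pe2.is_df(101, now=3.0))  # True: VLAN 101 carves to PE2
print(pe2.is_df(100, now=3.0))  # False: VLAN 100 stays with PE1
```

Modelling "non-DF while the timer runs" as the default means a PE can never unblock BUM traffic into the ES before it has had the chance to hear from its peers, which is the behaviour the question above is trying to confirm across vendors.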
Hopefully this all makes sense. If there is something I neglected to comment on or consider, or just got wrong, I'm happy to receive some education.
Thanks in advance,
Graham
_______________________________________________
NANOG mailing list
https://lists.nanog.org/archives/list/[email protected]/message/2HKUYUYKKIAKZRBSNIA47IO2YO4E2LAK/