[GROW] Re: Question: How best to deal with network operator error in creation of ASPA?

Maria Matejka Sun, 13 Jul 2025 08:25:56 -0700

Hello Sriram,

(writing as an implementor who also does tech support)


Please note that the large providers are _not_ the ones who would do any
ASPA deployment first. The end networks will, and ultimately my
question is *How do I, as a leaf network operator, find out that I have
made an error?*

> The approach proposed by Maria (which you support) does not function
> as intended when the erring remote AS is multi-homed. In such cases,
> the remote AS’s alternate route propagates to all ASes in the Internet
> – whether they perform ASPA verification or not – resulting in the
> remote AS remaining unaware of the error in their ASPA.

To reiterate, the approach I proposed, after the discussion in the
previous thread, is ultimately this:

1. ingress check from Customer: prepend self, run Upstream algo
2. ingress check from Peer / RS: run Upstream algo
3. ingress check from Provider: prepend self, run Downstream algo
4. egress check to Customer: prepend self and the Customer, run Downstream algo
5. egress check to Peer / RS: prepend self, run Upstream algo
6. egress check to Provider: prepend self and the Provider, run Upstream algo

The motivation behind the prepending is this:

- the route is inevitably doomed to get that exact specific AS Path later on
- in cases 3, 5, 6, we catch our own error (this is the major advantage)
- in case 1, we ensure that our customer ran their own check (6)
- in case 4, we catch our Customer's error on our other side before
  they even run their check on ingress (3)

This way, we check as much and as soon as we can. And the BGP Role still
tells us which variant we use.
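
The six cases above can be read as a lookup table. Here is a minimal
sketch in Python of just that table, under a few assumptions: the names
`Role`, `Direction`, `CHECKS` and `plan_check` are made up for
illustration, the AS path is a list with the most recently added AS
first, "prepend self and the Customer" is taken to mean the neighbor
ends up leftmost, and the Upstream / Downstream algorithms themselves
are not implemented, only named.

```python
from enum import Enum


class Role(Enum):
    """BGP Role of the neighbor, as seen from the local AS."""
    CUSTOMER = "customer"
    PEER = "peer"          # also stands in for RS here (assumption)
    PROVIDER = "provider"


class Direction(Enum):
    INGRESS = "ingress"
    EGRESS = "egress"


# (direction, neighbor role) -> (prepend self?, prepend neighbor?, algorithm)
# Rows follow cases 1..6 of the list above, in order.
CHECKS = {
    (Direction.INGRESS, Role.CUSTOMER): (True, False, "upstream"),
    (Direction.INGRESS, Role.PEER): (False, False, "upstream"),
    (Direction.INGRESS, Role.PROVIDER): (True, False, "downstream"),
    (Direction.EGRESS, Role.CUSTOMER): (True, True, "downstream"),
    (Direction.EGRESS, Role.PEER): (True, False, "upstream"),
    (Direction.EGRESS, Role.PROVIDER): (True, True, "upstream"),
}


def plan_check(direction, neighbor_role, as_path, local_asn, neighbor_asn):
    """Return (path_to_verify, algorithm_name) for one route.

    Hypothetical helper: it only builds the path that would be handed
    to the verification algorithm; it does not verify anything.
    """
    prepend_self, prepend_neighbor, algo = CHECKS[(direction, neighbor_role)]
    path = list(as_path)
    if prepend_self:
        path.insert(0, local_asn)
    if prepend_neighbor:
        # The neighbor will add itself next, so it goes leftmost.
        path.insert(0, neighbor_asn)
    return path, algo
```

For example, with local AS 64500 sending to Provider 64510 a route whose
path is `[64501, 64502]`, `plan_check` yields the path
`[64510, 64500, 64501, 64502]` to be run through the Upstream algo.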

This indeed does not work for Complex relationships. That's OK, it's the
same case as with BGP Roles. Exactly the same case. They will figure it
out. We just have to design the algorithm in such a way that it fails at
the source of the error, or in other words, as Randy Bush said earlier
in that aforementioned thread on this topic last month, no garbage in,
no garbage out.

https://mailarchive.ietf.org/arch/msg/sidrops/Vs9Yx5x8T8qk5PsvcmUIjyP7oOY/

> As you seem to agree, the network operator at the local AS should not be left 
> unaware if a customer is effectively cut off (i.e., all their routes are 
> dropped). The local AS operator must have the ability to manage such 
> situations proactively.

Which means they should be able to see it _before_ they send anything
out.

> Considering Maria’s and your inputs, I suggest the following approach:
> 
> 
>   *   During ASPA verification, when the remote (sending) AS is a customer, 
> the following check is performed:
>      *   The remote AS has an ASPA record, and
>      *   The SPAS obtained from the ASPA does not include the local AS.
>   *   If this check evaluates to True, an alert MUST be generated for the 
> local AS.
>   *   The local AS operator MUST have an automated procedure to process this 
> alert and decide whether to terminate the BGP session with the remote AS.
>   *   Regardless of whether the BGP session is terminated, the local AS MUST 
> notify the remote AS about the error in their ASPA.
>   *   If the BGP session was terminated, it is re-initiated after the error 
> in the ASPA is fixed.

This needs:

- the implementation to implement an additional BGP instance check
  alongside ASPA validation, and generate specific alerts
- the operator to actually catch these alerts and deploy a customer
  notification tool which would be completely dormant for most of the
  time
- the provider of the erring customer to actually deploy ASPA at all.

This is what I call bending over backwards, but on the operator side.

> Maria and I agreed earlier that the combination of the existing ASPA-based 
> path verification at ingress and the OTC procedure [RFC 9234] eliminate the 
> need for egress verification. Especially, when there is a supplementary 
> procedure (as described above) to remedy the omission error in the direct 
> customer’s ASPA.

I was, at least, very clear that I consider this very much suboptimal.

What may work is the following, run not as part of ASPA verification
but as an auxiliary BGP session check:

- During BGP session initiation, both parties MUST check whether either:
  - the Customer has no ASPA record, or
  - their SPAS includes the Provider's AS.
  If the check fails, the BGP session MUST be terminated immediately.
- For any established BGP session, the check MUST be repeated any time
  the appropriate SPAS changes, appears or disappears. The session
  SHOULD be terminated immediately if the condition is not met anymore.
  If not terminated, the operators SHOULD resolve the issue as soon as
  possible to prevent ASPA Invalids from spreading.
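
The condition itself is small. A minimal sketch in Python, assuming the
validated ASPA data is available as a plain dict from customer ASN to
the set of provider ASNs (SPAS); the name `session_allowed` and that
data layout are made up for illustration and do not come from any
implementation.

```python
def session_allowed(customer_asn, provider_asn, aspa):
    """Return True if the Customer-Provider session passes the check.

    `aspa` is an assumed lookup table: customer ASN -> set of provider
    ASNs (the SPAS). A missing key means the Customer has no ASPA record.
    """
    spas = aspa.get(customer_asn)
    if spas is None:
        # No ASPA record at all: nothing to contradict, session is fine.
        return True
    # An ASPA record exists: the Provider must be listed in the SPAS.
    return provider_asn in spas


# Example data with documentation ASNs: 64501 lists only 64510.
aspa = {64501: {64510}}
print(session_allowed(64501, 64510, aspa))  # listed provider
print(session_allowed(64501, 64520, aspa))  # provider missing from SPAS
print(session_allowed(64502, 64520, aspa))  # customer without ASPA
```

Both sides would evaluate the same function at session start, and again
whenever the entry for the Customer in `aspa` changes, appears or
disappears.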

In the end, considering the scenarios described by RFC 4264 in
conjunction with ASPA-Role discrepancy, I stand very firmly on the side
that the egress check is not only a much better option but also much
easier to implement, deploy and ultimately debug in production.

I'm willing to update the draft myself if the current authors lack time
or energy to do that.

Have a nice day!  
Maria

-- 
Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.
