Dear PCE working group,

I am working on a PCE for SR-TE and am currently trying to implement
dual-PCE redundancy. I would prefer to follow the RFCs and not implement any
proprietary inter-PCE protocol unless absolutely necessary.

In my opinion, PCEP has so far not been a very standards-friendly protocol:
every vendor implements things differently and there is not much
interoperability. I managed to get the single-PCE scenario more or less
working, although different vendors handle state sync and LSP-ID state
with different logic.

So my question is how dual PCE redundancy is supposed to work; let's say
for now for PCE-initiated LSP only. RFC8231#section-5.7
<https://datatracker.ietf.org/doc/html/rfc8231#section-5.7> describes the
procedures, but again vendors implement it differently.

Cisco delegates the LSP to the PCE that initiated it and reports it to the
other PCE (without the delegate flag). If the first PCE fails, Cisco
redelegates the LSP to the second PCE, and the LSP remains delegated to it
forever.
Juniper also delegates to the primary PCE and reports to the other without
the delegate flag, but if the primary PCE fails, Juniper does not
redelegate; instead it waits a bit and then removes the LSP. My
understanding is that it expects the secondary PCE to re-initiate the LSP
at this point.
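
The difference between the two behaviours, seen from the surviving PCE, can
be summarized in a few lines of Python. This is only a rough sketch of the
observations above: the names PccBehavior and backup_pce_action are
hypothetical, and the vendor labels reflect what was observed in testing,
not documented vendor behaviour.

```python
from enum import Enum


class PccBehavior(Enum):
    """Hypothetical labels for the two PCC behaviours observed above."""
    REDELEGATE = "redelegate"      # Cisco-style, as observed
    REMOVE_AFTER_WAIT = "remove"   # Juniper-style, as observed


def backup_pce_action(behavior: PccBehavior) -> str:
    """Return what the surviving PCE has to do once the primary PCE fails."""
    if behavior is PccBehavior.REDELEGATE:
        # The PCC hands the existing LSP over, so the backup only updates it.
        return "accept delegation and send PCUpdate"
    # The PCC removes the LSP after its timer, so the backup must recreate it.
    return "re-initiate the LSP with PCInit"
```

The point of the sketch is that a single PCE implementation needs both
branches if it wants to work with both kinds of PCC.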

I haven't tested Nokia, Huawei and others but wouldn't be surprised if they
also have different ways of handling this.

So my question is: how should dual-PCE redundancy be implemented properly?
Is there a standard way, or does it rely on vendor-proprietary mechanisms?

I think about the following logic:


   1. When the session comes up, wait for state sync to finish (the message
      with all zeros) plus some timer; then:

      1. If we received PCReports with the delegate flag (and some of them
         match our LSPs) -> assume the PCC elected us as the primary PCE;
         send PCUpdate for these delegated LSPs and PCInit for our LSPs
         not seen in the PCReports.

      2. If we received PCReports matching our LSPs but without the
         delegate flag -> assume the PCC elected us as the secondary PCE;
         do not send PCUpdate or PCInit.

      3. If we did not receive any PCReports matching our LSPs [this is
         the tricky part: we can't send PCInit, because if two PCEs send
         it simultaneously the PCC will complain] -> maybe start a random
         timer and try PCInit after it expires, so that the other PCE
         receives reports with our LSPs?

   2. When the PCC re-delegates LSPs to us -> assume we are now elected as
      the primary PCE; send PCUpdate for the delegated LSPs and PCInit for
      LSPs not seen in the PCReports.

   3. When the PCC removes LSPs that are still active on the PCE side ->
      assume Juniper-style redundancy; send PCInit for those LSPs.
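
The post-sync decision in step 1 could look roughly like the Python sketch
below. It is a sketch under assumptions, not a tested PCE implementation:
Report, Role and decide_after_sync are hypothetical names, LSPs are matched
by name only, and a real PCE would also have to track PLSP-IDs and the
per-session state.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Report:
    """Simplified view of one received PCReport (illustrative fields only)."""
    lsp_name: str
    delegated: bool


def decide_after_sync(reports, our_lsps):
    """Decide our role once state sync and the extra timer have finished.

    Returns (role, lsps_to_update, lsps_to_initiate).
    """
    ours = set(our_lsps)
    seen = {r.lsp_name for r in reports if r.lsp_name in ours}
    delegated = {r.lsp_name for r in reports
                 if r.delegated and r.lsp_name in ours}

    if delegated:
        # Some of our LSPs are delegated to us: act as the primary PCE.
        # Update what is delegated, initiate what was never reported.
        return Role.PRIMARY, sorted(delegated), sorted(ours - seen)
    if seen:
        # Our LSPs are reported without delegation: act as the secondary PCE.
        return Role.SECONDARY, [], []
    # Nothing of ours was reported. The caller should wait a random time
    # and re-check the reports before sending any PCInit.
    return Role.UNKNOWN, [], []
```

For example, with reports for "a" (delegated) and "b" (not delegated) and
our LSPs "a", "b" and "c", the function returns the primary role, "a" to
update and "c" to initiate. The random timer for the third case is left to
the caller on purpose, because its value depends on how the two PCEs are
deployed.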


I think this will work based on my observations so far, but I want to ask
the PCE experts just in case: what is the IETF way, and are there any
implementations that interoperate with different vendors? Maybe the more
intelligent approach would be to implement draft-ietf-pce-state-sync-09, but
from reading it I'm not sure it will work in all scenarios, including with
a Juniper PCC. It could add more problems with split brain while not
solving the original issue.

Please let me know what you think.


Regards,
Dmytro