RE: MINIMIZE BACKOFF SPF two technical points

bruno.decraene Mon, 28 Jul 2014 03:42:19 -0700

Hi Antoni,

Thanks for your feedback. More inline below.


> From: rtgwg [mailto:[email protected]] On Behalf Of Antoni Przygienda
> Sent: Thursday, July 24, 2014 11:06 PM
> 
> As first, I'm supportive for the work & I think it's of solid applicable value

Thank you.

> albeit it's strictly not IETF territory (it's not necessary for interop
> strictly speaking).
> 
> First is very blunt:  if you manage to really make all the routers in the area
> compute @ precisely the same time, you may not be doing yourself the favor
> you seek ;-)  What I mean is that generating perfectly synchronized peaks in a
> network tends to generate strange attractors, a good example was the
> synchronization of the HELLOs on all links over time that had to be jittered.
> Peaks can stress infra unexpectedly & lead to e.g. synchronized re-
> advertisement of LSAs  (or anything that SPF can trigger now and in the
> future).  Given on top that an SPF in the future is not necessarily the 2-3
> msec SPF seen today (rLFA & such runs seem to become the new flavor of
> SPF) I suggest to include a small configurable jitter before the first SPF is
> triggered (couple msecs should do the trick but I'm willing to hear the
> argument that flooding de-sync's the SPF runs enough already).

Interesting feedback. We'll try to keep it in mind.
FYI, note that so far we have mainly had the opposite feedback: "You'll never
manage to have perfect synchronization" (e.g. because of CPU scheduler delays
that differ between routers).
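
To make the jitter suggestion concrete, here is a minimal sketch of what "a
small configurable jitter before the first SPF" could look like. It is only an
illustration: the function name, the 50 ms base delay and the 3 ms jitter bound
are assumptions chosen for the example, not values from the draft.

```python
import random


def first_spf_delay_ms(base_delay_ms: float, max_jitter_ms: float,
                       rng: random.Random) -> float:
    """Delay before the first SPF: a fixed base plus uniform random jitter.

    Hypothetical helper for illustration; names and values are assumptions.
    """
    if max_jitter_ms < 0:
        raise ValueError("max_jitter_ms must be non-negative")
    # Each router draws its own jitter, so SPF start times spread out
    # over [base_delay_ms, base_delay_ms + max_jitter_ms].
    return base_delay_ms + rng.uniform(0.0, max_jitter_ms)


# Ten routers that all learn of the same failure at t=0.
rng = random.Random(42)  # seeded only to make the example repeatable
delays = [first_spf_delay_ms(50.0, 3.0, rng) for _ in range(10)]
```

With these assumed values every router starts its first SPF between 50 and
53 ms after the event, instead of all at exactly 50 ms.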

 
> The other issue is far more subtle but may merit a section in the draft.  This
> work is pushing the protocol in a very specific direction along the CAP
> paradigm, i.e. a link-state routing protocol is roughly
> 
> 1. Always 100% P (partitioned)
> 2. Basically 100% available  A  (tad hard to define given FIBs)
> 3. _eventually_ consistent C
> 
> Now, it is fairly well understood that having all 3 is not possible across
> very wide set of CS problems and we are not exempt of that.  We cannot move
> P  so pushing on the C will cause A to move to the negative. Now, what do I
> mean by that.  

Well, I'm not an expert in theoretical computer science but:
- there seems to be a debate about CAP itself
- I'm not sure it's applicable to SPF delay. In particular, SPF delay has no 
influence on the LSDB and hence none on its Consistency, Availability, or 
tolerance to Partition. Also, the proof of the CAP theorem seems to be limited 
to replicated distributed systems, while in an LS IGP only a single node is 
allowed to modify a given piece of data (LSP/LSA), so we don't have the issue 
of a distributed system that gets partitioned and then receives 2 simultaneous 
conflicting requests (each asking for a different change).

> Triggering the SPFs more aggressively will give you better
> consistent&available in the scenario of a single link failure if things go
> well.
> Now, compared to e.g. a batching algorithm that computes every 500 msecs
> without backing off and will show linear consistent&available even in case of
> fast-link flapping, many links failing consecutively and so on, exponential
> backoff will cause massively lower consistency after several link failure and
> this network-wide so certain people may lose big time when using that.
> Beside that, the quick SPFs can block lots of other things in the protocol
> that are not parallelized or block other protocols waiting for SPFs to finish
> or next SPF (2nd failure) stuck on FIB download running (all hypothetical, but
> availability in widest sense will go down if you see more consistency). Again,
> the work is good but the section will show people that it's not an 'universal'
> improvement but something triggered to ideally a seldom occurring 1 or 2
> links failure.

Let's step back a little on these 2 comments, and in particular the CAP one.
The draft proposes to specify a common SPF delay specification, full stop.
At most (the best case IMHO, the worst case with regard to your CAP comment):
- only the SPF delay is changed
- all nodes use the same spec.
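
For readers who want to see the two behaviours under discussion side by side,
the sketch below compares a generic exponential back-off SPF delay with fixed
periodic batching. It is a simplified model written for this thread and is not
the algorithm specified in the draft; the function names, the doubling rule,
the quiet-period reset and all timer values are assumptions.

```python
import math


def backoff_spf_times(events, initial_ms, max_ms, quiet_ms):
    """SPF run times under a generic exponential back-off (hypothetical model).

    An event schedules an SPF `delay` ms later; events arriving before that
    SPF runs are absorbed by it. The delay doubles after each SPF (capped at
    max_ms) and resets once the network has been quiet for quiet_ms.
    """
    spf_times = []
    delay = initial_ms
    for t in sorted(events):
        if spf_times and t <= spf_times[-1]:
            continue  # already covered by the scheduled SPF
        if spf_times and t - spf_times[-1] >= quiet_ms:
            delay = initial_ms  # quiet period elapsed: back to fast reaction
        spf_times.append(t + delay)
        delay = min(delay * 2, max_ms)
    return spf_times


def batched_spf_times(events, period_ms):
    """SPF run times when SPF runs at the next periodic tick after any event."""
    ticks = {math.floor(t / period_ms) + 1 for t in events}
    return sorted(k * period_ms for k in ticks)


# Five events 100 ms apart, e.g. a flapping link (times in ms).
events = [0, 100, 200, 300, 400]
backoff = backoff_spf_times(events, initial_ms=50, max_ms=1000, quiet_ms=2000)
batched = batched_spf_times(events, period_ms=500)
# backoff == [50, 200, 500]; batched == [500]
```

In this toy run the back-off variant reacts to the first event after 50 ms but
runs three SPFs, while the 500 ms batching runs a single SPF at 500 ms. Which
trade-off is preferable depends on the failure pattern; the point here is only
that every node applying the same function to the same events obtains the same
schedule.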

I don't think it's possible to claim that there is a theoretical issue/risk, 
because a mono-vendor AS (using the same implementation on all nodes) goes 
much further in terms of consistent behavior. And I don't think that any vendor 
will say that in such circumstances its implementation does not work or is more 
risky, and that the AS MUST/SHOULD introduce another vendor.
Link-state IGPs do work even in mono-vendor networks with all nodes running the 
same code and hence the same spec.

Thanks,
Bruno


> Thanks
> 
> --- tony
> 
> 
> 
> "FUTURE, n.
> That period of time in which our affairs prosper, our friends are true and our
> happiness is assured."
> ― Ambrose Bierce, The Unabridged Devil's Dictionary
> 
