I agree wholeheartedly with your point, David.

One other clarifying point (I'm not trying to be pedantic here, but it may
sound that way):

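Reliability is not the same as Availability.  The two are quite different:
reliability is about whether components fail or data gets lost, while
availability is about whether users can actually get work done when they
want to.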

Bufferbloat is pretty much an "availability" issue, not a reliability issue.
In other words, packets are not getting lost; the system is just preventing
desired use (for example, by queuing so much data that interactive traffic
sees long delays).
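A back-of-the-envelope sketch of "nothing is lost, but the system is unusable": a full FIFO buffer adds roughly (buffer size / link rate) of delay to every packet queued behind it. The buffer size and uplink rate below are illustrative assumptions, not measurements of any particular device.

```python
def queue_delay_seconds(buffer_bytes: int, link_bits_per_second: float) -> float:
    """Time for a full FIFO buffer to drain onto the link.

    Every newly arriving packet waits roughly this long behind the
    queue, even though no packet is dropped.
    """
    return buffer_bytes * 8 / link_bits_per_second


# Illustrative (assumed) numbers: a 256 KiB buffer in front of a 1 Mbit/s uplink.
delay = queue_delay_seconds(256 * 1024, 1_000_000)
print(f"added latency: {delay:.2f} s")
```

With zero packet loss, an interactive flow sharing that queue sees about two extra seconds of latency: the link is "reliable", but for interactive use it is effectively unavailable.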

Availability issues can be due to actual failures of components, but lots of
availability issues are caused (as you suggest) by attempts to focus narrowly
on preventing "loss of data" or "component failures".

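When you build a system, there is a temptation to fall into what is called the
Fallacy of Composition (look it up on Wikipedia for a precise definition).  The
key mistake is assuming that properties of the whole and properties of its
components must line up: for example, that if a system is to have a property
(such as high availability) as a whole, then every component of the system
must have that property too.  (Strictly speaking, reasoning from the whole
down to the parts is the closely related fallacy of division.)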

(The end-to-end argument is a specific design rule based on recognizing the
Fallacy of Composition in one case: reliable hops do not by themselves make a
reliable end-to-end transfer.)
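A toy sketch of the end-to-end idea, which also illustrates David's point in the quoted message about relying on a well-exercised recovery path instead of trying to make sure nothing ever fails. This is a minimal illustration under stated assumptions, not how TCP or any real stack works: the channel model, loss and corruption rates, and the names `lossy_channel` and `send_all` are all hypothetical. The path in the middle may drop or damage packets; correctness comes only from a check at the endpoints plus one retry path.

```python
import random
import zlib
from typing import List, Optional, Tuple

# (sequence number, payload, CRC-32 computed by the sender)
Packet = Tuple[int, bytes, int]


def lossy_channel(pkt: Packet, rng: random.Random,
                  loss: float = 0.2, corrupt: float = 0.05) -> Optional[Packet]:
    """Hypothetical network path that may drop or damage a packet
    (e.g. during a route change) rather than trying to prevent every failure."""
    r = rng.random()
    if r < loss:
        return None  # packet lost
    if r < loss + corrupt:
        seq, data, crc = pkt
        # Flip one bit of the (assumed non-empty) payload;
        # CRC-32 detects every single-bit error.
        return (seq, bytes([data[0] ^ 0x01]) + data[1:], crc)
    return pkt


def send_all(messages: List[bytes], seed: int = 1,
             max_tries: int = 100) -> Tuple[List[bytes], int]:
    """Stop-and-wait delivery with one recovery path for every failure mode."""
    rng = random.Random(seed)
    delivered: List[bytes] = []
    retransmissions = 0
    for seq, data in enumerate(messages):
        pkt = (seq, data, zlib.crc32(data))
        for _ in range(max_tries):
            got = lossy_channel(pkt, rng)
            # End-to-end check at the receiver.  A lost packet and a damaged
            # packet are handled identically: no valid arrival means the
            # sender simply sends again (as it would after a timeout).
            if got is not None and zlib.crc32(got[1]) == got[2]:
                delivered.append(got[1])
                break
            retransmissions += 1
        else:
            raise RuntimeError(f"gave up on packet {seq}")
    return delivered, retransmissions


msgs = [f"segment {i}".encode() for i in range(20)]
out, resends = send_all(msgs)
assert out == msgs
print(f"delivered {len(out)} segments, {resends} retransmissions")
```

The design point is that the retry path runs constantly, so it stays well exercised and tested. A real transport adds acknowledgments, timers and congestion control, none of which are modeled here.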

We all know that at any given moment, any moderately large part of the
Internet contains some failed components.  Yet the Internet has *very* high
availability (24x7x365), and we don't need to know very much about which parts
are failing.  That's by design, of course.  And it is a design that does not
derive its properties from a trivial notion of "proof of correctness", or even
"bug freeness".
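A deliberately idealized model of how high availability can come from components that fail often. Assuming k redundant paths that fail independently, each up a fraction a of the time, the service is down only when all k are down at once, so availability is 1 - (1 - a)^k. Independence is a strong assumption (real failures are often correlated), and the 99% figure is illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def parallel_availability(a: float, k: int) -> float:
    """Availability of k independent redundant paths, each up a fraction `a`
    of the time: the service fails only if every path is down simultaneously."""
    return 1 - (1 - a) ** k


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes per year during which the service is unavailable."""
    return (1 - availability) * MINUTES_PER_YEAR


for k in (1, 2, 3):
    avail = parallel_availability(0.99, k)
    print(f"{k} path(s): availability {avail:.6f}, "
          f"~{downtime_minutes_per_year(avail):.1f} min/year down")
```

In this model each extra 99% path multiplies unavailability by 0.01, i.e. adds two "nines", without making any single component more reliable.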

Whether a "failure" or even a "design flaw" matters to system availability can
only be judged from a much bigger perspective: what the system does, and
whether its users perceive that they can get their work done.

On Tuesday, March 17, 2015 3:30pm, "David Lang" <da...@lang.hm> said:

> On Tue, 17 Mar 2015, Dave Taht wrote:
> 
>> My quest is always for an extra "9" of reliability. Anyplace where you can
>> make something more robust (even if it is out at the .9999999999 level), I
>> like to do so, in order to have the highest MTBF possible in
>> combination with all the other moving parts on the spacecraft (spaceship
>> earth).
> 
> There are different ways to add reliability:
>
> One is to try to make sure nothing ever fails.
>
> The second is to have a way of recovering when things go wrong.
> 
> 
> Bufferbloat came about because people got trapped into the first mode of
> thinking (packets should never get lost), when the right answer ended up being
> to realize that we have a recovery method and use it.
> 
> Sometimes trying to make sure nothing ever fails adds a lot of complexity to
> the code to handle all the corner cases, and the overall reliability will
> improve by instead simplifying the normal flow, even if that adds a small
> number of failures, if it means that you can have a common set of recovery
> code that's well exercised and tested.
> 
> As you are talking about losing packets with route changes, watch out that
> you don't fall into this trap.
> 
> David Lang
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
> 
