Re: Do you care about "gray" failures? Can we (network academics) help? A 10-min survey

Warren Kumari Thu, 08 Jul 2021 10:20:55 -0700

On Thu, Jul 8, 2021 at 8:32 AM Saku Ytti <s...@ytti.fi> wrote:
>
> On Thu, 8 Jul 2021 at 15:00, Vanbever Laurent <lvanbe...@ethz.ch> wrote:
>
> > Detecting whole-link and node failures is relatively easy nowadays (e.g., 
> > using BFD). But what about detecting gray failures that only affect a 
> > *subset* of the traffic, e.g. a router randomly dropping 0.1% of the 
> > packets? Does your network often experience these gray failures? Are they 
> > problematic? Do you care? And can we (network researchers) do anything 
> > about it?
>
> Network experiences gray failures all the time, and I almost never
> care, unless a customer does. If there is a network which does not
> experience these, then it's likely due to lack of visibility rather
> than issues not existing.
>


I think that some of it depends on the type of failure -- for example,
some devices hash packets across an internal switch fabric, and so the
failure manifests as persistent issues to a specific 5-tuple (or
between a pair of 5-tuples). If this affects one in a thousand flows
it is likely more annoying than one in a thousand random packets being
dropped.
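
To make that difference concrete, here is a rough Python sketch, not a
model of any real device. The path count, the flow mix, the use of CRC32
as the fabric hash, and all of the names below are assumptions made up for
illustration. It compares one broken internal path (every flow hashed onto
it loses everything) with a uniform 0.1% random packet drop (every flow
loses a little).

```python
import random
import zlib

FABRIC_PATHS = 1000      # assumed number of internal fabric paths
PACKETS_PER_FLOW = 100   # assumed flow length, for illustration only


def fabric_path(five_tuple):
    """Map a 5-tuple to an internal path. CRC32 is a stand-in hash."""
    key = "|".join(map(str, five_tuple)).encode()
    return zlib.crc32(key) % FABRIC_PATHS


def make_flows(count, rng):
    """Generate hypothetical TCP flows as (src, dst, sport, dport, proto)."""
    return [
        (
            f"10.0.{rng.randrange(256)}.{rng.randrange(256)}",
            f"192.0.2.{rng.randrange(256)}",
            rng.randrange(1024, 65536),
            443,
            6,
        )
        for _ in range(count)
    ]


def hashed_loss(flows, bad_path):
    """Per-flow loss when one path drops everything: all or nothing."""
    return [1.0 if fabric_path(f) == bad_path else 0.0 for f in flows]


def random_loss(flows, rng, drop_rate=0.001):
    """Per-flow loss when each packet is dropped independently."""
    return [
        sum(rng.random() < drop_rate for _ in range(PACKETS_PER_FLOW))
        / PACKETS_PER_FLOW
        for _ in flows
    ]


rng = random.Random(1)
flows = make_flows(5000, rng)

# Assume the broken path is whichever one the first flow hashes onto.
bad_path = fabric_path(flows[0])

hashed = hashed_loss(flows, bad_path)
rand = random_loss(flows, rng)

print("hashed fault: flows fully broken =", sum(1 for x in hashed if x == 1.0))
print("random drops: flows fully broken =", sum(1 for x in rand if x == 1.0))
print("random drops: worst per-flow loss =", max(rand))
```

In the hashed case a small set of flows is dead until the 5-tuple changes,
and retransmits do not help because they hash onto the same path. In the
random case a retransmit almost always gets through.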

But, yes, all networks drop some set of packets some percentage of the
time (cue the "SEU caused by cosmic rays" response :-))

W


> Fixing these can take months of working with vendors and attempts to
> remedy will usually cause planned or unplanned outages. So it rarely
> makes sense to try to fix as they usually impact a trivial amount of
> traffic.
>
> Networks also routinely mangle packets in-memory which are not visible
> to FCS check.
>
> --
>   ++ytti



-- 
The computing scientist’s main challenge is not to get confused by the
complexities of his own making.
  -- E. W. Dijkstra
