On Thu, Jul 8, 2021 at 8:32 AM Saku Ytti <s...@ytti.fi> wrote:
>
> On Thu, 8 Jul 2021 at 15:00, Vanbever Laurent <lvanbe...@ethz.ch> wrote:
>
> > Detecting whole-link and node failures is relatively easy nowadays (e.g.,
> > using BFD). But what about detecting gray failures that only affect a
> > *subset* of the traffic, e.g. a router randomly dropping 0.1% of the
> > packets? Does your network often experience these gray failures? Are they
> > problematic? Do you care? And can we (network researchers) do anything
> > about it?
>
> Network experiences gray failures all the time, and I almost never
> care, unless a customer does. If there is a network which does not
> experience these, then it's likely due to lack of visibility rather
> than issues not existing.
>
I think that some of it depends on the type of failure -- for example,
some devices hash packets across an internal switch fabric, so the
failure manifests as a persistent problem for a specific 5-tuple (or
between a pair of 5-tuples). If this affects one in a thousand flows, it
is likely more annoying than one in a thousand random packets being
dropped.

But, yes, all networks drop some set of packets some percentage of the
time (cue the "SEU caused by cosmic rays" response :-))

W

> Fixing these can take months of working with vendors and attempts to
> remedy will usually cause planned or unplanned outages. So it rarely
> makes sense to try to fix as they usually impact a trivial amount of
> traffic.
>
> Networks also routinely mangle packets in-memory which are not visible
> to FCS check.
>
> --
> ++ytti

--
The computing scientist’s main challenge is not to get confused by the
complexities of his own making.
  -- E. W. Dijkstra
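A toy Python sketch of the fabric-hashing point, for illustration only. Everything in it is an assumption: the CRC32 "fabric hash", the 1000-link fabric, the made-up flows, and the helper names (`fabric_link`, `hashed_fault`, `random_fault`, `summarize`) are hypothetical and do not model any particular vendor's hardware. It compares one internal fabric link that silently drops everything it carries against independent 0.1% per-packet loss, chosen so both give roughly the same aggregate loss.

```python
import random
import zlib

# Toy model only (hypothetical): a router spreads packets over num_links
# internal fabric links by hashing each packet's 5-tuple.


def fabric_link(five_tuple, num_links):
    """Map a 5-tuple to a fabric link. CRC32 stands in for the unknown
    vendor hash; it is deterministic, so a flow always takes the same link."""
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % num_links


def make_flows(n, seed=1):
    """Generate n made-up TCP/443 flows as (src, dst, proto, sport, dport)."""
    rng = random.Random(seed)
    return [
        (
            f"10.0.{rng.randrange(256)}.{rng.randrange(256)}",
            f"192.0.2.{rng.randrange(256)}",
            6,
            rng.randrange(1024, 65536),
            443,
        )
        for _ in range(n)
    ]


def hashed_fault(flows, num_links, bad_link):
    """Per-flow loss when one fabric link drops everything it carries:
    flows pinned to bad_link lose every packet, all others lose none."""
    return [1.0 if fabric_link(ft, num_links) == bad_link else 0.0 for ft in flows]


def random_fault(flows, drop_prob, pkts_per_flow, seed=2):
    """Per-flow loss when each packet is dropped independently with drop_prob."""
    rng = random.Random(seed)
    losses = []
    for _ in flows:
        lost = sum(rng.random() < drop_prob for _ in range(pkts_per_flow))
        losses.append(lost / pkts_per_flow)
    return losses


def summarize(losses):
    """Return (mean per-flow loss, number of flows that lost every packet)."""
    mean = sum(losses) / len(losses)
    dead = sum(1 for x in losses if x == 1.0)
    return mean, dead


if __name__ == "__main__":
    NUM_LINKS, PKTS = 1000, 100
    flows = make_flows(20_000)
    hashed = hashed_fault(flows, NUM_LINKS, bad_link=0)
    rand = random_fault(flows, drop_prob=1 / NUM_LINKS, pkts_per_flow=PKTS)
    for name, losses in (("hashed fabric fault", hashed), ("random 0.1% drop", rand)):
        mean, dead = summarize(losses)
        print(f"{name:20s} mean loss {mean:.4%}, flows fully dead: {dead}")
```

With these parameters, both cases should lose on the order of 0.1% of traffic overall. The difference is where the loss lands. The hashed fault concentrates it into a few dozen flows that lose every packet and keep losing them, because the hash is deterministic. Random loss spreads roughly one dropped packet across many flows, which transport-layer retransmission usually hides.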