We have a similar gray issue, where switches in a virtual chassis
configuration with Layer 3 configuration seem to randomly lose transit
ICMP messages such as echo or echo-reply. We once estimated the loss at
around 0.00012% (leaving aside variance and measurement error).
We noticed this when we replaced Nagios with burstier, more
trigger-happy monitoring software a few years back. Since then, it has
been reporting false positives from time to time, which can become
annoying.
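For a sense of scale, here is a rough sketch of how a loss rate that
small can still show up as regular false positives. It assumes the
0.00012% applies per ICMP message, that losses are independent, and
that a probe fails if either the echo or the echo-reply is dropped.
The target count and probe interval are made-up numbers for
illustration, not our actual setup.

```python
SECONDS_PER_DAY = 86_400


def round_trip_loss(per_message_loss: float) -> float:
    """Probability that a ping fails when either the echo or the
    echo-reply is dropped (assumes independent losses)."""
    return 1.0 - (1.0 - per_message_loss) ** 2


def expected_lost_probes_per_day(per_message_loss: float,
                                 targets: int,
                                 interval_s: float) -> float:
    """Expected number of failed pings per day across all targets."""
    probes_per_day = targets * SECONDS_PER_DAY / interval_s
    return probes_per_day * round_trip_loss(per_message_loss)


if __name__ == "__main__":
    loss = 0.00012 / 100  # 0.00012% expressed as a fraction
    # Hypothetical monitoring load: 1000 targets, one ping every 10 s.
    lost = expected_lost_probes_per_day(loss, targets=1000, interval_s=10)
    print(f"expected lost probes per day: {lost:.1f}")
```

With monitoring that alerts on a single missed reply, every one of
those lost probes is a potential false positive, so the alert rate
scales directly with how aggressively you probe.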
Despite spending a lot of time debugging this, we never had a
breakthrough in finding the root cause; we are now just looking to
replace things in the next year.
On 8 Jul 2021, at 15:28, Mark Tinka wrote:
> On 7/8/21 15:22, Vanbever Laurent wrote:
>> Did you folks manage to understand what was causing the gray issue in
>> the first place?
> Nope, still chasing it. We suspect a FIB issue on a transit device,
> but currently building a test to confirm.
> Mark.