Hi,
I have two OpenBSD 6.9 servers: fw-1 (10.0.0.58) and fw-2 (10.0.0.59).
For the last few days our monitoring has been reporting
packet loss to them.
So I tried pinging fw-2 from fw-1:

fw-1$ ping -c 10 fw-2
PING fw-2 (10.0.0.59): 56 data bytes
64 bytes from 10.0.0.59: icmp_seq=0 ttl=255 time=0.533 ms
64 bytes from 10.0.0.59: icmp_seq=1 ttl=255 time=0.735 ms
64 bytes from 10.0.0.59: icmp_seq=2 ttl=255 time=0.517 ms
64 bytes from 10.0.0.59: icmp_seq=3 ttl=255 time=0.506 ms
64 bytes from 10.0.0.59: icmp_seq=4 ttl=255 time=0.609 ms
64 bytes from 10.0.0.59: icmp_seq=6 ttl=255 time=0.503 ms
64 bytes from 10.0.0.59: icmp_seq=7 ttl=255 time=0.479 ms
64 bytes from 10.0.0.59: icmp_seq=8 ttl=255 time=0.523 ms
64 bytes from 10.0.0.59: icmp_seq=9 ttl=255 time=0.507 ms

--- fw-2.snet.verza.net ping statistics ---
10 packets transmitted, 9 packets received, 10.0% packet loss
round-trip min/avg/max/std-dev = 0.479/0.546/0.735/0.075 ms

A tcpdump on fw-2 shows that the icmp_seq=5 request arrived but no reply was sent:

fw-2$ doas tcpdump -lnp -i trunk0 icmp and host 10.0.0.58
tcpdump: listening on trunk0, link-type EN10MB
11:56:13.087075 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:13.087094 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:14.092993 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:14.093005 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:15.092840 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:15.092851 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:16.092828 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:16.092839 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:17.092809 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:17.092822 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:18.092793 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:19.092776 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:19.092786 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:20.092726 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:20.092744 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:21.092756 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:21.092774 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:22.092733 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:22.092743 10.0.0.59 > 10.0.0.58: icmp: echo reply
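
To spot such gaps in a longer capture without reading it line by line, something like the following might help. It is only a rough sketch: the function name unanswered_requests and the file name capture.txt are made up for illustration, and it assumes a single host is pinging, so that every echo request is directly followed by its reply in the tcpdump output.

```shell
# Print the timestamp of every echo request that is not followed by an
# echo reply before the next request (or before the end of the capture).
# Assumption: one pinging host only, so requests and replies alternate.
unanswered_requests() {
    awk '
        /icmp: echo request/ {
            if (pending != "") print pending   # previous request got no reply
            pending = $1                       # remember this request timestamp
            next
        }
        /icmp: echo reply/ { pending = "" }    # request answered
        END { if (pending != "") print pending }
    ' "$@"
}

# Example usage (capture.txt is a hypothetical file holding saved tcpdump output):
#   unanswered_requests capture.txt
```

Run against the capture above, this should print only 11:56:18.092793, the one request without a reply.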

The netstat statistics on fw-2 are also one echo reply short:
fw-2$ netstat -ss -p icmp
icmp:
    101 calls to icmp_error
    Output packet histogram:
        echo reply: 40626
        destination unreachable: 101
        time stamp reply: 1
    Input packet histogram:
        echo reply: 247
        destination unreachable: 1
        echo: 40626
        time stamp: 1
        address mask request: 3
        #37: 1
    40627 message responses generated
[... 10 ICMP echo requests sent in between ...]
fw-2$ netstat -ss -p icmp
icmp:
    101 calls to icmp_error
    Output packet histogram:
        echo reply: 40635
        destination unreachable: 101
        time stamp reply: 1
    Input packet histogram:
        echo reply: 247
        destination unreachable: 1
        echo: 40635
        time stamp: 1
        address mask request: 3
        #37: 1
    40636 message responses generated
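
Comparing the two snapshots by hand is error-prone, so here is a minimal sketch for computing the counter differences. The function names icmp_counter and icmp_delta and the file names before.txt and after.txt are hypothetical; the parsing assumes the "netstat -ss -p icmp" layout shown above and has not been checked against other releases.

```shell
# Print one counter from saved "netstat -ss -p icmp" output.
# $1 = histogram ("Output" or "Input"), $2 = counter label, $3 = file
icmp_counter() {
    awk -v section="$1" -v label="$2" '
        /packet histogram:/ { current = $1; next }   # "Output" or "Input"
        current == section {
            line = $0
            sub(/^[ \t]+/, "", line)                  # drop indentation
            n = index(line, ": ")
            if (n && substr(line, 1, n - 1) == label) {
                print substr(line, n + 2)
                exit
            }
        }
    ' "$3"
}

# Print how much a counter grew between two snapshots.
# $1 = histogram, $2 = counter label, $3 = earlier file, $4 = later file
icmp_delta() {
    before=$(icmp_counter "$1" "$2" "$3")
    after=$(icmp_counter "$1" "$2" "$4")
    echo $(( ${after:-0} - ${before:-0} ))
}

# Example usage (before.txt and after.txt are hypothetical snapshot files):
#   netstat -ss -p icmp > before.txt
#   ... run the ping test ...
#   netstat -ss -p icmp > after.txt
#   icmp_delta Output "echo reply" before.txt after.txt
#   icmp_delta Input  "echo"       before.txt after.txt
```

With the two snapshots above, both the output "echo reply" counter and the input "echo" counter grow by 9 (40626 to 40635), not by 10.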

I tried disabling pf, but it made no difference.

The trunk0 device consists of two bnxt interfaces.

Both servers have been in place for years, and both started losing
packets in the last few days.

How can I debug a problem like this, please?

Disclaimer: IP addresses may have been changed to prevent information
leaks, as we are in an audited environment.

Thanks,
Pavel Mateja