On 5/12/23 03:28, Stuart Henderson wrote:
On 2023-05-12, Nick Holland <n...@holland-consulting.net> wrote:
Here's the problem I've seen: my two machines flip state
randomly(?). This bothers me because it means it is breaking people's
downloads. The longest period between flips has been less than two weeks.

So... I cranked up the carp logging to 5 and then 7 to see what it had
to say about why... and it had almost nothing to say.

Does netstat -s -p carp give any enlightenment?


OK, I just skewed the stats by taking the opportunity to bring the
now-backup node up to -current, so node1's counters do not include the
most recent flap:

node1 $ uptime
 7:18AM  up  8:22, 1 user, load averages: 0.00, 0.05, 0.08

node1 $ doas netstat -s -p carp
carp:
        29981 packets received (IPv4)
        0 packets received (IPv6)
                0 packets discarded for bad interface
                0 packets discarded for wrong TTL
                0 packets shorter than header
                0 discarded for bad checksums
                0 discarded packets with a bad version
                0 discarded because packet too short
                0 discarded for bad authentication
                0 discarded for unknown vhid
                0 discarded because of a bad address list
        0 packets sent (IPv4)
        0 packets sent (IPv6)
                0 send failed due to mbuf memory error
        0 transitions to master

node2 $ uptime
 7:19AM  up 4 days, 20:58, 2 users, load averages: 0.83, 0.78, 0.73

node2 $ netstat -s -p carp
carp:
        367836 packets received (IPv4)
        0 packets received (IPv6)
                0 packets discarded for bad interface
                0 packets discarded for wrong TTL
                0 packets shorter than header
                0 discarded for bad checksums
                0 discarded packets with a bad version
                0 discarded because packet too short
                0 discarded for bad authentication
                0 discarded for unknown vhid
                0 discarded because of a bad address list
        52806 packets sent (IPv4)
        0 packets sent (IPv6)
                0 send failed due to mbuf memory error
        2 transitions to master


Will monitor going forward, though.
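For the monitoring, one low-effort option is to watch the "transitions
to master" counter shown above and log whenever it changes, so each flip
gets a timestamp that can be lined up against switch logs or load. This
is only a sketch: it assumes the netstat output format pasted above, and
the names carp_transitions, watch_carp and the carpwatch syslog tag are
made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the "transitions to master" count from
# `netstat -s -p carp` output read on stdin (format as shown above).
carp_transitions() {
    awk '/transitions to master/ { print $1 }'
}

# Hypothetical watcher: poll every $1 seconds (default 60) and send a
# syslog line via logger(1) whenever the counter changes.
# Defined here but not started; run it by hand, e.g. `watch_carp 30 &`.
watch_carp() {
    interval=${1:-60}
    last=$(netstat -s -p carp | carp_transitions)
    while sleep "$interval"; do
        now=$(netstat -s -p carp | carp_transitions)
        if [ "$now" != "$last" ]; then
            logger -t carpwatch "carp transitions to master: $last -> $now"
            last=$now
        fi
    done
}
```

Polling only catches that a flip happened within the interval; it says
nothing about why, but it narrows the window to search.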


I had several other people suggest network problems.  I'm not going to
say "impossible" or even "unlikely", but my understanding is that the
two machines are both plugged into the same switch, in the same rack.

Several people pointed out I was using the default advbase of 1 second,
which means a small network glitch (or system load? maybe I'm all wrong
about this system never breaking a sweat, at least when it comes to
network traffic) could flip it, so I've increased it to 10 on both
machines (and apparently just induced a flip of my own. Oops.) By the
nature of this system, some people will be annoyed by any flip, so it
really doesn't matter whether it is a 1 second outage or a 30 second
outage; I just want the system available again after an unhappy event
(or routine maintenance).
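Roughly how the numbers work out: assuming the behaviour described in
carp(4) and ifconfig(8) (advertisements every advbase + advskew/256
seconds) and the backup declaring the master dead after about three
missed advertisement intervals, as the BSD carp code does, the sketch
below estimates both timers. The function name carp_timing and the
carp0 interface name are illustrative assumptions, not from this setup.

```shell
#!/bin/sh
# Hypothetical helper: estimate CARP timers from advbase and advskew.
# Assumes: interval   = advbase + advskew/256 seconds
#          masterdown = 3 * advbase + advskew/256 seconds (backup side)
carp_timing() {
    advbase=$1
    advskew=$2
    awk -v b="$advbase" -v s="$advskew" 'BEGIN {
        printf "interval=%.3f masterdown=%.3f\n", b + s / 256, 3 * b + s / 256
    }'
}

carp_timing 1 0    # prints: interval=1.000 masterdown=3.000
carp_timing 10 0   # prints: interval=10.000 masterdown=30.000

# The change itself would be something like (interface name assumed):
#   doas ifconfig carp0 advbase 10
```

Under those assumptions, going from advbase 1 to 10 moves the failover
window from about 3 seconds to about 30 seconds, in exchange for
riding out short glitches without flipping.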

Nick.
