On Sun, 20 Jan 2019, Bruce Evans wrote:

[iflib_media_change() is missing iflib_stop(), like iflib_resume() was]

I don't know what the media was after the broken resume.  Its reported
result can't be trusted anyway.  To recover from the broken resume, it
usually worked to repeat down/up a few times.  This is consistent with
the bug -- eventually, previous down/ups change the state to close enough
to stopped.  But using the interface in any way (including pinging it
to see if it is still broken) makes it not so close to being stopped.
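
The repeated down/up recovery can be sketched as a small script.  This is
only an illustration under assumptions: the interface name "em0", the
cycle count, the pause, and the helper names are all hypothetical, and
/sbin/ifconfig is passed by full path so that no wrapper script on nfs is
involved.  The sketch deliberately does not ping between cycles, since
using the interface may move it further from the stopped state.

```python
import subprocess
import time
from typing import Callable, List, Optional, Sequence


def down_up_commands(ifname: str, cycles: int,
                     ifconfig: str = "/sbin/ifconfig") -> List[List[str]]:
    """Build the ifconfig argument lists for `cycles` down/up pairs."""
    cmds: List[List[str]] = []
    for _ in range(cycles):
        cmds.append([ifconfig, ifname, "down"])
        cmds.append([ifconfig, ifname, "up"])
    return cmds


def repeat_down_up(ifname: str, cycles: int = 3, pause: float = 1.0,
                   run: Optional[Callable[[Sequence[str]], object]] = None,
                   ifconfig: str = "/sbin/ifconfig") -> None:
    """Run the down/up pairs in order, sleeping `pause` seconds after each.

    `run` is injectable so the sequence can be inspected without root;
    by default each command is executed with subprocess.run().
    """
    if run is None:
        run = lambda cmd: subprocess.run(cmd, check=True)
    for cmd in down_up_commands(ifname, cycles, ifconfig):
        run(cmd)
        time.sleep(pause)
```

For a dry run, pass run=print to see the commands instead of executing
them, e.g. repeat_down_up("em0", cycles=2, pause=0.0, run=print).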

Further debugging after restoring the bug in resume:
- I mainly use zzz to suspend
- the bug usually doesn't break the interface if I copy zzz from nfs to
  non-nfs and use the copy.  This explains why almost no one except me
  noticed the bug -- zzz is usually not on nfs, and other nfs activity
  is usually lighter than mine too.  (Suspend apparently doesn't do enough
  stopping or syncing generally.  It should fsync() all files ...)
- the bug usually does break the interface if zzz is on nfs
- when the bug breaks the interface:
  - the media is reported as unchanged
  - after DUPs starting with a delay of many seconds and reducing by the
    ping interval of 1 second for each until the delay is less than 1
    second, the ping latency stabilizes at quite different values after
    each suspend/resume.  These values tend to be higher than for media
    change (several hundred ms instead of 76 ms).
  - my ifconfig executable is one of several under /sbin which is not on nfs,
    but my ifconfig is actually a shell script in $HOME/bin; the script
    selects the correct version of ifconfig for the current kernel; it is
    on nfs, and uses utilities on nfs.  I sometimes forget this, and then
    running plain ifconfig to attempt to recover takes too long, and if I
    wait then the nfs activity for finding ifconfig not on nfs tends to
    propagate the broken interface (like zzz on nfs breaks it).
    Manually selecting the correct version of ifconfig under /sbin and using
    it tends to work right (like zzz not on nfs).
  - even an mtu change is enough to recover.  This is not surprising, since
    it does slightly more than down/up as an implementation detail.  This
    shows that the reported media value is at least used by the reinit for
    the mtu change.
  - pinging the interface didn't make it active enough to stop the recovery
    from usually working.
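
The ping delays that start at many seconds and shrink by 1 second per
packet fit a simple model.  The sketch below assumes (this is not
verified against the driver) that pings sent during a stall are all held
and then released together when the interface starts passing traffic
again.  The function name and parameters are hypothetical.  The model
covers only the shrinking delays, not the duplicates and not the latency
that the interface settles at afterwards.

```python
from typing import List


def replayed_ping_delays(stall_seconds: float, interval: float = 1.0,
                         base_rtt: float = 0.0) -> List[float]:
    """Delay seen by each ping sent while the interface is stalled.

    A ping is sent every `interval` seconds starting at t=0, and all
    held pings are answered when the stall ends at t=stall_seconds, so
    each later ping has waited `interval` seconds less than the one
    before it.
    """
    delays: List[float] = []
    sent_at = 0.0
    while sent_at < stall_seconds:
        delays.append(stall_seconds - sent_at + base_rtt)
        sent_at += interval
    return delays
```

For a stall of 5.5 seconds with the default 1 second interval this gives
six delays, each 1 second shorter than the previous one, with the last
one below 1 second.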

Bruce
_______________________________________________
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
