Reviewed by: Matthew Ahrens <mahr...@delphix.com>
Reviewed by: Dan Kimmel <dan.kim...@delphix.com>

When the system hibernates and resumes, the counter that it uses to
measure time is reset to nearly zero. To compensate, the clock
subsystem adds the counter's value to the current time whenever the
counter goes backwards by more than a second or two.

Unfortunately, when running on VMware, the hypervisor sometimes sends
the counter backwards by more than that in the course of normal
operation. We then add a value almost as large as the current uptime to
the clock, so the system's uptime suddenly doubles and the clock ends
up off by days or weeks.
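
To make the failure concrete, here is a minimal sketch of the behavior
described above. It is not the code in timestamp.c: the names
old_clock_t and old_clock_tick, the nanosecond units, and the
two-second BACKWARD_LIMIT_NS threshold are all assumptions chosen for
illustration.

```c
#include <stdint.h>

#define	NANOSEC			1000000000ULL
/* Assumed threshold standing in for "a second or two". */
#define	BACKWARD_LIMIT_NS	(2 * NANOSEC)

/* Hypothetical clock state; not the real kernel structures. */
typedef struct {
	uint64_t last_ns;	/* previous counter reading */
	uint64_t time_ns;	/* current time kept by the clock */
} old_clock_t;

/*
 * Old behavior as described: a backward jump larger than the limit is
 * taken to be a hibernate/resume reset, so the counter's new value is
 * added to the current time.
 */
static uint64_t
old_clock_tick(old_clock_t *c, uint64_t counter_ns)
{
	if (counter_ns >= c->last_ns) {
		/* Normal case: advance by the elapsed counter time. */
		c->time_ns += counter_ns - c->last_ns;
	} else if (c->last_ns - counter_ns > BACKWARD_LIMIT_NS) {
		/* Assumed reset: counter restarted near zero. */
		c->time_ns += counter_ns;
	}
	/* Smaller backward steps add nothing (an assumption). */
	c->last_ns = counter_ns;
	return (c->time_ns);
}
```

In this sketch, with ten days of uptime, a single reading that is five
seconds behind the previous one pushes the reported time to just under
twenty days.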

This can cause a variety of problems. One of them is that the deadman
subsystem may trigger, thinking that the system has been unresponsive
for a long time.

The fix is to change the way we handle sudden backward jumps in time.
If the counter jumps backwards by a large amount but is still larger
than some small value (a second or two), we no longer add it to the
current time. Instead, we assume that the jump is probably a result of
VMware's glitch, and we don't add to the time until we start getting
reliable readings again.
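
The approach described above can be sketched as follows. This is an
illustration under stated assumptions, not the patch itself: the names
new_clock_t and new_clock_tick, both two-second thresholds, and the
rule that a reading counts as reliable once it is no longer behind the
last trusted reading are all hypothetical.

```c
#include <stdint.h>

#define	NANOSEC			1000000000ULL
/* Assumed size of a "large" backward jump. */
#define	BACKWARD_LIMIT_NS	(2 * NANOSEC)
/* Assumed "small value": readings above it are not a fresh reset. */
#define	RESET_CEILING_NS	(2 * NANOSEC)

/* Hypothetical clock state; not the real kernel structures. */
typedef struct {
	uint64_t last_ns;	/* last trusted counter reading */
	uint64_t time_ns;	/* current time kept by the clock */
} new_clock_t;

static uint64_t
new_clock_tick(new_clock_t *c, uint64_t counter_ns)
{
	if (counter_ns >= c->last_ns) {
		/* Normal case: advance by the elapsed counter time. */
		c->time_ns += counter_ns - c->last_ns;
	} else if (c->last_ns - counter_ns > BACKWARD_LIMIT_NS) {
		if (counter_ns > RESET_CEILING_NS) {
			/*
			 * Large jump back, but nowhere near zero: probably
			 * a glitch. Add nothing and keep the last trusted
			 * reading until the counter catches up.
			 */
			return (c->time_ns);
		}
		/* Counter is near zero: treat as a real resume. */
		c->time_ns += counter_ns;
	}
	/* Smaller backward steps add nothing (an assumption). */
	c->last_ns = counter_ns;
	return (c->time_ns);
}
```

Here a reading five seconds behind a ten-day counter leaves the time
unchanged, while a reading of half a second is still accepted as a
reset and added to the time.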

You can view, comment on, or merge this pull request online at:

  https://github.com/openzfs/openzfs/pull/67

-- Commit Summary --

  * 6641 deadman fires spuriously when running on VMware

-- File Changes --

    M usr/src/uts/i86pc/os/timestamp.c (46)

-- Patch Links --

https://github.com/openzfs/openzfs/pull/67.patch
https://github.com/openzfs/openzfs/pull/67.diff

_______________________________________________
developer mailing list
developer@open-zfs.org
http://lists.open-zfs.org/mailman/listinfo/developer
