Re: clean on-disk filesystems through {suspend,hibernate}/resume
> > BTW, if anyone uses softdep *you have to tell me*, and then try
> > to repeat problems you encounter without softdep.  That is a
> > totally different problem set.
>
> Yes, I am using softdep.

I am not concerned with the softdep case.  softdep needs a maintainer,
and it isn't me.  I'll provide hints for how to debug this, though:

First apply the following diff to the tree.  This will keep the screen
alive during the suspend cycle.  On some inteldrm chipsets it will fail
to resume afterwards, however.  On x230 this works; newer models cannot
handle this hack.

Anyway, the goal is to observe why it isn't succeeding at completing
the suspend sync.  Having the screen alive makes it possible to add
printf's to the ffs softdep code, in particular softdep_sync_metadata()
and such functions.  Figure out what the code is doing that keeps it so
busy.  Why does it keep doing IO?  Is it writing data blocks for files?
Is it repeatedly updating the same metadata?

For this suspend case, the sync functions are being called with various
_WAIT flags instead of _NOWAIT or _LAZY.  It is being asked to achieve
stability.  What stops it from achieving stability?  When you read the
code in the area you'll be shocked at the comments.  Try to figure out
which cases are occurring.  Anyone with rudimentary C skills and
patience can do this.
(But I won't be doing it, I have other things to do)

Index: i915_drv.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/i915/i915_drv.c,v
retrieving revision 1.108
diff -u -p -u -r1.108 i915_drv.c
--- i915_drv.c	30 Sep 2017 07:36:56 -0000	1.108
+++ i915_drv.c	21 Dec 2017 05:52:54 -0000
@@ -673,6 +673,8 @@ static int i915_drm_suspend(struct drm_d
 	pci_power_t opregion_target_state;
 	int error;
 
+	return 0;
+
 	/* ignore lid events during suspend */
 	mutex_lock(&dev_priv->modeset_restore_lock);
 	dev_priv->modeset_restore = MODESET_SUSPENDED;
@@ -745,6 +747,8 @@ static int i915_drm_suspend_late(struct
 	struct drm_i915_private *dev_priv = drm_dev->dev_private;
 	int ret;
+
+	return 0;
 
 	ret = intel_suspend_complete(dev_priv);
Re: clean on-disk filesystems through {suspend,hibernate}/resume
Hi,

* Theo de Raadt wrote:
> BTW, if anyone uses softdep *you have to tell me*, and then try
> to repeat problems you encounter without softdep.  That is a
> totally different problem set.

Yes, I am using softdep.  For testing, I removed softdep, performed all
tests again, and ran the "extract src.tar while suspending" test
multiple times, both on /tmp and /home.  Now the suspend process was
quite fast and the file systems were marked clean every time.

Cheers
Matthias
Re: clean on-disk filesystems through {suspend,hibernate}/resume
> 4. Now the interesting case.  Basically the same as (2) but now I
>    extracted src.tar.gz not in /tmp but in /home, which is my largest
>    partition.  This time, the suspend process did not finish and I
>    pulled the plug after some time.

I've heard a report or two of it not completing the sync.  I don't
know yet what causes this situation.

BTW, if anyone uses softdep *you have to tell me*, and then try
to repeat problems you encounter without softdep.  That is a
totally different problem set.
Re: clean on-disk filesystems through {suspend,hibernate}/resume
Hi,

* Theo de Raadt wrote:
> I would appreciate reports, and later I'll cut this into pieces and
> commit incremental changes.

I ran four tests on an Intel NUC with softraid CRYPTO and a keydisk.
Although the sync+suspend does not finish in one test, it is definitely
an improvement and your work is highly appreciated!

Cheers
Matthias

1. zzz after a manual fsync, then pulled the plug after suspend.
   Works as expected and no softraid errors.

Jan  6 20:59:21 tau /bsd: /var force clean (0 0): fmod 1 clean 1
Jan  6 20:59:21 tau /bsd: /usr/src force clean (0 0): fmod 1 clean 1
Jan  6 20:59:21 tau /bsd: /usr/ports force clean (0 0): fmod 1 clean 1
Jan  6 20:59:21 tau /bsd: /usr/obj force clean (0 0): fmod 1 clean 1
Jan  6 20:59:21 tau /bsd: /usr/local force clean (0 0): fmod 1 clean 1
Jan  6 20:59:21 tau /bsd: /usr/X11R6 force clean (0 0): fmod 1 clean 1
Jan  6 20:59:21 tau /bsd: /usr force clean (0 0): fmod 1 clean 1
Jan  6 20:59:22 tau /bsd: /tmp force clean (0 0): fmod 1 clean 1
Jan  6 20:59:22 tau /bsd: /home force clean (0 0): fmod 1 clean 1
Jan  6 20:59:22 tau /bsd: / force clean (0 0): fmod 1 clean 1

2. zzz while extracting src.tar.gz on /tmp, then pulled the plug after
   suspend.  Same result, works as expected.

3. ZZZ after a manual fsync.  Same result, works as expected.

4. Now the interesting case.  Basically the same as (2) but now I
   extracted src.tar.gz not in /tmp but in /home, which is my largest
   partition.  This time, the suspend process did not finish and I
   pulled the plug after some time.  My /home partition was extremely
   dirty but the others were marked as clean.  So definitely an
   improvement over the current situation.
/dev/sd2l: SIZE=762 MTIME=Sep 14 16:04 2016  (RECONNECTED)
/dev/sd2l (1cae2f5f79b7f28f.l): UNREF FILE I=11553631  OWNER=xhr MODE=100644
[ Hundreds of unreferenced files ]
/dev/sd2l (1cae2f5f79b7f28f.l): FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/sd2l (1cae2f5f79b7f28f.l): SUMMARY INFORMATION BAD (SALVAGED)
/dev/sd2l (1cae2f5f79b7f28f.l): BLK(S) MISSING IN BIT MAPS (SALVAGED)
/dev/sd2l (1cae2f5f79b7f28f.l): 176550 files, 13148621 used, 91136441 free (88761 frags, 11380960 blocks, 0.1% fragmentation)
/dev/sd2l (1cae2f5f79b7f28f.l): MARKING FILE SYSTEM CLEAN
/dev/sd2d (1cae2f5f79b7f28f.d): file system is clean; not checking
/dev/sd2f (1cae2f5f79b7f28f.f): file system is clean; not checking
/dev/sd2g (1cae2f5f79b7f28f.g): file system is clean; not checking
/dev/sd2h (1cae2f5f79b7f28f.h): file system is clean; not checking
/dev/sd2k (1cae2f5f79b7f28f.k): file system is clean; not checking
/dev/sd2j (1cae2f5f79b7f28f.j): file system is clean; not checking
/dev/sd2i (1cae2f5f79b7f28f.i): file system is clean; not checking
/dev/sd2e (1cae2f5f79b7f28f.e): file system is clean; not checking

Jan  6 21:09:25 tau /bsd: /var force clean (0 0): fmod 1 clean 1
Jan  6 21:09:25 tau /bsd: /usr/src force clean (0 0): fmod 1 clean 1
Jan  6 21:09:25 tau /bsd: /usr/ports force clean (0 0): fmod 1 clean 1
Jan  6 21:09:25 tau /bsd: /usr/obj force clean (0 0): fmod 1 clean 1
Jan  6 21:09:25 tau /bsd: /usr/local force clean (0 0): fmod 1 clean 1
Jan  6 21:09:25 tau /bsd: /usr/X11R6 force clean (0 0): fmod 1 clean 1
Jan  6 21:09:25 tau /bsd: /usr force clean (0 0): fmod 1 clean 1
Jan  6 21:09:25 tau /bsd: /tmp force clean (0 0): fmod 1 clean 1

Both / and /home are missing here, and they were both marked as dirty.
Here is my disklabel as reference:

# /dev/rsd2c:
type: SCSI
disk: SCSI disk
label: SR CRYPTO
duid: 1cae2f5f79b7f28f
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 58368
total sectors: 937697393
boundstart: 64
boundend: 937681920
drivedata: 0

16 partitions:
#                size           offset  fstype [fsize bsize   cpg]
  a:          2097152               64  4.2BSD   2048 16384 12958 # /
  b:         33820888          2097216    swap                    # none
  c:        937697393                0  unused
  d:          4194304         35918112  4.2BSD   2048 16384 12958 # /tmp
  e:          4194304         40112416  4.2BSD   2048 16384 12958 # /var
  f:          4194304         44306720  4.2BSD   2048 16384 12958 # /usr
  g:          2097152         48501024  4.2BSD   2048 16384 12958 # /usr/X11R6
  h:         20971520         50598176  4.2BSD   2048 16384 12958 # /usr/local
  i:          8388608         71569696  4.2BSD   2048 16384 12958 # /usr/src
  j:          8399168         79958304  4.2BSD   2048 16384 12958 # /usr/ports
  k:          8385952         88357472  4.2BSD   2048 16384 12958 # /usr/obj
  l:        840938496         96743424  4.2BSD   4096 32768 26062 # /home

OpenBSD 6.2-current (GENERIC.MP) #0: Sat Jan  6 20:02:16 CET 2018
    x...@tau.xosc.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17047859200 (16258MB)
avail mem = 16524271616 (15758MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7b1d5000 (58 entries)
bios0: vendor Intel Corp. v
clean on-disk filesystems through {suspend,hibernate}/resume
I've been working for about a month to ensure filesystems are maximally
synchronized and/or clean on-disk through a suspend/resume cycle.

The idea is that if a suspend/resume or hibernate/resume sequence gets
broken (by pulling the power+battery during suspend, or similar
circumstances during the hibernate-write sequence), we can be assured
that the filesystems are in the best shape.  And if done correctly,
we'll even have marked-clean filesystems which don't need an fsck, so
that the fresh boot is faster.

There is also a similar case when softraid (layers) underly the
filesystems.  These layers need proper synchronization to disk also.
Previously we've been ignoring this issue, and frankly we've done
mostly fine...

The change starts with a series of changes to suspend.  It is a bit
tricky to synchronize the in-memory soft-state of the filesystems to
disk and to block new in-memory changes from happening.  New
allocations of vnodes are caused to sleep-spin, so that other processes
cannot advance by creating new files.  All mountpoints are told to
non-lazily sync their filesystems, and locks are held on these
mountpoints so that no new activity can occur.  During this phase, the
number of dangling inodes (nlink == 0) is counted; if any are found,
the on-disk filesystem is marked dirty, otherwise it is marked clean.

Next, softraid can be told to save its state, but it uses vnodes, so a
hack allows it to bypass the sleep-spin mentioned earlier.  Once the
suspend code knows there are no more tsleeps, it can unwind the mount
locks so there is less to worry about upon resume.

I would appreciate reports, and later I'll cut this into pieces and
commit incremental changes.
Index: dev/acpi/acpi.c
===================================================================
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.335
diff -u -p -u -r1.335 acpi.c
--- dev/acpi/acpi.c	29 Nov 2017 22:51:01 -0000	1.335
+++ dev/acpi/acpi.c	5 Jan 2018 17:29:37 -0000
@@ -30,6 +30,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #ifdef HIBERNATE
 #include
@@ -61,6 +63,7 @@
 #include "wd.h"
 #include "wsdisplay.h"
+#include "softraid.h"
 
 #ifdef ACPI_DEBUG
 int	acpi_debug = 16;
@@ -2438,11 +2441,15 @@ int
 acpi_sleep_state(struct acpi_softc *sc, int sleepmode)
 {
 	extern int perflevel;
+	extern int vnode_sleep;
 	extern int lid_action;
 	int error = ENXIO;
 	size_t rndbuflen = 0;
 	char *rndbuf = NULL;
 	int state, s;
+#if NSOFTRAID > 0
+	extern void sr_quiesce(void);
+#endif
 
 	switch (sleepmode) {
 	case ACPI_SLEEP_SUSPEND:
@@ -2481,8 +2488,12 @@ acpi_sleep_state(struct acpi_softc *sc,
 
 #ifdef HIBERNATE
 	if (sleepmode == ACPI_SLEEP_HIBERNATE) {
-		uvmpd_hibernate();
+		/*
+		 * Discard useless memory, then attempt to
+		 * create a hibernate work area
+		 */
 		hibernate_suspend_bufcache();
+		uvmpd_hibernate();
 		if (hibernate_alloc()) {
 			printf("%s: failed to allocate hibernate memory\n",
 			    sc->sc_dev.dv_xname);
@@ -2495,18 +2506,38 @@ acpi_sleep_state(struct acpi_softc *sc,
 	if (config_suspend_all(DVACT_QUIESCE))
 		goto fail_quiesce;
 
-	bufq_quiesce();
-
 #ifdef MULTIPROCESSOR
 	acpi_sleep_mp();
 #endif
 
+	vnode_sleep = 1;
+	vfs_stall(curproc, 1);
+#if NSOFTRAID > 0
+	sr_quiesce();
+#endif
+	bufq_quiesce();
+
+#ifdef HIBERNATE
+	if (sleepmode == ACPI_SLEEP_HIBERNATE) {
+		/*
+		 * VFS syncing churned lots of memory; so discard
+		 * useless memory again, hoping no processes are
+		 * still allocating..
+		 */
+		hibernate_suspend_bufcache();
+		uvmpd_hibernate();
+	}
+#endif /* HIBERNATE */
+
 	resettodr();
 
 	s = splhigh();
 	disable_intr();	/* PSL_I for resume; PIC/APIC broken until repair */
 	cold = 2;	/* Force other code to delay() instead of tsleep() */
 
+	vfs_stall(curproc, 0);
+	vnode_sleep = 0;
+
 	if (config_suspend_all(DVACT_SUSPEND) != 0)
 		goto fail_suspend;
 	acpi_sleep_clocks(sc, state);
@@ -2568,6 +2599,7 @@ fail_suspend:
 #endif
 
 	bufq_restart();
+	wakeup(&vnode_sleep);
 
 fail_quiesce:
 	config_suspend_all(DVACT_WAKEUP);
@@ -2588,6 +2620,8 @@ fail_alloc:
 	wsdisplay_resume();
 	rw_enter_write(&sc->sc_lck);
 #endif /* NWSDISPLAY > 0 */
+
+	sys_sync(curproc, NULL, NULL);
 
 	/* Restore hw.setperf */
 	if (cpu_setperf != NULL)

Index: dev/softraid.c
===================================================================
RCS file: /cvs/src/sys/dev/softraid.c,v
retrieving revision 1.389
diff -u -p -u -r1.389 softraid.c
--- dev/softraid.c	21 Dec 2017 07:29:15 -0000	1.389
+++ dev/softraid.c	6