We have an automated system for making regular (roughly hourly) snapshots of some especially important filesystems where we want fast restores. This has been running smoothly and without problems for some time. However, twice this week we have gone to do from-snapshot restores on one of the filesystems involved and discovered that almost all of its snapshots are mysteriously missing.
By 'missing' I mean that they aren't present in either <fs>/.zfs/snapshot or in 'zfs list -r -t all <fs>', which as far as I know means they don't exist at all. By 'mysteriously' I mean that not only did the snapshot-making process not report any errors to us, but 'zpool history' reports that the snapshot commands happened and there were no matching snapshot deletions. In addition, this has only been happening on one of the filesystems that gets snapshots; all of the other ones (which are all in the same pool) have everything present.

The recent 'zpool history' output for the filesystem is:

  2017-01-06.06:10:01 zfs snapshot fs0-admin-02/h/105@Fri-06
  2017-01-06.07:10:01 zfs snapshot fs0-admin-02/h/105@Fri-07
  2017-01-06.08:10:01 zfs snapshot fs0-admin-02/h/105@Fri-08
  2017-01-06.09:10:01 zfs snapshot fs0-admin-02/h/105@Fri-09
  2017-01-06.10:10:01 zfs snapshot fs0-admin-02/h/105@Fri-10
  2017-01-06.11:10:01 zfs snapshot fs0-admin-02/h/105@Fri-11
  2017-01-06.12:10:01 zfs snapshot fs0-admin-02/h/105@Fri-12
  2017-01-06.13:10:01 zfs snapshot fs0-admin-02/h/105@Fri-13
  2017-01-06.14:10:01 zfs snapshot fs0-admin-02/h/105@Fri-14
  2017-01-06.15:10:01 zfs snapshot fs0-admin-02/h/105@Fri-15
  2017-01-06.16:10:01 zfs snapshot fs0-admin-02/h/105@Fri-16
  2017-01-06.16:45:55 zfs snapshot fs0-admin-02/h/105@Fri-16

The actual snapshots present in the pool are:

  NAME                       USED   AVAIL  REFER  MOUNTPOINT
  fs0-admin-02/h/105@Fri-15  604M   -      343G   -
  fs0-admin-02/h/105@Fri-16  23.4M  -      343G   -

(The second @Fri-16 snapshot was made when we discovered that the first one was missing.)

As far as I can tell from 'zpool history', no errant broad 'zfs destroy' operations have been done against the pool that might have swept up these snapshots as a side effect. (I don't think that's even possible, but ...)

(Also, because of how our automation for this operates, I'm confident that none of the @Fri-NN snapshots existed before they were nominally created. If they had appeared in e.g. 'zfs list' output, the automation would have deleted them before trying to recreate them.)

The fileserver in question has not suffered a power failure or crash since before this started happening.

Does anyone have any idea what could be happening here? For example, is there some way that snapshots can be removed without that being logged in 'zpool history'?

(I'm scrubbing the pool now, so far without errors.)

Thanks in advance.

	- cks

PS: we're on OmniOS r151014, kernel rev omnios-f090f73. Yes, I know, it's old. We like stability whenever possible, and testing & mostly qualifying upgrades takes a lot of work. (We can't be sure an upgrade works until we're running it in production, either; we can't reproduce production loads and stresses in testing.)
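
For anyone who wants to repeat the cross-check described above (what 'zpool history' says should exist versus what 'zfs list' actually shows), here is a minimal Python sketch. It is not part of our automation. The function names and the regular expression are made up for illustration, and it assumes history lines of the simple form shown in this message: a timestamp, then 'zfs snapshot' or 'zfs destroy', optional dash-prefixed flags, and a single <fs>@<snap> argument. Commands that destroy snapshots in other ways (for example recursively from a parent, or with comma-separated snapshot lists) are not handled.

```python
import re

# Assumption: lines look like
#   "2017-01-06.06:10:01 zfs snapshot pool/fs@name"
# optionally with simple dash-prefixed flags before the snapshot name.
HISTORY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}\.\d{2}:\d{2}:\d{2})\s+zfs\s+"
    r"(snapshot|destroy)\s+(?:-\S+\s+)*(\S+@\S+)$"
)


def expected_snapshots(history_text):
    """Replay logged snapshot/destroy commands in order.

    Returns a dict mapping each snapshot that should still exist
    to the timestamp of its most recent logged creation.
    """
    alive = {}
    for line in history_text.splitlines():
        m = HISTORY_RE.match(line.strip())
        if not m:
            continue  # not a plain snapshot/destroy line; ignore it
        stamp, verb, snap = m.groups()
        if verb == "snapshot":
            alive[snap] = stamp
        else:
            alive.pop(snap, None)
    return alive


def missing_snapshots(history_text, present):
    """Snapshots the history says should exist but that are not present."""
    present_set = set(present)
    expected = expected_snapshots(history_text)
    return sorted(s for s in expected if s not in present_set)


# Small sample taken from the history excerpt in this message.
sample_history = """\
2017-01-06.14:10:01 zfs snapshot fs0-admin-02/h/105@Fri-14
2017-01-06.15:10:01 zfs snapshot fs0-admin-02/h/105@Fri-15
2017-01-06.16:10:01 zfs snapshot fs0-admin-02/h/105@Fri-16
2017-01-06.16:45:55 zfs snapshot fs0-admin-02/h/105@Fri-16
"""
sample_present = [
    "fs0-admin-02/h/105@Fri-15",
    "fs0-admin-02/h/105@Fri-16",
]
print(missing_snapshots(sample_history, sample_present))
# prints ['fs0-admin-02/h/105@Fri-14']
```

In real use the two inputs would presumably come from 'zpool history <pool>' and 'zfs list -H -o name -t snapshot -r <fs>'. Since snapshot names like @Fri-NN are reused every week, replaying the history in order (instead of just collecting names) matters: a logged destroy followed by a later create should leave the snapshot counted as expected.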