On 4/17/23 21:28, José Pérez wrote:
Hi Pawel,
thank you for your reply and for the fixes.

I think there is a fourth issue that needs to be addressed: how do we recover from the worst-case scenario, which is a machine with a kernel newer than 2a58b312b62f and a ZFS root pool upgraded with block cloning enabled?

In particular, is it safe to turn such a machine on in the first place, and what are the risks involved in doing so? Any potential data loss?

Would such a machine be able to fix itself by compiling a kernel, or would compilation fail and might data be corrupted in the process?

I have two poudriere builders powered off (I am not alone in this situation) and I need to recover them, ideally minimizing data loss. The builders also host current and are used to build kernels and worlds for 13 and current. As of now, all my production machines are stuck on the 13 they run; I cannot update binaries or packages, and I would like to be back online.

José,

I can only speak of block cloning in detail, but I'll try to address everything.

The easiest way to avoid block_cloning-related corruption on a kernel built after the last OpenZFS merge but before e0bb199925 is to set the compress property to 'off' and the sync property to something other than 'disabled'. This avoids both the block_cloning-related corruption and the zil_replaying() panic.
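
A minimal sketch of that workaround, for illustration only. The pool name zroot is an assumption (substitute your own), 'standard' is just one of the sync values other than 'disabled', and the mitigate helper is hypothetical. By default the script only prints the commands; set RUN=1 and run it as root to apply them.

```shell
#!/bin/sh
# Hypothetical helper: print (default) or run (RUN=1, as root) the
# property changes described above for a single pool.
mitigate() {
    pool="$1"
    for cmd in \
        "zfs set compression=off $pool" \
        "zfs set sync=standard $pool" \
        "zfs get -r -s local compression,sync $pool"
    do
        if [ "${RUN:-0}" = "1" ]; then
            $cmd            # word-splits into the zfs command and its arguments
        else
            echo "$cmd"     # dry run: show what would be executed
        fi
    done
}

# "zroot" is an assumed pool name; pass your own as the first argument.
mitigate "${1:-zroot}"
```

The properties are inherited, so child datasets pick up the new values unless they set their own. The final zfs get lists the datasets that carry a local value for either property, including any that override the pool-wide setting and need to be changed individually.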

As for the other corruption, unfortunately I don't know the details, but my understanding is that it happens under higher load. I'm not sure I'd trust a kernel built on a machine with this bug present. What I would do is compile the kernel as of 068913e4ba somewhere else, boot the problematic machine in single-user mode, and install the newly built kernel.

As far as I can tell, contrary to some initial reports, none of the problems introduced by the recent OpenZFS merge corrupt the pool metadata, only file data. You can locate the files modified while running the bogus kernel using find(1) with a suitable modification-time test, but you have to decide what to do with them (throw them away, restore them from backup, or inspect them).
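
One possible way to do that search, as a sketch: the list_modified helper is hypothetical, and the time window has to come from when the affected kernel was actually running on the machine in question. It relies on the -newermt primary, which both FreeBSD and GNU find(1) provide.

```shell
#!/bin/sh
# list_modified DIR START END
# Print regular files under DIR whose modification time is later than
# START and not later than END (timestamps in local time).
# -xdev is deliberately not used: every ZFS dataset is a separate file
# system, so -xdev would skip child datasets mounted below DIR.
list_modified() {
    find "$1" -type f -newermt "$2" ! -newermt "$3" -print
}
```

For example, `list_modified /usr/home "2023-04-10 00:00:00" "2023-04-18 00:00:00"` would list candidate files under /usr/home; both the path and the dates here are placeholders, not the real exposure window.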

--
Pawel Jakub Dawidek

