I can't reproduce the problem given your file system image.  Given
your description, this is almost certainly operator error.

> $ e2fsck -f   /dev/zvol/filling/store/nabijaczleweli/e2test
> e2fsck 1.46.2 (28-Feb-2021)
> e2fsck: MMP: e2fsck being run while checking MMP block
> MMP check failed: If you are sure the filesystem is not in use on any node, 
> run:
> 'tune2fs -f -E clear_mmp /dev/zvol/filling/store/nabijaczleweli/e2test'
> MMP_block:
>     mmp_magic: 0x4d4d50
>     mmp_check_interval: 10
>     mmp_sequence: e24d4d50
>     mmp_update_date: Wed Mar  3 14:51:38 2021
>     mmp_update_time: 1614779498
>     mmp_node_name: tarta
>     mmp_device_name: /dev/zvol/filling/store/nabijacz

What this message means is that some *other* node was trying to run
fsck on the file system at the same time as your e2fsck run.
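
To make the MMP dump above easier to read, here is a minimal Python
sketch of how the mmp_sequence field can be interpreted.  The constants
mirror the EXT4_MMP_SEQ_* values from the ext4 on-disk format headers
as I understand them; the helper name describe_mmp_seq is purely
illustrative and is not part of e2fsprogs.

```python
# Special mmp_sequence values (mirroring the EXT4_MMP_* constants).
MMP_MAGIC = 0x004D4D50      # "MMP" in ASCII; printed as 0x4d4d50 above
MMP_SEQ_CLEAN = 0xFF4D4D50  # file system was cleanly unmounted
MMP_SEQ_FSCK = 0xE24D4D50   # e2fsck has marked the file system as in use
MMP_SEQ_MAX = 0xE24D4D4F    # largest ordinary (mounted) sequence number


def describe_mmp_seq(seq: int) -> str:
    """Illustrative helper (not an e2fsprogs API): classify mmp_sequence."""
    if seq == MMP_SEQ_CLEAN:
        return "clean"
    if seq == MMP_SEQ_FSCK:
        return "fsck in progress"
    if 0 <= seq <= MMP_SEQ_MAX:
        return "mounted or in use"
    return "unknown"


# The value from the dump above: "mmp_sequence: e24d4d50"
print(describe_mmp_seq(int("e24d4d50", 16)))
```

The sequence value in your log is the fsck sentinel, which is why
e2fsck refused to proceed.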

The key part of this message is:

MMP check failed: If you are sure the filesystem is not in use on any node, run:
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When you then run 

'tune2fs -f -E clear_mmp /dev/zvol/filling/store/nabijaczleweli/e2test'

it forcibly clears the MMP block.  The MMP protects the file system
from simultaneous modification by more than one system.  Given that
you had this file system on some kind of remote block device
(which presumably is why you were using the multi-mount protection
feature in the first place), forcibly overriding the MMP protection is
a bad thing.  It's the functional equivalent of turning off the gun's
safety, and then aiming the gun at your foot, and pulling the trigger.
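
As a rough illustration of the kind of check that "sure the filesystem
is not in use on any node" implies, here is a hedged Python sketch.  It
is not what e2fsck or the kernel actually runs: read_seq is a
hypothetical callback that returns the current mmp_sequence from the
device, and the "2 * interval + 1" wait is an assumed safety margin.

```python
import time
from typing import Callable

MMP_SEQ_FSCK = 0xE24D4D50  # sentinel written while e2fsck holds the MMP block


def mmp_looks_idle(read_seq: Callable[[], int],
                   check_interval: int,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if nothing appears to be updating the MMP block.

    read_seq is a hypothetical callback; how it reads the device is
    deliberately left out of this sketch.
    """
    before = read_seq()
    if before == MMP_SEQ_FSCK:
        # The fsck sentinel does not count upward, so waiting tells us nothing.
        return False
    # Wait longer than one update interval so that a live node has time
    # to bump the sequence number.  The margin used here is an assumption.
    sleep(2 * check_interval + 1)
    return read_seq() == before
```

An unchanged sequence number is only evidence, not proof, that no other
node is active, and a crashed fsck will never look idle to this check.
That judgment call is exactly what clear_mmp hands over to the
administrator.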

> 
> $ tune2fs -f -E clear_mmp /dev/zvol/filling/store/nabijaczleweli/e2test
> tune2fs 1.46.2 (28-Feb-2021)
> 
> $ e2fsck -fy   /dev/zvol/filling/store/nabijaczleweli/e2test
> e2fsck 1.46.2 (28-Feb-2021)
> ext2fs_open2: Superblock checksum does not match superblock
> e2fsck: Superblock invalid, trying backup blocks...
> Superblock has invalid MMP magic.  Fix? yes

The bad checksum and the invalid MMP magic mean that some other
system was also modifying the file system while tune2fs was clearing
the MMP block.

If I unpack your image:

% unzstd rbd13.e2i.qcow2.zst
% e2image -r rbd13.e2i.qcow2 rbd13
% truncate -s 4T rbd13  # This "expands" the file to 4TB

... and then run the same set of commands from your logs using "rbd13"
instead of "/dev/zvol/filling/store/...", it works just fine.  But on
the local file, obviously there won't be any other system or node
modifying the file system, and there is no corruption.


I do see some problems in how resize2fs handles MMP devices.  In
particular, resize2fs doesn't check the MMP block.  So if there is some
other process messing with the file system, the result can be file
system corruption.  But you really shouldn't have been trying to
resize the file system if someone else is trying to use the file
system.

And even if you do this, if you run "tune2fs -f -E clear_mmp ..." and
something else is using the file system, you're going to be doomed,
anyway.

I suppose we can add some "are you sure?  ARE YOU REALLY SURE?"
question to "tune2fs -f -E clear_mmp", and maybe even force the user
to type the string "I AGREE TO ASSUME RESPONSIBILITY FOR FILE SYSTEM
DESTRUCTION IF OTHER NODES ARE USING, MOUNTING, OR MODIFYING THE FILE
SYSTEM", but at the end of the day, there is only so much we can do
to protect against PEBCAK failures....

                                        - Ted
