Re: Massive filesystem corruption since kernel 5.2 (ARCH)

Chris Murphy Tue, 30 Jul 2019 13:16:02 -0700

On Tue, Jul 30, 2019 at 2:09 AM Swâmi Petaramesh <sw...@petaramesh.org> wrote:
>
> On 7/29/19 9:10 PM, Chris Murphy wrote:
> > We've discussed many times how both file system repair, and file
> > system restore from backup, simply are not scalable for big file
> > systems. It takes too long.
>
> So what would be the solution ?


There presently is no solution, and I'm not aware of any future plan
either. I think it's a problem.

>
> IMHO yes, having to full backup then reformat then full restore is
> impractical for big FSes. Especially if they have a lot of subvols.
>
> Also most private individuals do not have enough disks to perform a full
> backup of their RAID NAS, etc.

I sympathize with the lack of resources. But having no full backup
simply cannot be taken seriously in any computer science context. The
data cannot be that important by the user's own estimation if there
aren't backups. Given resource limitations, it's reasonable to have
only a subset of data backed up. But if none of it is, *shrug*, there
just aren't that many people who will sympathize with data loss if
there are no backups.

Backup+restore is for sure a Byzantine workaround for the data
storage problem, but you have no idea what will fail or when it will
fail. There's not a file system list on earth that will tell you it's
OK to not have backups.


> I believe that we should have a repair tool that can fix a filesystem
> metadata and make it clean and usable again even if this is at the cost
> of losing a whole directory tree or subvols or whatever.

So far that isn't how it works. I don't know if it's a limitation of
the on-disk format, or a limitation of reconstructing from incorrect
information, even though the checksum is correct.


> But it would be better to lose clearly identified things and resume with
> a working FS and a list of files to be restored, rather than being
> unable to repair and having to reformat everything and restore everything...

Yep. That doesn't exist yet and I don't know if that's a design goal
of Btrfs eventually.

ZFS meanwhile has no repair tool. If it becomes inconsistent, that's
it, recreate the file system.

If your use case policy requires a repair tool, you really have to
disqualify both ZFS and Btrfs because the Btrfs repair tool is still
marked in the man page as dangerous. I just cannot take repair of
Btrfs seriously when Btrfs developers consider it dangerous on a case
by case basis.

It's always the case with any file system that a clean reproducer has
the best chance of getting developer attention. This is not easy. Part
of practical best practice is having the bulk of your systems on some
very stable operating system with well maintained stable kernels, or
actively maintained long term kernels, and having some smaller
percentage of machines to test mainline kernels on. It might be
annoying and tedious, and definitely bad and a bug, to have a problem.
But at least your problem is restricted to your test machines.

There isn't enough history here to piece together with any certainty
why you're experiencing what you're experiencing beyond what Qu has
already stated.

-- 
Chris Murphy
