On Mon, 2017-01-16 at 09:38 +0800, Qu Wenruo wrote:
> So the fs is REALLY corrupted.
*sigh* ... (not as in fuck-I'm-losing-my-data™ ... but as in *sigh*
another-possibly-deeply-hidden-bug-in-btrfs-that-might-eventually-
cause-data-loss...)

> BTW, lowmem mode seems to have a new false alert when checking the
> block 
> group item.

Anything you want me to check there?


> Did you have any "lightweight" method to reproduce the bug?
Na, not at all... as I've said, this already happened to me once before,
and in both cases I was cleaning up old ro-snapshots.

At least in the current case the fs was only ever filled via
send/receive (well apart from minor mkdirs or so)... so there shouldn't
have been any "extreme ways" of using it.

I think (but not sure), that this was also the case on the other
occasion that happened to me with a different fs (i.e. I think it was
also a backup 8TB disk).


> For example, on a 1G btrfs fs with moderate operations, for example 
> 15min or so, to reproduce the bug?
Well, I could try to reproduce it, but I guess you'd have far better
means to do so.

As I've said, I was mostly doing send (with -p) | receive to do
incremental backups... and after a while I was cleaning up the old
snapshots on the backup fs.
Of course the snapshot subvols are pretty huge... as I've said, close to
8TB (7.5 or so)... everything from quite big files (4GB) to very small,
symlinks (no devices/sockets/fifos)... perhaps some hardlinks...
Some reflink-copied files. The whole fs has compression enabled.
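Roughly, the workflow looks like this (the paths, snapshot naming,
retention count and helper functions are all made up for illustration;
only the btrfs send/receive/subvolume-delete invocations reflect what I
actually do):

```shell
# Sketch of the backup workflow -- SRC_POOL/DST_POOL and the helper
# function names are hypothetical.
SRC_POOL=/mnt/source/snapshots    # assumed source snapshot location
DST_POOL=/mnt/backup/snapshots    # assumed backup fs location

incremental_backup() {
    # $1: previous (parent) snapshot name, or "" for a full send
    # $2: current snapshot name
    local prev="$1" cur="$2"
    if [ -z "$prev" ]; then
        # first snapshot: full send
        btrfs send "$SRC_POOL/$cur" | btrfs receive "$DST_POOL"
    else
        # later snapshots: incremental send against the parent (-p)
        btrfs send -p "$SRC_POOL/$prev" "$SRC_POOL/$cur" \
            | btrfs receive "$DST_POOL"
    fi
}

prune_old_snapshots() {
    # Delete the oldest ro-snapshots on the backup fs, keeping $1 of
    # them; assumes snapshot names sort chronologically (ISO dates).
    local keep="$1" snap
    ls -1 "$DST_POOL" | head -n -"$keep" | while read -r snap; do
        btrfs subvolume delete "$DST_POOL/$snap"
    done
}
```

The cleanup step (prune_old_snapshots) is exactly where the corruption
showed up both times.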


> > Shall I rw-mount the fs and do sync and wait and retry? Or is there
> > anything else that you want me to try before in order to get the
> > kernel
> > bug (if any) or btrfs-progs bug nailed down?
> 
> Personally speaking, rw mount would help, to verify if it's just a
> bug 
> that will disappear after the deletion is done.
Well, but then we might lose any chance to further track it down.

And even if it would go away, it would still at least be a bug in terms
of an fsck false positive... if not more (in the sense that corruptions
may happen if some affected parts of the fs are used before they're
cleaned up again).


> But considering the size of your fs, it may not be a good idea as we 
> don't have reliable method to recover/rebuild extent tree yet.

So what do you effectively want now?
Wait and try something else?
RW mount and recheck to see whether it goes away with that? (And even
if it does, should I rather re-create/populate the fs from scratch just
to be sure?)

What I can also offer in addition... as mentioned a few times
previously, I have full lists of the reg-files/dirs/symlinks, as well
as SHA512 sums of each of the reg-files, as they are expected to be on
the fs and the snapshots, respectively.
So I can offer to do a full verification pass over these, to see whether
anything is missing or any (file) data is actually corrupted.
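Such a verification pass would look roughly like this (the paths and
the function name are made up; the manifest is the standard "HASH  PATH"
format that sha512sum -c expects):

```shell
# Sketch of the checksum verification -- snapshot and manifest paths
# are hypothetical.
verify_snapshot() {
    # $1: snapshot directory, $2: manifest of expected SHA512 sums
    # --quiet prints only mismatches, so empty output and exit 0 mean
    # every listed file is present and intact; a missing or corrupted
    # file makes sha512sum exit non-zero.
    local snap="$1" manifest="$2"
    ( cd "$snap" && sha512sum -c --quiet "$manifest" )
}
```

Corrupted files show up as FAILED lines, missing ones as open/read
errors, so both cases would be caught.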

Of course that will take a while, and even if everything verifies, I'm
still not really sure whether I'd trust that fs anymore ;-)


Cheers,
Chris.
