On Mon, 2017-01-16 at 09:38 +0800, Qu Wenruo wrote:
> So the fs is REALLY corrupted.

*sigh* ... (not as in fuck-I'm-losing-my-data™ ... but as in *sigh*
another-possibly-deeply-hidden-bug-in-btrfs-that-might-eventually-cause-data-loss...)
> BTW, lowmem mode seems to have a new false alert when checking the
> block group item.

Anything you want me to check there?

> Did you have any "lightweight" method to reproduce the bug?

Na, not at all... as I've said, this already happened to me once
before, and in both cases I was cleaning up old ro-snapshots.

At least in the current case, the fs was only ever filled via
send/receive (well, apart from minor mkdirs or so)... so there
shouldn't have been any "extreme ways" of using it.
I think (but am not sure) that this was also the case on the other
occasion, which happened to me with a different fs (i.e. I think it
was also a backup 8TB disk).

> For example, on a 1G btrfs fs with moderate operations, for example
> 15min or so, to reproduce the bug?

Well, I could try to reproduce it, but I guess you'd have far better
means to do so.
As I've said, I was mostly doing send (with -p) | receive for
incremental backups... and after a while I was cleaning up the old
snapshots on the backup fs.
Of course the snapshot subvols are pretty huge... as I've said, close
to 8TB (7.5 or so)... everything from quite big files (4GB) to very
small ones, symlinks (no devices/sockets/fifos)... perhaps some
hardlinks... some refcopied files. The whole fs has compression
enabled.

> > Shall I rw-mount the fs and do sync and wait and retry? Or is there
> > anything else that you want me to try before, in order to get the
> > kernel bug (if any) or btrfs-progs bug nailed down?
>
> Personally speaking, rw mount would help, to verify if it's just a
> bug that will disappear after the deletion is done.

Well, but then we might lose any chance to track it down further.
And even if it went away, it would still at least be a bug in terms of
an fsck false positive... if not more (in the sense that corruption
may happen if some affected parts of the fs are used while not cleaned
up again).
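For reference, the backup flow I describe above is essentially the
following (just a sketch — all paths and snapshot names are made-up
examples, not my actual scripts):

```shell
#!/bin/sh
# Sketch of an incremental btrfs backup via "send -p | receive".
# All paths and snapshot names below are hypothetical.
set -eu

SRC=/mnt/data            # source filesystem with a snapshots/ dir
DST=/mnt/backup          # backup filesystem (the one fsck complained about)
PREV=snap-2017-01-01     # previous snapshot, present on BOTH sides
CURR=snap-2017-01-16     # new snapshot to transfer

# Only act when btrfs is available and a parent snapshot exists;
# otherwise just report that nothing was done.
if command -v btrfs >/dev/null 2>&1 && [ -d "$SRC/snapshots/$PREV" ]; then
    # take a new read-only snapshot of the source subvolume
    btrfs subvolume snapshot -r "$SRC/current" "$SRC/snapshots/$CURR"
    # send only the delta against the parent; receive it on the backup fs
    btrfs send -p "$SRC/snapshots/$PREV" "$SRC/snapshots/$CURR" \
        | btrfs receive "$DST"
    # later: clean up old ro-snapshots on the backup side -- the step
    # during which the fsck errors showed up
    btrfs subvolume delete "$DST/$PREV"
else
    echo "btrfs or parent snapshot not available; nothing done"
fi
```

The point of `-p` is that only the difference between the parent and
the new snapshot goes over the pipe, so the backup side ends up with a
chain of ro-snapshots that share extents.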
> But considering the size of your fs, it may not be a good idea as we
> don't have a reliable method to recover/rebuild the extent tree yet.

So what do you effectively want me to do now? Wait and try something
else? RW-mount and recheck, to see whether it goes away with that?
(And even if it does, should I rather re-create/populate the fs from
scratch, just to be sure?)

What I can also offer in addition... as mentioned a few times
previously, I do have full lists of the regular files/dirs/symlinks,
as well as SHA512 sums of each of the regular files, as they are
expected to be on the fs respectively the snapshots.
So I can offer to do a full verification pass over these, to see
whether anything is missing or any (file) data is actually corrupted.
Of course that will take a while, and even if everything verifies, I'm
still not really sure whether I'd trust that fs anymore ;-)

Cheers,
Chris.
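P.S.: such a verification pass could be as simple as the following
sketch (the manifest name and paths are made up, and the "demo setup"
block merely stands in for the real snapshot contents):

```shell
#!/bin/sh
# Verify files in a snapshot against a previously recorded SHA512 manifest.
# ROOT and MANIFEST are hypothetical names for this sketch.
set -eu

ROOT=demo-snapshot
MANIFEST=manifest.sha512

# --- demo setup (stands in for the real backup snapshot) ---
mkdir -p "$ROOT"
printf 'some file data\n' > "$ROOT/file.txt"
( cd "$ROOT" && sha512sum file.txt ) > "$MANIFEST"

# --- the actual verification pass ---
# sha512sum -c re-hashes every file listed in the manifest and
# reports any mismatch or missing file; --quiet only prints failures.
( cd "$ROOT" && sha512sum -c --quiet "../$MANIFEST" ) \
    && echo "all files verified"
```

This only catches missing or corrupted file data, of course — it says
nothing about metadata like ownership, timestamps, or xattrs.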