On 06.09.2016 at 04:46, Qu Wenruo wrote:
> But your idea to locate the inode seems good enough for debugging though.

Based on this I even had another idea which seems to have worked well - and I 
am now also able to provide any additional debug output you may need. 


Since my procedure may be interesting / helpful to other "debugging users", 
I'll briefly outline it here. 
I had enough spare space on an external HDD, so I cloned the full btrfs 
partition with 'dd' to an image file on that HDD.
On another machine, I attached that image as a read-only loop device, created 
an overlay file and used the device mapper
to get a read-write block device for any experiment (backed by the read-only 
image).
Details on that e.g. at 
https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
 . 
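The steps above can be sketched roughly as follows (all device and file paths are examples, not the ones I actually used; this needs root and enough space for the overlay):

```shell
# Clone the full btrfs partition to an image file on the external HDD
dd if=/dev/sdXn of=/mnt/external/xmg13.img bs=4M conv=noerror,sync status=progress

# Attach the image strictly read-only
ORIGIN=$(losetup --read-only --find --show /mnt/external/xmg13.img)

# Create a sparse overlay file that will absorb all writes
truncate -s 4G /mnt/external/overlay.img
COW=$(losetup --find --show /mnt/external/overlay.img)

# Combine origin + overlay into a writable snapshot device
SIZE=$(blockdev --getsz "$ORIGIN")
dmsetup create xmg13-rw --table "0 $SIZE snapshot $ORIGIN $COW P 8"

# All experiments now run against /dev/mapper/xmg13-rw;
# the dd'ed image itself stays pristine
```

The 'P 8' at the end of the dmsetup table means a persistent snapshot with an 8-sector chunk size; see the wiki page linked above for the full recipe.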


> In this case, before reverting to a backup, would you please run a "btrfsck 
> check" and paste the output?

Now, I ran 'btrfs check' on that device. I'm using the very fresh btrfs-progs 
4.7.2. 
The output is here:
http://pastebin.com/rMrW40RU
Notably, it reports some other issues as well, mainly wrong link counts 
and directory isizes, across various inodes...

Thanks to the overlay, I could now also run 'btrfs check --repair' on this 
device without any risk.
The output from that is here:
http://pastebin.com/XW9ChuqU

Another 'btrfs check' run afterwards now reveals different issues:
http://pastebin.com/TFKJa81e

Now, another repair:
http://pastebin.com/33iqaE9E

Now, finally, btrfs check is happy:
http://pastebin.com/izkERtKp
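In short, the sequence was simply to alternate check and repair until a pass comes back clean (the device path is an example; everything ran against the overlay device, so the original image was never touched):

```shell
DEV=/dev/mapper/xmg13-rw

btrfs check "$DEV"            # initial report: link counts, dir isizes, ...
btrfs check --repair "$DEV"   # first repair pass
btrfs check "$DEV"            # reveals different follow-up issues
btrfs check --repair "$DEV"   # second repair pass
btrfs check "$DEV"            # finally clean
```

Two passes were enough here; whether that generalizes to other corruptions I obviously can't say.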

After mounting (kernel 4.7.2), I finally see in the kernel log:
[12108.696912] BTRFS info (device dm-0): disk space caching is enabled
[12108.713176] BTRFS info (device dm-0): checking UUID tree

I can now delete the "broken" .thunderbird folder on this "repaired" fs.
I can also mount it and write data on it.

Concluding from these results that it should be safe to do the same to my 
original block device with the same btrfs-progs version,
I did just that (check, repair, check, repair, check) from a live system 
directly on the machine. 
So far, the FS seems to be doing well again. I took the chance to enable 
skinny extents and am now doing a full metadata balance,
which saves me about 0.25 % of metadata space. 
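For reference, enabling skinny extents and converting the existing metadata can be done roughly like this (device and mountpoint are examples; the feature flag must be set while the filesystem is unmounted):

```shell
# Set the skinny-metadata feature flag (filesystem must be unmounted)
btrfstune -x /dev/sdXn

# Remount, then rewrite all metadata so existing extent items
# get converted to the smaller skinny format
mount /dev/sdXn /mnt
btrfs balance start -m /mnt
```

Only metadata written after the flag is set uses the new format, which is why the full metadata balance is needed to realize the space savings on an existing filesystem.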
So finally, first time in my life, 'btrfs check --repair' did not eat my data! 
:-) 


The cool thing is that now I still have the broken image (extracted with dd) 
around and can play with it to provide you with any debug-info
without having to work directly with the broken FS on the machine itself. 


Now, let's get started on that.

ls -aldi .thunderbird-broken/p6bm45oa.default/
162786 drwx------ 1 olifre olifre 2482  5. Sep 23:07 .thunderbird-broken/p6bm45oa.default/
As you can see, I had renamed .thunderbird to .thunderbird-broken; the real 
issue is in any case in the profile subfolder within.
So the affected inode is indeed 162786, which also shows up (as one of several 
issues...) in the btrfs check (and repair) output.

> Further more, your btrfs-debug-tree dump should provide more help for this 
> case.

Just to make sure the debug-tree output matches the rest of the information 
I'm giving you, I re-ran it on the dd'ed image of the broken FS like so:
btrfs-debug-tree -t 442 xmg13.img | sed "s/name:.*//" > debug-tree

I ran the output through xz (or rather, pixz) and here it is:
https://cernbox.cern.ch/index.php/s/imjwqsOFerUklqr/download
I'll probably not keep the file up there forever, but at least for quite some 
days.

If you can think of any other information which may be useful to diagnose the 
underlying issue that caused this corruption,
just let me know. I'll keep the image of the broken FS around for a few weeks.

Cheers, 
        Oliver
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html