Re: btrfs send extremely slow (almost stuck)
On 30.08.2016 at 02:48, Qu Wenruo wrote:
> Not the first, but still few.
> There is an xfstests case submitted for it, and even before the test case there were already reports from IRC.
> Anyway, I'll add a Cc for you after the new RFC patch is out.

Please count me in.

I have this occur when I'm backing up a file server I use to hold reflinked incrementals from client machines. Backing up from the clients to the server is very quick (mere seconds, no incrementals there), but the backup of the server volume itself is very slow even with limited changes. With clone detection enabled, that backup takes nearly seven hours. Sending a complete volume to a blank filesystem (so no reflinks are present at the destination) is a matter of only a few minutes.

Many thanks to Hermann Schwarzler, whose suggestion led me onto this.

J. Hart
Re: btrfs send extremely slow (almost stuck)
Duncan wrote on Mon, 05 Sep 2016 19:14:30 -0700:
> I had something very similar happen here a few weeks ago, except with my firefox profile dir (I don't run thunderbird, preferring claws-mail, but I do run firefox as my browser).

Indeed, I also notice Firefox doing a lot of IO, especially if session recovery is enabled, so I can totally imagine this causing similar issues...

> My use-case does neither snapshots nor send/receive, however, so it was just the single root subvolume (5). But there was supposedly a file in that dir according to bash's tab-completion, that would neither list, nor rm, which meant the dir couldn't rm -r either. (Interestingly enough, rm -i asked if I wanted to rm "weird file" whatever, and weird it indeed was!)

Sadly, for me there is / was no file "visible" at all, neither via tab-completion nor via 'rm -i'.

> So I immediately copied all the normal files to a new dir, and deleted the normal files from the problem dir, leaving only the weird one. Then I renamed the problem dir in order to be able to rename the new dir (with the good files) back to the name firefox expected.

That was exactly the "backup plan" I applied yesterday. In my case, luckily, I even had a full backup of the profile just a few hours old, so after renaming the broken folder I simply restored that backup into a fresh one.

> Then I decided to see what I could do with the renamed dir. I believe I rebooted (or umount/mount cycled the filesystem) as well. I think I had to use the magic-sysrq remount-ro key as it refused to umount even from systemd emergency mode. But here's the interesting part. At least after the rename and a reboot, it *DID* let me delete (using mc) the dir! I honestly didn't expect it'd let me, but it did.

For me all the shutdowns went fine (the problem must / may have been present for weeks, I only noticed now that btrfs send finally did something - and errored out) - and the problem, sadly, was not fixed after any reboot. I guess in my case it was corruption of the directory itself (or rather its isize), while for you it was some other sort of metadata corruption causing a "weirdly behaving" file.

> The difference, however, is that I didn't have any snapshots/subvolumes or other reflinks to the "weird" file, only the one normal hardlink. So even if it's the same thing, I'm not sure if it'll work for you given the multiple snapshot reflinks to the file, as it did for me with just the one.

I did at least try to delete all snapshots which could reference that file - it did not help. I also tried running 'btrfs defrag' on that folder, which should have broken up any reflinks; this also did not help. But luckily (as you can see from my other mail) two "btrfs check --repair" iterations finally fixed my issue. I hope the experts can figure out something from my uploaded debug info to prevent such things in the future.

Thanks a lot in any case for your experience report! I hope the "repair experience" from my other mail, written from a user's perspective, may at some point also be of help to you (even though, I hope, you'll never need it).

Cheers and thanks again,
Oliver
Re: btrfs send extremely slow (almost stuck)
Am 06.09.2016 um 04:46 schrieb Qu Wenruo:
> But your idea to locate the inode seems good enough for debugging though.

Based on this I had another idea which seems to have worked well - and I am now also able to provide any additional debug output you may need. Since my procedure may be interesting / helpful for other "debugging users", I'll shortly outline it here.

I had enough extra space on an external HDD. I cloned the full btrfs partition with 'dd' to an image on this HDD. I loop-mounted that image read-only on another machine, created an overlay file and used the device mapper to get a read-write block device for experiments (all writes go to the overlay, the image itself stays read-only). Details on that are e.g. at https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file - a rough sketch of the commands is at the end of this mail.

> In this case, before reverting to a backup, would you please run a "btrfsck check" and paste the output?

Now, I ran 'btrfs check' on that device. I'm using the very fresh btrfs-progs 4.7.2. The output is here:
http://pastebin.com/rMrW40RU
Notably, it claims to have found some other issues, mainly wrong link counts and dir isizes, but for various inodes...

Since this is only the overlay device, I could also safely run 'btrfs check --repair' on it without any risk. The output from that is here:
http://pastebin.com/XW9ChuqU

Another 'btrfs check' run afterwards reveals different issues:
http://pastebin.com/TFKJa81e

Now, another repair:
http://pastebin.com/33iqaE9E

And finally, btrfs check is happy:
http://pastebin.com/izkERtKp

After mounting (kernel 4.7.2), I see in the kernel log:
[12108.696912] BTRFS info (device dm-0): disk space caching is enabled
[12108.713176] BTRFS info (device dm-0): checking UUID tree

I can now delete the "broken" .thunderbird folder on this "repaired" fs. I can also mount it and write data to it.

Concluding from these results that it should be safe to do the same to my original block device with the same btrfs-progs version, I did just that (check, repair, check, repair, check) from a live system directly on the machine. Up until now, the FS seems to be doing well again - I took the chance to enable skinny extents and am now doing a full metadata balance, saving me about 0.25 % of metadata space. So finally, for the first time in my life, 'btrfs check --repair' did not eat my data! :-)

The cool thing is that I still have the broken image (extracted with dd) around and can play with it to provide you with any debug info without having to work directly on the broken FS on the machine itself.

Now, let's get started on that.

ls -aldi .thunderbird-broken/p6bm45oa.default/
162786 drwx------ 1 olifre olifre 2482  5. Sep 23:07 .thunderbird-broken/p6bm45oa.default/

As you can see, I had renamed .thunderbird to .thunderbird-broken. The real issue is in any case the profile subfolder within. So the affected inode is indeed 162786, which also shows up (as one of several issues...) in the btrfs check (and repair) output.

> Furthermore, your btrfs-debug-tree dump should provide more help for this case.

Just to make sure the debug-tree output matches the rest of the information I'm giving you, I re-ran it on the dd'ed image of the broken FS like so:
btrfs-debug-tree -t 442 xmg13.img | sed "s/name:.*//" > debug-tree
I ran the output through xz (or rather, pixz) and here it is:
https://cernbox.cern.ch/index.php/s/imjwqsOFerUklqr/download
I'll probably not keep the file up there forever, but at least for quite some days.
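The overlay trick mentioned above boils down to roughly the following - this is only a sketch, the loop device names, the overlay size and the image file name are illustrative, not the exact commands I ran:

  # attach the dd image read-only
  losetup --find --show --read-only xmg13.img          # assume this returns /dev/loop0
  # sparse file that will receive all writes made during the experiments
  truncate -s 20G overlay.img
  losetup --find --show overlay.img                    # assume this returns /dev/loop1
  # writable snapshot device layered on top of the read-only image
  dmsetup create xmg13-rw --table "0 $(blockdev --getsz /dev/loop0) snapshot /dev/loop0 /dev/loop1 P 8"
  # /dev/mapper/xmg13-rw can now be checked, repaired and mounted without ever touching the image

Tearing the setup down again is just 'dmsetup remove xmg13-rw' plus detaching the two loop devices with losetup -d.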
If you can think of any other information which may be useful to diagnose the underlying issue that caused this corruption, just let me know. I'll keep the image of the broken FS around for a few weeks.

Cheers,
Oliver
Re: btrfs send extremely slow (almost stuck)
At 09/06/2016 05:29 AM, Oliver Freyermuth wrote:
> Am 05.09.2016 um 07:21 schrieb Qu Wenruo:
>> Did you get the halfway send stream?
>
> Luckily, yes!
>
>> If the send stream has something, please use the "--no-data" option to send the subvolume again to get the metadata-only dump, and upload it for debug.
>
> Also the metadata-only dump fails with the same ioctl error (-2: No such file or directory). So I could only upload the stream up to the occurrence of that failure...
>
>> Also, please paste "btrfs-debug-tree -t <subvolid>" output for debug.
>> WARN: the above "btrfs-debug-tree" command will contain file names.
>> You could use the following sed to wipe the filenames:
>>
>> btrfs-debug-tree -t 5 /dev/sda6 | sed "s/name:.*//"
>
> This indeed runs through without failure. It seems though that the "btrfs send --no-data" stream, which contains the full metadata anyway, contains all filenames (just from a quick look with 'strings'). I can probably not remove these without invalidating the stream, though... So I'd not like to upload this to some public location.

Not a problem. You can try this branch of btrfs-progs:
https://github.com/adam900710/btrfs-progs/tree/dump_send_stream

It adds a new subcommand "btrfs inspect dump-send". That command will dump all metadata of a send stream, like:
--
./btrfs ins dump-send < /tmp/output
subvol: ./ro_snap uuid: 356a747f-b42f-1f4e-911d-fa5259f037f7, transid: 8
chown: ./ro_snap/ gid: 0, uid: 0
chmod: ./ro_snap/ mode: 755
utimes: ./ro_snap/
mkdir: ./ro_snap/o257-7-0
rename: ./ro_snap/o257-7-0 to ./ro_snap/etc
utimes: ./ro_snap/
chown: ./ro_snap/etc gid: 0, uid: 0
chmod: ./ro_snap/etc mode: 755
utimes: ./ro_snap/etc
mkfile: ./ro_snap/o258-7-0
rename: ./ro_snap/o258-7-0 to ./ro_snap/etc/hostname
...
--
Where /tmp/output is a send stream. In that case you can mask all your file names.

But your idea to locate the inode seems good enough for debugging though.

> However, you gave me an idea. I had a look at the output of running the file created by "btrfs send --no-data" through "strings". This revealed the last files which btrfs send was able to treat before running into the ioctl failure. Indeed, this is my thunderbird profile directory, always a place with a lot of activity.
>
> Now the interesting part begins: Since of course I have a backup of this directory, I decided to move that profile to another FS and back. Turns out I cannot run rm -rf ~/.thunderbird since it claims "directory not empty". The kernel log shows no bug-on or OOPS or anything like that.
>
> That's reproducible not only in the snapshots, but also in my "home" subvolume for this folder.
>
> "stat -c %s" of the supposed-to-be-empty profile directory indeed reveals: 2482

In this case, before reverting to a backup, would you please run a "btrfsck check" and paste the output?

Furthermore, your btrfs-debug-tree dump should provide more help for this case. With the btrfs-debug-tree dump, at least we can find what's going wrong and causing the rm -rf failure.

> So I guess I should refresh my backups soon and either run "btrfs check --repair" or, if that fails, redo the FS... Likely btrfs check --repair will fail for me since (due to duperemove usage) I'll for sure also be hit by https://bugzilla.kernel.org/show_bug.cgi?id=155791 since I'm still using 4.7.1, so I'd like to update to 4.7.2 before trying out that repair strategy.
>
> I sadly can't do that in the next few days since I actively need the machine in question, so I'll rename that folder and restore just that from backup for now.
>
> Is the debug-information still of interest?
> If so, I can share it (but would not post it publicly to the list since many filenames are in there...). It weighs in at about 2 x 80 MiB after xz compression.

Yes, the debug dump is quite helpful. Better with your .thunderbird inode number. (ls -aldi .thunderbird can give the inode number, the first number.)

For the debug-tree filename problem, feel free to wipe the filenames with the sed pipe I mentioned in the previous mail. IIRC it should wipe all possible filenames.

> Or is there anything else I can try safely?

Besides the debug-tree dump, which is sometimes overkill, "btrfs check" (by default in read-only mode) with v4.6.1 will help a lot. It will locate the direct problem very quickly and save us quite some time compared to manually checking the debug-tree dump. (I assume your problem is not related to send itself, but to a corrupted fs tree.)

Although it needs to be run on an unmounted fs, so you may need to enter single-user mode or use a liveCD/USB to do it.

Thanks,
Qu

> Thanks a lot in any case and cheers,
> Oliver
>
>> Thanks,
>> Qu
Re: btrfs send extremely slow (almost stuck)
Oliver Freyermuth posted on Mon, 05 Sep 2016 23:29:08 +0200 as excerpted:

> However, you gave me an idea. I had a look at the output of running the file created by "btrfs send --no-data" through "strings". This revealed the last files which btrfs send was able to treat before running into the ioctl failure. Indeed, this is my thunderbird profile directory, always a place with a lot of activity.
>
> Now the interesting part begins: Since of course I have a backup of this directory, I decided to move that profile to another FS and back. Turns out I cannot run rm -rf ~/.thunderbird since it claims "directory not empty". The kernel log shows no bug-on or OOPS or anything like that.
>
> That's reproducible not only in the snapshots, but also in my "home" subvolume for this folder.
>
> "stat -c %s" of the supposed-to-be-empty profile directory indeed reveals: 2482
>
> So I guess I should refresh my backups soon and either run "btrfs check --repair" or, if that fails, redo the FS... Likely btrfs check --repair will fail for me since (due to duperemove usage) I'll for sure also be hit by https://bugzilla.kernel.org/show_bug.cgi?id=155791 since I'm still using 4.7.1, so I'd like to update to 4.7.2 before trying out that repair strategy.
>
> I sadly can't do that in the next few days since I actively need the machine in question, so I'll rename that folder and restore just that from backup for now.
>
> Is the debug-information still of interest? If so, I can share it (but would not post it publicly to the list since many filenames are in there...). It weighs in at about 2 x 80 MiB after xz compression.
>
> Or is there anything else I can try safely?

I had something very similar happen here a few weeks ago, except with my firefox profile dir (I don't run thunderbird, preferring claws-mail, but I do run firefox as my browser).

My use-case does neither snapshots nor send/receive, however, so it was just the single root subvolume (5). But there was supposedly a file in that dir according to bash's tab-completion, that would neither list, nor rm, which meant the dir couldn't rm -r either. (Interestingly enough, rm -i asked if I wanted to rm "weird file" whatever, and weird it indeed was!)

So I immediately copied all the normal files to a new dir, and deleted the normal files from the problem dir, leaving only the weird one. Then I renamed the problem dir in order to be able to rename the new dir (with the good files) back to the name firefox expected.

Then I decided to see what I could do with the renamed dir. I believe I rebooted (or umount/mount cycled the filesystem) as well. I think I had to use the magic-sysrq remount-ro key as it refused to umount even from systemd emergency mode. But here's the interesting part. At least after the rename and a reboot, it *DID* let me delete (using mc) the dir! I honestly didn't expect it'd let me, but it did.

So I'd try that. After copying all the good files out and renaming the dir out of the way, so you can rename the dir you copied the good files into back into place, reboot (or umount and mount again if possible), possibly by going to single-user or emergency mode first and using magic-sysrq remount-ro to force it, if necessary, before rebooting. Then try to delete the dir again, and see if it will.
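In shell terms the workaround amounts to roughly this - the directory names are only placeholders for illustration, not what I actually typed:

  mkdir profile.new
  cp -a profile.dir/* profile.new/   # the weird entry fails to copy, everything else comes over
  rm profile.dir/*                   # the weird entry refuses to go, everything else is removed
  mv profile.dir profile.broken      # get the name out of the way
  mv profile.new profile.dir         # the application sees a clean profile again
  # umount/mount cycle (or reboot) here, then:
  rm -r profile.broken               # after the remount this finally succeeded here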
The difference, however, is that I didn't have any snapshots/subvolumes or other reflinks to the "weird" file, only the one normal hardlink. So even if it's the same thing, I'm not sure if it'll work for you given the multiple snapshot reflinks to the file, as it did for me with just the one. So it might not work at all for you, or might work but you have to delete it in each snapshot, or deleting it in one might delete it in all (which would be weird, but it's already a weird file we're dealing with, so who knows...), I don't know which. And that of course assumes it's even the same basic bug and would behave as it did for me if you had no snapshots.

That was with kernel 4.7.0 (which I'm still running, I'll be upgrading to 4.8 rcs pretty soon now) I believe. If not, then it was late in the 4.7 rc cycle or possibly 4.6.0, but it was definitely not older than that.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: btrfs send extremely slow (almost stuck)
Am 05.09.2016 um 07:21 schrieb Qu Wenruo:
> Did you get the halfway send stream?

Luckily, yes!

> If the send stream has something, please use the "--no-data" option to send the subvolume again to get the metadata-only dump, and upload it for debug.

Also the metadata-only dump fails with the same ioctl error (-2: No such file or directory). So I could only upload the stream up to the occurrence of that failure...

> Also, please paste "btrfs-debug-tree -t <subvolid>" output for debug.
> WARN: the above "btrfs-debug-tree" command will contain file names.
> You could use the following sed to wipe the filenames:
>
> btrfs-debug-tree -t 5 /dev/sda6 | sed "s/name:.*//"

This indeed runs through without failure. It seems though that the "btrfs send --no-data" stream, which contains the full metadata anyway, contains all filenames (just from a quick look with 'strings'). I can probably not remove these without invalidating the stream, though... So I'd not like to upload this to some public location.

However, you gave me an idea. I had a look at the output of running the file created by "btrfs send --no-data" through "strings". This revealed the last files which btrfs send was able to treat before running into the ioctl failure. Indeed, this is my thunderbird profile directory, always a place with a lot of activity.

Now the interesting part begins: Since of course I have a backup of this directory, I decided to move that profile to another FS and back. Turns out I cannot run rm -rf ~/.thunderbird since it claims "directory not empty". The kernel log shows no bug-on or OOPS or anything like that.

That's reproducible not only in the snapshots, but also in my "home" subvolume for this folder.

"stat -c %s" of the supposed-to-be-empty profile directory indeed reveals: 2482

So I guess I should refresh my backups soon and either run "btrfs check --repair" or, if that fails, redo the FS... Likely btrfs check --repair will fail for me since (due to duperemove usage) I'll for sure also be hit by https://bugzilla.kernel.org/show_bug.cgi?id=155791 since I'm still using 4.7.1, so I'd like to update to 4.7.2 before trying out that repair strategy.

I sadly can't do that in the next few days since I actively need the machine in question, so I'll rename that folder and restore just that from backup for now.

Is the debug-information still of interest? If so, I can share it (but would not post it publicly to the list since many filenames are in there...). It weighs in at about 2 x 80 MiB after xz compression.

Or is there anything else I can try safely?

Thanks a lot in any case and cheers,
Oliver

> Thanks,
> Qu
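P.S.: In case it helps others debugging similar send failures, the "strings" trick boils down to roughly this (the snapshot path and output file are placeholders):

  btrfs send --no-data /mnt/snapshots/home.20160905 > /tmp/home-nodata.dump   # aborts partway with the ioctl error
  strings /tmp/home-nodata.dump | tail -n 40                                  # shows the last paths send managed to emit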
Re: btrfs send extremely slow (almost stuck)
At 09/05/2016 05:41 AM, Oliver Freyermuth wrote:
> Am 30.08.2016 um 02:48 schrieb Qu Wenruo:
>> Yes.
>> And more specifically, it doesn't even affect delta backup.
>>
>> For shared extents caused by reflink/dedupe (out-of-band or even incoming in-band), it will be sent as individual files.
>>
>> For contents, they are all the same, just more space usage.
>
> For those interested, I have now actually tested the btrfs send / btrfs receive backup for several subvolumes after applying this patch. The throughput is finally usable, almost hitting network / IO limits as expected - ideal so far! Also the delta seemed fine for the subvolumes for which things worked.
>
> However, I now sadly get (for one of my subvolumes):
> send ioctl failed with -2: No such file or directory
> at some point during the transfer, and it sadly seems to be reproducible.
>
> I do not think it's related to this patch, but of course this makes "btrfs send" still unusable to me - I guess it's not ready for general use just yet.
>
> Is there any information I can easily extract / provide to allow the experts to fix this issue?

Did you get the halfway send stream?

If the send stream has something, please use the "--no-data" option to send the subvolume again to get the metadata-only dump, and upload it for debug.

Also, please paste "btrfs-debug-tree -t <subvolid>" output for debug.
WARN: the above "btrfs-debug-tree" command will contain file names.
You could use the following sed to wipe the filenames:

btrfs-debug-tree -t 5 /dev/sda6 | sed "s/name:.*//"

Thanks,
Qu

> The kernel log shows nothing.
>
> Thanks a lot,
> Oliver
Re: btrfs send extremely slow (almost stuck)
Am 30.08.2016 um 02:48 schrieb Qu Wenruo:
> Yes.
> And more specifically, it doesn't even affect delta backup.
>
> For shared extents caused by reflink/dedupe (out-of-band or even incoming in-band), it will be sent as individual files.
>
> For contents, they are all the same, just more space usage.

For those interested, I have now actually tested the btrfs send / btrfs receive backup for several subvolumes after applying this patch. The throughput is finally usable, almost hitting network / IO limits as expected - ideal so far! Also the delta seemed fine for the subvolumes for which things worked.

However, I now sadly get (for one of my subvolumes):
send ioctl failed with -2: No such file or directory
at some point during the transfer, and it sadly seems to be reproducible.

I do not think it's related to this patch, but of course this makes "btrfs send" still unusable to me - I guess it's not ready for general use just yet.

Is there any information I can easily extract / provide to allow the experts to fix this issue? The kernel log shows nothing.

Thanks a lot,
Oliver
Re: btrfs send extremely slow (almost stuck)
At 08/31/2016 09:35 AM, Jeff Mahoney wrote:
> On 8/28/16 10:12 PM, Qu Wenruo wrote:
>> At 08/29/2016 10:11 AM, Qu Wenruo wrote:
>>> At 08/28/2016 11:38 AM, Oliver Freyermuth wrote:
>>>> Dear btrfs experts,
>>>>
>>>> I just tried to make use of btrfs send / receive for incremental backups (using btrbk to simplify the process). It seems that on my two machines, btrfs send gets stuck after transferring some GiB - it's not fully halted, but instead of making full use of the available I/O, I get something < 500 kiB/s on average, which is just some "full speed spikes" with many seconds / minutes of no I/O in between.
>>>>
>>>> During this "halting", btrfs send eats one full CPU core. A "perf top" shows this is spent in "find_parent_nodes" and "__merge_refs" inside the kernel. I am using btrfs-progs 4.7 and kernel 4.7.0.
>>>
>>> Unknown bug, while unfortunately no good idea to solve yet.
>>
>> Sorry, known bug, not unknown.
>
> I'm working on a patch to replace the lists with a pair of trees that get merged after filling in the missing parents.

Wow, nice.

I was planning to do it but didn't get started yet.

The list is really causing the problem. Converting to an rb_tree should at least reduce the O(n^3)~O(n^4) to O(n^2 log n).

That said, calling the backref walk in the loop that iterates over every file extent is never a good idea in my view, so I'll still try a fix on the send side as an RFC patch too.

Thanks,
Qu

> The reflink xfstests don't complete, ever. btrfs/130 triggers soft lockups but does complete eventually -- and that's only with ~4k list elements.
>
> -Jeff
Re: btrfs send extremely slow (almost stuck)
On 8/28/16 10:12 PM, Qu Wenruo wrote:
>
> At 08/29/2016 10:11 AM, Qu Wenruo wrote:
>>
>> At 08/28/2016 11:38 AM, Oliver Freyermuth wrote:
>>> Dear btrfs experts,
>>>
>>> I just tried to make use of btrfs send / receive for incremental backups (using btrbk to simplify the process). It seems that on my two machines, btrfs send gets stuck after transferring some GiB - it's not fully halted, but instead of making full use of the available I/O, I get something < 500 kiB/s on average, which is just some "full speed spikes" with many seconds / minutes of no I/O in between.
>>>
>>> During this "halting", btrfs send eats one full CPU core. A "perf top" shows this is spent in "find_parent_nodes" and "__merge_refs" inside the kernel. I am using btrfs-progs 4.7 and kernel 4.7.0.
>>
>> Unknown bug, while unfortunately no good idea to solve yet.
>
> Sorry, known bug, not unknown.

I'm working on a patch to replace the lists with a pair of trees that get merged after filling in the missing parents.

The reflink xfstests don't complete, ever. btrfs/130 triggers soft lockups but does complete eventually -- and that's only with ~4k list elements.

-Jeff

-- 
Jeff Mahoney
SUSE Labs
Re: btrfs send extremely slow (almost stuck)
At 08/29/2016 06:02 PM, Oliver Freyermuth wrote:
> Am 29.08.2016 um 04:11 schrieb Qu Wenruo:
>> Unknown bug, while unfortunately no good idea to solve yet.
>>
>> I sent an RFC patch to completely disable shared extent detection, while got strong objection.
>>
>> I also submitted some other ideas on fixing it, while still got strong objection. Objection includes this is a performance problem, not a function problem and we should focus on function problem first and postpone such performance problem.
>>
>> And furthermore, Btrfs, from the beginning of its design, focuses on fast snapshot creation, and takes backref walk as sacrifice. So it's not an easy thing to fix.
>
> As a user, I must say, thanks a lot for your work on this!
>
>> I don't expect there will even be an agreement on how to fix the problem in v4.1x.
>>
>> Fixes in send will lead to obvious speed improvement, while cause incompatibility or super complex design. Fixes in backref will lead to a backref rework, which normally comes with new regression, and we are even unsure if it will really help.
>>
>> If you just hate the super slow send, and can accept the extra space usage, please try this RFC patch:
>>
>> https://patchwork.kernel.org/patch/9245287/
>>
>> This patch, just as its name says, will completely stop same-extent (reflink) detection. This will cause more space usage, but since it skips the super time-consuming find_parent_nodes(), it should at least work around your problem.
>
> If I interpret the code correctly, this only affects "btrfs send", and only causes "duplication" of previously shared extents, correct?

Yes. And more specifically, it doesn't even affect delta backup.

For shared extents caused by reflink/dedupe (out-of-band or even incoming in-band), it will be sent as individual files.

For contents, they are all the same, just more space usage.

> Then this is for me (as a user) perfectly fine - btrfs send should run much faster (< 3 hours instead of unusable 80 hours for my root volume) and I can just run duperemove on the readonly snapshots at the backup location later without issues (it's of course some extra I/O on disk and network, but at least it will be usable).

Nice to hear that.

>> I have some other, less aggressive idea to fix it, but since there was objection against it, I didn't code it further.
>>
>> But, since there are *REAL* *WORLD* users reporting such problem, I think I'd better restart the fix as an RFC.
>
> Thanks a lot, as a user I would certainly appreciate work in this area. I would not have expected that this really is a known issue, since I would have thought that btrfs send was commonly used for backup purposes, and offline deduplication on SSD drives, especially on mobile devices, to gain a significant amount of space did not seem like an exotic usecase to me. So in short, I'm really surprised to be one of the first / few to complain about this as a user; I did not feel like my usecase was special or exotic (at least, up to now).

Not the first, but still few. There is an xfstests case submitted for it, and even before the test case there were already reports from IRC.

Anyway, I'll add a Cc for you after the new RFC patch is out.

Thanks,
Qu

> Thanks a lot,
> Oliver
>
>> Thanks,
>> Qu
Re: btrfs send extremely slow (almost stuck)
Am Sun, 28 Aug 2016 17:41:22 -0400 schrieb james harvey:
> On Sun, Aug 28, 2016 at 12:15 PM, Oliver Freyermuth wrote:
>> For me, this means I have to stay with rsync backups, which are sadly incomplete since special FS attrs like "C" for nocow are not backed up.
>
> Should be able to make a script that creates a textfile with lsattr for every file. Then either just leave that file as part of the backup in case it's needed some day, or make a corresponding script on the backup machine to restore those.

The problem with this idea is that chattr +C will only work on empty files, so it needs to be applied in the "middle", read: upon creating the file and before filling it with content.

It would be possible to let a script first create empty files according to this list and then use "rsync --no-whole-file --inplace" so it will build upon the empty files instead of its usual behavior of creating files temporarily and then renaming them into place. I'd recommend using these options anyway when writing into btrfs snapshots, to take advantage of shared extents. Apparently rsync cannot handle sparse files in this mode (though there should be a patch to make this possible by using the hole-punching feature of newer kernels, but it makes the rsync protocol incompatible with unpatched versions AFAIR).

I think borgbackup suffers from the same problem. While the latest version seems to support attrs, it applies them after filling the files with contents (as most programs do; attributes like mtime, owner etc. are also applied after closing the written file, for obvious reasons). This simply doesn't work for +C on btrfs.

-- 
Regards,
Kai
Replies to list-only preferred.
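P.S.: A rough, untested sketch of that pre-create-then-fill approach - source and destination paths are placeholders, and file names containing newlines are not handled:

  cd /mnt/source
  # recreate the directory tree and pre-create empty files, copying the C attribute where it is set
  find . -type d -exec mkdir -p /mnt/backup/{} \;
  find . -type f | while read -r f; do
      touch "/mnt/backup/$f"
      lsattr -d -- "$f" | cut -d' ' -f1 | grep -q C && chattr +C -- "/mnt/backup/$f"
  done
  # then fill the pre-created files in place, so +C (set while they were still empty) survives
  rsync -a --inplace --no-whole-file /mnt/source/ /mnt/backup/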
Re: btrfs send extremely slow (almost stuck)
Am 29.08.2016 um 04:11 schrieb Qu Wenruo:
> Unknown bug, while unfortunately no good idea to solve yet.
>
> I sent an RFC patch to completely disable shared extent detection, while got strong objection.
>
> I also submitted some other ideas on fixing it, while still got strong objection. Objection includes this is a performance problem, not a function problem and we should focus on function problem first and postpone such performance problem.
>
> And furthermore, Btrfs, from the beginning of its design, focuses on fast snapshot creation, and takes backref walk as sacrifice. So it's not an easy thing to fix.

As a user, I must say, thanks a lot for your work on this!

> I don't expect there will even be an agreement on how to fix the problem in v4.1x.
>
> Fixes in send will lead to obvious speed improvement, while cause incompatibility or super complex design. Fixes in backref will lead to a backref rework, which normally comes with new regression, and we are even unsure if it will really help.
>
> If you just hate the super slow send, and can accept the extra space usage, please try this RFC patch:
>
> https://patchwork.kernel.org/patch/9245287/
>
> This patch, just as its name says, will completely stop same-extent (reflink) detection. This will cause more space usage, but since it skips the super time-consuming find_parent_nodes(), it should at least work around your problem.

If I interpret the code correctly, this only affects "btrfs send", and only causes "duplication" of previously shared extents, correct?

Then this is for me (as a user) perfectly fine - btrfs send should run much faster (< 3 hours instead of unusable 80 hours for my root volume) and I can just run duperemove on the readonly snapshots at the backup location later without issues (it's of course some extra I/O on disk and network, but at least it will be usable).

> I have some other, less aggressive idea to fix it, but since there was objection against it, I didn't code it further.
>
> But, since there are *REAL* *WORLD* users reporting such problem, I think I'd better restart the fix as an RFC.

Thanks a lot, as a user I would certainly appreciate work in this area. I would not have expected that this really is a known issue, since I would have thought that btrfs send was commonly used for backup purposes, and offline deduplication on SSD drives, especially on mobile devices, to gain a significant amount of space did not seem like an exotic usecase to me. So in short, I'm really surprised to be one of the first / few to complain about this as a user; I did not feel like my usecase was special or exotic (at least, up to now).

Thanks a lot,
Oliver

> Thanks,
> Qu
Re: btrfs send extremely slow (almost stuck)
At 08/29/2016 10:11 AM, Qu Wenruo wrote:
>
> At 08/28/2016 11:38 AM, Oliver Freyermuth wrote:
>> Dear btrfs experts,
>>
>> I just tried to make use of btrfs send / receive for incremental backups (using btrbk to simplify the process). It seems that on my two machines, btrfs send gets stuck after transferring some GiB - it's not fully halted, but instead of making full use of the available I/O, I get something < 500 kiB/s on average, which is just some "full speed spikes" with many seconds / minutes of no I/O in between.
>>
>> During this "halting", btrfs send eats one full CPU core. A "perf top" shows this is spent in "find_parent_nodes" and "__merge_refs" inside the kernel. I am using btrfs-progs 4.7 and kernel 4.7.0.
>
> Unknown bug, while unfortunately no good idea to solve yet.

Sorry, known bug, not unknown.

Thanks,
Qu

> I sent an RFC patch to completely disable shared extent detection, while got strong objection.
>
> I also submitted some other ideas on fixing it, while still got strong objection. Objection includes this is a performance problem, not a function problem and we should focus on function problem first and postpone such performance problem.
>
> And furthermore, Btrfs, from the beginning of its design, focuses on fast snapshot creation, and takes backref walk as sacrifice. So it's not an easy thing to fix.
>
>> I googled a bit and found related patchwork (https://patchwork.kernel.org/patch/9238987/) which seems to work around high load in this area and mentions that a real solution is proposed but not yet there.
>>
>> Since this affects two machines of mine and backing up my root volume would take about 80 hours if I can extrapolate the average rate, this means btrfs send is unusable to me.
>>
>> Can I assume this is a common issue which will be fixed in a later kernel release (4.8, 4.9) or can I do something to my FS's to work around this issue?
>
> I don't expect there will even be an agreement on how to fix the problem in v4.1x.
>
> Fixes in send will lead to obvious speed improvement, while cause incompatibility or super complex design. Fixes in backref will lead to a backref rework, which normally comes with new regression, and we are even unsure if it will really help.
>
> If you just hate the super slow send, and can accept the extra space usage, please try this RFC patch:
>
> https://patchwork.kernel.org/patch/9245287/
>
> This patch, just as its name says, will completely stop same-extent (reflink) detection. This will cause more space usage, but since it skips the super time-consuming find_parent_nodes(), it should at least work around your problem.
>
> I have some other, less aggressive idea to fix it, but since there was objection against it, I didn't code it further.
>
> But, since there are *REAL* *WORLD* users reporting such problem, I think I'd better restart the fix as an RFC.
>
> Thanks,
> Qu
>
>> One FS is only two weeks old, the other one now about 1 year. I did some balancing at some points in time to have more unallocated space for trimming, and used duperemove regularly to free space. One FS has skinny extents, the other has not. Mount options are "rw,noatime,compress=zlib,ssd,space_cache,commit=120". Apart from that: No RAID or any other special configuration involved.
>> Cheers and any help appreciated,
>> Oliver
Re: btrfs send extremely slow (almost stuck)
At 08/28/2016 11:38 AM, Oliver Freyermuth wrote:
> Dear btrfs experts,
>
> I just tried to make use of btrfs send / receive for incremental backups (using btrbk to simplify the process). It seems that on my two machines, btrfs send gets stuck after transferring some GiB - it's not fully halted, but instead of making full use of the available I/O, I get something < 500 kiB/s on average, which is just some "full speed spikes" with many seconds / minutes of no I/O in between.
>
> During this "halting", btrfs send eats one full CPU core. A "perf top" shows this is spent in "find_parent_nodes" and "__merge_refs" inside the kernel. I am using btrfs-progs 4.7 and kernel 4.7.0.

Unknown bug, while unfortunately no good idea to solve yet.

I sent an RFC patch to completely disable shared extent detection, while got strong objection.

I also submitted some other ideas on fixing it, while still got strong objection. Objection includes this is a performance problem, not a function problem and we should focus on function problem first and postpone such performance problem.

And furthermore, Btrfs, from the beginning of its design, focuses on fast snapshot creation, and takes backref walk as sacrifice. So it's not an easy thing to fix.

> I googled a bit and found related patchwork (https://patchwork.kernel.org/patch/9238987/) which seems to work around high load in this area and mentions that a real solution is proposed but not yet there.
>
> Since this affects two machines of mine and backing up my root volume would take about 80 hours if I can extrapolate the average rate, this means btrfs send is unusable to me.
>
> Can I assume this is a common issue which will be fixed in a later kernel release (4.8, 4.9) or can I do something to my FS's to work around this issue?

I don't expect there will even be an agreement on how to fix the problem in v4.1x.

Fixes in send will lead to obvious speed improvement, while cause incompatibility or super complex design. Fixes in backref will lead to a backref rework, which normally comes with new regression, and we are even unsure if it will really help.

If you just hate the super slow send, and can accept the extra space usage, please try this RFC patch:

https://patchwork.kernel.org/patch/9245287/

This patch, just as its name says, will completely stop same-extent (reflink) detection. This will cause more space usage, but since it skips the super time-consuming find_parent_nodes(), it should at least work around your problem.

I have some other, less aggressive idea to fix it, but since there was objection against it, I didn't code it further.

But, since there are *REAL* *WORLD* users reporting such problem, I think I'd better restart the fix as an RFC.

Thanks,
Qu

> One FS is only two weeks old, the other one now about 1 year. I did some balancing at some points in time to have more unallocated space for trimming, and used duperemove regularly to free space. One FS has skinny extents, the other has not. Mount options are "rw,noatime,compress=zlib,ssd,space_cache,commit=120". Apart from that: No RAID or any other special configuration involved.
>
> Cheers and any help appreciated,
> Oliver
Re: btrfs send extremely slow (almost stuck)
On Sun, Aug 28, 2016 at 12:15 PM, Oliver Freyermuth wrote:
> For me, this means I have to stay with rsync backups, which are sadly incomplete since special FS attrs like "C" for nocow are not backed up.

Should be able to make a script that creates a textfile with lsattr for every file. Then either just leave that file as part of the backup in case it's needed some day, or make a corresponding script on the backup machine to restore those.
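For example, something along these lines - only a sketch, the paths are placeholders and file names with spaces or newlines would need more care:

  # on the source: record the attribute flags for every file and directory
  cd /mnt/source && find . -exec lsattr -d -- {} + > /mnt/backup/attrs.txt
  # on the backup machine: reapply e.g. the nocow flag (other flags work analogously)
  while read -r flags path; do
      case "$flags" in
          *C*) chattr +C -- "/mnt/restore/$path" ;;
      esac
  done < /mnt/backup/attrs.txt

Note that, as pointed out elsewhere in the thread, +C only takes effect on empty files, so for nocow the restore step has to happen before the file data is written.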
Re: btrfs send extremely slow (almost stuck)
(Sorry if my Message-ID header is missing; I am not subscribed to the mailing list, so I reply using mail-archive.)

> So a workaround would be reducing your duperemove usage and possibly rewriting (for instance via defrag) the deduped files to kill the multiple reflinks. Or simply delete the additional reflinked copies, if your use-case allows it.

Sadly, I need the extra space (that's why I was using duperemove in the first place) and cannot delete all the deduped copies. These are mainly several checkouts of different repositories with partially common (and partially large binary) content.

> And thin down your snapshot retention if you have many snapshots per subvolume. With the geometric scaling issues, thinning to under 300 per subvolume should be quite reasonable in nearly all circumstances, and thinning to under 100 per subvolume may be possible and should result in dramatically reduced scaling issues.

In addition, I have only ~5 snapshots for both those volumes, which should certainly not be too many.

So in short, this just means btrfs send is (still) unusable for filesystems which rely on the offline dedupe feature (in the past 'btrfs send' got broken after dedupe, which got fixed; now it is just extremely slow). For me, this means I have to stay with rsync backups, which are sadly incomplete since special FS attrs like "C" for nocow are not backed up.

Cheers and thanks for your reply,
Oliver
Re: btrfs send extremely slow (almost stuck)
Oliver Freyermuth posted on Sun, 28 Aug 2016 05:38:00 +0200 as excerpted:

> Dear btrfs experts,
>
> I just tried to make use of btrfs send / receive for incremental backups (using btrbk to simplify the process). It seems that on my two machines, btrfs send gets stuck after transferring some GiB - it's not fully halted, but instead of making full use of the available I/O, I get something < 500 kiB/s on average, which is just some "full speed spikes" with many seconds / minutes of no I/O in between.
>
> During this "halting", btrfs send eats one full CPU core. A "perf top" shows this is spent in "find_parent_nodes" and "__merge_refs" inside the kernel. I am using btrfs-progs 4.7 and kernel 4.7.0.
>
> I googled a bit and found related patchwork (https://patchwork.kernel.org/patch/9238987/) which seems to work around high load in this area and mentions that a real solution is proposed but not yet there.
>
> Since this affects two machines of mine and backing up my root volume would take about 80 hours if I can extrapolate the average rate, this means btrfs send is unusable to me.
>
> Can I assume this is a common issue which will be fixed in a later kernel release (4.8, 4.9) or can I do something to my FS's to work around this issue?
>
> One FS is only two weeks old, the other one now about 1 year. I did some balancing at some points in time to have more unallocated space for trimming, and used duperemove regularly to free space. One FS has skinny extents, the other has not.

The problem is as the patch says: multiple references per extent increase processing time geometrically. And duperemove works by doing just that, pointing multiple duplicates to the same extents, increasing the reference count per extent, thereby exacerbating the problem on your system, if duperemove is actually finding a reasonable number of duplicates to reflink to the same extents.

The other common multi-reflink usage is snapshots, since each snapshot creates another reflink to each extent it snapshots. However, being just a list regular and btrfs user, not a dev, and using neither dedupe nor snapshots nor send/receive in my own use-case, I'm not absolutely sure whether other snapshot references affect send/receive or whether it's only multiple reflinks per sent snapshot. Either way, over a few hundred snapshots per subvolume or a couple thousand snapshots per filesystem, they do seriously affect scaling of balance and fsck, even if they don't actually affect send/receive so badly.

So a workaround would be reducing your duperemove usage and possibly rewriting (for instance via defrag) the deduped files to kill the multiple reflinks. Or simply delete the additional reflinked copies, if your use-case allows it.

And thin down your snapshot retention if you have many snapshots per subvolume. With the geometric scaling issues, thinning to under 300 per subvolume should be quite reasonable in nearly all circumstances, and thinning to under 100 per subvolume may be possible and should result in dramatically reduced scaling issues.

Note that the current patch doesn't really work around the geometric scaling issues or the extreme cpu usage bottlenecking send/receive, but rather addresses the soft lockups caused by not scheduling often enough to give other threads time to process. You didn't mention problems with soft lockups, so it's likely to be of limited help for the send/receive problem.
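Concretely, those two workarounds would look something like the following - the paths are placeholders, and defragmenting will of course use up the space the dedupe had saved:

  # rewrite the deduped files so each gets private extents again (breaks the reflink sharing)
  btrfs filesystem defragment -r /path/to/subvolume
  # see how many snapshots have accumulated, then prune the old ones
  btrfs subvolume list -s /mountpoint
  btrfs subvolume delete /mountpoint/snapshots/<old-snapshot>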
As for the longer term, yes, it should be fixed eventually, but keep in mind that btrfs isn't considered fully stable and mature yet, so this sort of problem isn't unexpected, and indeed scaling issues like this are known to still be an issue. And while I haven't been tracking that red/black tree work, in general it can be noted that btrfs fixes for this sort of problem often take rather longer than might be expected, so a fix may be more like a year or two out than a kernel cycle or two out. Unless of course you see otherwise from someone working on this problem specifically, and even then, sometimes the first fix doesn't get it quite right, and the problem may remain for some time as more is learned about the ultimate issue via multiple attempts to fix it.

This has happened to the quota code a number of times, for instance, as it has turned out to be a /really/ hard problem, with multiple rewrites necessary, such that even now the practical recommendation is often to either just turn off quotas and not worry about them if you don't need them, or use a more mature filesystem where the quota code is known to be stable and mature, if your use-case depends on them.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman