Re: btrfs send extremely slow (almost stuck)

2017-04-14 Thread J. Hart

On 30.08.2016 at 02:48, Qu Wenruo wrote:
> Not the first, but still only a few.
> There is an xfstest case submitted for it, and even before the test
> case, there were already reports from IRC.

> Anyway, I'll add Cc for you after the new RFC patch is out.

Please count me in.

I see this occur when backing up a file server I use to hold 
reflinked incrementals from client machines.  Backing up from the clients to 
the server is very quick (mere seconds, no incrementals there), but backing up 
the server volume itself is very slow even with limited changes.  
With clone detection enabled, that backup takes nearly seven hours.  
Sending a complete volume to a blank filesystem (so no reflinks are 
present at the destination) takes only a few minutes.


Many thanks to Hermann Schwarzler, whose suggestion put me onto this.

J. Hart



Re: btrfs send extremely slow (almost stuck)

2016-09-06 Thread Oliver Freyermuth
Duncan wrote on Mon, 05 Sep 2016 19:14:30 -0700: 
> I had something very similar happen here a few weeks ago, except with my 
> firefox profile dir (I don't run thunderbird, preferring claws-mail, but 
> I do run firefox as my browser).

Indeed, I also notice Firefox doing a lot of I/O, especially if session 
recovery is enabled, so I can totally imagine this causing similar issues... 

> My use-case does neither snapshots nor send/receive, however, so it was 
> just the single root subvolume (5).  But there was supposedly a file in 
> that dir according to bash's tab-completion, that would neither list, nor 
> rm, which meant the dir couldn't rm -r either.  (Interestingly enough, rm 
> -i asked if I wanted to rm "weird file" whatever, and weird it indeed was!
> )

Sadly, for me there is/was no file "visible" at all, neither via tab 
completion nor via 'rm -i'. 

> So I immediately copied all the normal files to a new dir, and deleted 
> the normal files from the problem dir, leaving only the weird one.
> Then I renamed the problem dir in order to be able to rename the new 
> dir (with the good files) back to the name firefox expected.

That was exactly the "backup plan" I applied yesterday. In my case, luckily,
I even had a full backup of the profile just a few hours old, so after
renaming the broken folder I simply replaced it with a fresh copy from that
backup. 

> Then I decided to see what I could do with the renamed dir.  I believe I 
> rebooted (or umount/mount cycled the filesystem) as well.  I think I had 
> to use the magic-sysrq m/remount-ro key as it refused to umount even from 
> systemd emergency mode.  But here's the interesting part.  At least after 
> the rename and a reboot, it *DID* let me delete (using mc) the dir!  I 
> honestly didn't expect it'd let me, but it did.

For me, all the shutdowns went fine (the problem may have been present for 
weeks; I only noticed it now that btrfs send finally did something - and 
errored out) - and the problem, sadly, was not fixed by any reboot. 
I guess that in my case it was corruption of the directory itself
(or rather its isize), 
while for you it was some other sort of metadata corruption causing 
a "weirdly behaving" file. 

> The difference, however, is that I didn't have any snapshots/subvolumes 
> or other reflinks to the "weird" file, only the one normal hardlink.  So 
> even if it's the same thing, I'm not sure if it'll work for you given the 
> multiple snapshot reflinks to the file, as it did for me with just the 
> one.

I did at least try to delete all snapshots which could reference that file - 
that did not help. 
I also tried running 'btrfs defrag' on that folder, which should have broken up 
any reflinks; that did not help either. 

But luckily (as you can see from my other mail) two "btrfs check --repair" 
iterations finally
fixed my issue. I hope the experts can figure out something from my uploaded 
debug info
to prevent such things in the future. 

Thanks a lot in any case for your experience report! 

I hope my "repair experience" from my other mail made from my user's 
perspective may at some point
of time also be of help to you (even though, I hope, you'll never need it). 

Cheers and thanks again, 
Oliver


Re: btrfs send extremely slow (almost stuck)

2016-09-06 Thread Oliver Freyermuth
On 06.09.2016 at 04:46, Qu Wenruo wrote:
> But your idea to locate the inode seems good enough for debugging though.

Based on this I even had another idea which seems to have worked well - and I 
am now also able to provide any additional debug output you may need. 


Since my procedure may be interesting / helpful for other "debugging users", 
I'll briefly outline it here. 
I had enough extra space on an external HDD, so I cloned the full btrfs 
partition with 'dd' to an image on that HDD.
I attached that image read-only as a loop device on another machine, created an 
overlay file and used device mapper
to get a read-write block device for any experiment (based on the read-only 
image).
Details on that are e.g. at 
https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
 . 
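
For anyone wanting to reproduce this, the overlay setup boils down to roughly
the following (image and overlay file names are placeholders, the overlay size
is an assumption - adjust to your needs):

truncate -s 10G overlay.img                        # sparse file to hold all writes
ro_dev=$(losetup --find --show --read-only xmg13.img)
cow_dev=$(losetup --find --show overlay.img)
sectors=$(blockdev --getsz "$ro_dev")              # device size in 512-byte sectors
# snapshot target: all writes go to the overlay, the dd image stays untouched
dmsetup create btrfs-test --table "0 $sectors snapshot $ro_dev $cow_dev P 8"
# /dev/mapper/btrfs-test is now a writable view of the read-only image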


> In this case, before reverting to a backup, would you please run a "btrfs 
> check" and paste the output?

Now, I ran 'btrfs check' on that device. I'm using the very fresh btrfs-progs 
4.7.2. 
The output is here:
http://pastebin.com/rMrW40RU
Notably, it claims to have found some other issues, mainly wrong link counts 
and dir isizes, but for various inodes...

Now, I could also safely run 'btrfs check --repair' on this device without any 
risks.
The output from that is here:
http://pastebin.com/XW9ChuqU

Another 'btrfs check' run afterwards now reveals different issues:
http://pastebin.com/TFKJa81e

Now, another repair:
http://pastebin.com/33iqaE9E

Now, finally, btrfs check is happy:
http://pastebin.com/izkERtKp
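
For reference, the whole cycle on the overlay device boils down to roughly this
(device name taken from the overlay sketch above):

dev=/dev/mapper/btrfs-test
btrfs check "$dev"              # read-only check, first pass
btrfs check --repair "$dev"     # first repair run
btrfs check "$dev"              # different issues now reported
btrfs check --repair "$dev"     # second repair run
btrfs check "$dev"              # finally clean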

After mounting (kernel 4.7.2), I finally see in the kernel log:
[12108.696912] BTRFS info (device dm-0): disk space caching is enabled
[12108.713176] BTRFS info (device dm-0): checking UUID tree

I can now delete the "broken" .thunderbird folder on this "repaired" fs.
I can also mount it and write data on it.

Concluding from these results that it should be safe to do the same to my 
original block device with the same btrfs-progs version,
I did just that (check, repair, check, repair, check) from a live system 
directly on the machine. 
Up until now, the FS seems to be doing well again - I took the chance to enable 
skinny extents and am now doing a full metadata balance,
saving me about 0.25 % of metadata space. 
So finally, for the first time in my life, 'btrfs check --repair' did not eat 
my data! :-) 
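
In case someone wants to do the same, the skinny-extents conversion is roughly
the following (device name is a placeholder; btrfstune has to run on the
unmounted filesystem):

btrfstune -x /dev/sdXn           # enable the skinny-metadata feature flag
mount /dev/sdXn /mnt
btrfs balance start -m /mnt      # rewrite all metadata so extent items become skinny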


The cool thing is that now I still have the broken image (extracted with dd) 
around and can play with it to provide you with any debug-info
without having to work directly with the broken FS on the machine itself. 


Now, let's get started on that.

ls -aldi .thunderbird-broken/p6bm45oa.default/
162786 drwx-- 1 olifre olifre 2482  5. Sep 23:07 
.thunderbird-broken/p6bm45oa.default/
As you can see, I had renamed .thunderbird to .thunderbird-broken. The real 
issue is in any case the profile subfolder within.
So the affected inode is indeed 162786, which also shows up (as one of several 
issues...) in the btrfs check (and repair) output.

> Furthermore, your btrfs-debug-tree dump should provide more help for this 
> case.

Just to make sure the debug-tree output matches the rest of all the information 
I'm giving you, I re-ran that on the dd'ed image from the broken FS like so:
btrfs-debug-tree -t 442 xmg13.img | sed "s/name:.*//" > debug-tree

I ran the output through xz (or rather, pixz) and here it is:
https://cernbox.cern.ch/index.php/s/imjwqsOFerUklqr/download
I'll probably not keep the file up there forever, but at least for quite some 
days.

If you can think of any other information which may be useful to diagnose the 
underlying issue which caused that corruption, just let me know. I'll keep the 
image of the broken FS around for a few weeks.

Cheers, 
Oliver


Re: btrfs send extremely slow (almost stuck)

2016-09-05 Thread Qu Wenruo



At 09/06/2016 05:29 AM, Oliver Freyermuth wrote:

On 05.09.2016 at 07:21, Qu Wenruo wrote:

Did you get the half way send stream?


Luckily, yes!


If the send stream has something, please use "--no-data" option to send the 
subvolume again to get the metadata only dump, and upload it for debug.


Also the metadata-only dump fails with the same ioctl error (-2: No such file 
or directory).
So I could only upload the stream up to the occurrence of that failure...



Also, please paste "btrfs-debug-tree -t " output for debug.
WARN: above "btrfs-debug-tree" command will contain file names.
You could use the following sed to wipe filename:

"btrfs-debug-tree  -t 5 /dev/sda6  | sed "s/name:.*//"


This indeed runs through without failure.


It seems, though, that the "btrfs send --no-data" stream, which contains the 
full metadata anyway, contains all filenames (judging from a quick look with 
'strings').
I can probably not remove these without invalidating the stream... so I'd 
rather not upload this to some public location.


Not a problem.

You can try this branch of btrfs-progs:
https://github.com/adam900710/btrfs-progs/tree/dump_send_stream

It adds a new subcommand, "btrfs inspect dump-send", which will dump all the
metadata of a send stream, like:
--
./btrfs ins dump-send < /tmp/output
subvol: ./ro_snap   uuid: 
356a747f-b42f-1f4e-911d-fa5259f037f7, transid: 8

chown:  ./ro_snap/  gid: 0, uid: 0
chmod:  ./ro_snap/  mode: 755
utimes: ./ro_snap/
mkdir:  ./ro_snap/o257-7-0
rename: ./ro_snap/o257-7-0  to ./ro_snap/etc
utimes: ./ro_snap/
chown:  ./ro_snap/etc   gid: 0, uid: 0
chmod:  ./ro_snap/etc   mode: 755
utimes: ./ro_snap/etc
mkfile: ./ro_snap/o258-7-0
rename: ./ro_snap/o258-7-0  to ./ro_snap/etc/hostname
..
--

Where /tmp/output is a send stream.
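
To actually try that branch, the usual btrfs-progs build steps should be enough
(a rough sketch, adjust as needed):

git clone -b dump_send_stream https://github.com/adam900710/btrfs-progs.git
cd btrfs-progs
./autogen.sh && ./configure && make
./btrfs ins dump-send < /tmp/output    # /tmp/output being a send stream, as above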

In that case you can mask all your file names.
But your idea to locate the inode seems good enough for debugging though.



However, you gave me an idea. I had a look at the file created by "btrfs 
send --no-data", piping it through "strings".
This revealed the last files which btrfs send was able to treat before running 
into the ioctl failure.
Indeed, this is my thunderbird profile directory, always a place with a lot of 
activity.

Now the interesting part begins: Since of course I have a backup of this 
directory, I decided to move that profile to another FS and back.
Turns out I cannot run
rm -rf ~/.thunderbird
since it claims "directory not empty". The kernel log shows no BUG_ON, oops, or 
anything like that.

That's reproducible not only in the snapshots, but also in my "home" subvolume 
for this folder.

"stat -c %s" of the supposed-to-be-empty profile directory reveals indeed:
2482


In this case, before reverting to a backup, would you please run a 
"btrfs check" and paste the output?


Furthermore, your btrfs-debug-tree dump should provide more help for 
this case.


With the btrfs-debug-tree dump, we can at least find out what's going wrong and 
causing the rm -rf failure.




So I guess I should refresh my backups soon and either run "btrfs check 
--repair" or, if that fails, redo the FS...
Likely btrfs check --repair will fail for me since (due to duperemove usage) 
I'll for sure also be hit by https://bugzilla.kernel.org/show_bug.cgi?id=155791
since I'm still using 4.7.1 so I'd like to update to 4.7.2 before trying out 
that repair strategy.

I sadly can't do that in the next few days since I actively need the machine in 
question, so I'll rename that folder and restore just that from backup for now.

Is the debug-information still of interest? If so, I can share it (but would 
not post it publicly to the list since many filenames are in there...).
It weighs in at about 2 x 80 MiB after xz compression.


Yes, the debug dump is quite helpful.
Better with your .thunderbird inode number. (ls -aldi .thunderbird will give 
the inode number; it's the first number.)


For the debug-tree filename problem, feel free to wipe the filenames with the 
sed pipe I mentioned in my previous mail.

IIRC it should wipe all possible filenames.



Or is there anything else I can try safely?


Besides the debug-tree dump, which is sometimes overkill, "btrfs check" 
(read-only by default) with v4.6.1 will help a lot.


It will locate the direct problem very quickly and save us quite some time 
compared to manually checking the debug tree dump.

(I assume your send problem is not related to send itself, but to a corrupted fs tree.)

It needs to be run on an unmounted fs, though, so you may need to enter 
single-user mode or use a live CD/USB to do it.


Thanks,
Qu



Thanks a lot in any case and cheers,
Oliver



Thanks,
Qu








Re: btrfs send extremely slow (almost stuck)

2016-09-05 Thread Duncan
Oliver Freyermuth posted on Mon, 05 Sep 2016 23:29:08 +0200 as excerpted:

> However, you gave me an idea. I had a look at the file created by
> "btrfs send --no-data", piping it through "strings".
> This revealed the last files which btrfs send was able to treat before
> running into the ioctl failure.
> Indeed, this is my thunderbird profile directory, always a place with a
> lot of activity.
> 
> Now the interesting part begins: Since of course I have a backup of this
> directory, I decided to move that profile to another FS and back.
> Turns out I cannot run rm -rf ~/.thunderbird since it claims "directory
> not empty". The kernel log shows no BUG_ON, oops, or anything like that.
> 
> That's reproducible not only in the snapshots, but also in my "home"
> subvolume for this folder.
> 
> "stat -c %s" of the supposed-to-be-empty profile directory reveals
> indeed:
> 2482
> 
> So I guess I should refresh my backups soon and either run "btrfs check
> --repair" or, if that fails, redo the FS...
> Likely btrfs check --repair will fail for me since (due to duperemove
> usage) I'll for sure also be hit by
> https://bugzilla.kernel.org/show_bug.cgi?id=155791 since I'm still using
> 4.7.1 so I'd like to update to 4.7.2 before trying out that repair
> strategy.
> 
> I sadly can't do that in the next few days since I actively need the
> machine in question, so I'll rename that folder and restore just that
> from backup for now.
> 
> Is the debug-information still of interest? If so, I can share it (but
> would not post it publicly to the list since many filenames are in
> there...).
> It weighs in at about 2 x 80 MiB after xz compression.
> 
> Or is there anything else I can try safely?

I had something very similar happen here a few weeks ago, except with my 
firefox profile dir (I don't run thunderbird, preferring claws-mail, but 
I do run firefox as my browser).

My use-case does neither snapshots nor send/receive, however, so it was 
just the single root subvolume (5).  But there was supposedly a file in 
that dir according to bash's tab-completion, that would neither list, nor 
rm, which meant the dir couldn't rm -r either.  (Interestingly enough, rm 
-i asked if I wanted to rm "weird file" whatever, and weird it indeed was!
)

So I immediately copied all the normal files to a new dir, and deleted 
the normal files from the problem dir, leaving only the weird one.

Then I renamed the problem dir in order to be able to rename the new 
dir (with the good files) back to the name firefox expected.

Then I decided to see what I could do with the renamed dir.  I believe I 
rebooted (or umount/mount cycled the filesystem) as well.  I think I had 
to use the magic-sysrq m/remount-ro key as it refused to umount even from 
systemd emergency mode.  But here's the interesting part.  At least after 
the rename and a reboot, it *DID* let me delete (using mc) the dir!  I 
honestly didn't expect it'd let me, but it did.

So I'd try that.  After copying all the good files out and renaming the 
dir out of the way (so you can rename the dir you copied the good files 
into back into place), reboot (or umount and mount again if possible), 
possibly going to single-user or emergency mode first and using 
magic-sysrq remount-ro to force it, if necessary, before rebooting.

Then try to delete the dir again, and see if it will.

The difference, however, is that I didn't have any snapshots/subvolumes 
or other reflinks to the "weird" file, only the one normal hardlink.  So 
even if it's the same thing, I'm not sure if it'll work for you given the 
multiple snapshot reflinks to the file, as it did for me with just the 
one.

So it might not work at all for you, or might work but you have to delete 
it in each snapshot, or deleting it in one might delete it in all (which 
would be weird, but it's already a weird file we're dealing with, so who 
knows...), I don't know which.  And that of course assumes it's even the 
same basic bug and would behave as it did for me if you had no snapshots.

That was with kernel 4.7.0 (which I'm still running, I'll be upgrading to 
4.8 rcs pretty soon now) I believe.  If not, then it was late in the 4.7 
rc cycle or possibly 4.6.0, but it was definitely not older than that.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: btrfs send extremely slow (almost stuck)

2016-09-05 Thread Oliver Freyermuth
On 05.09.2016 at 07:21, Qu Wenruo wrote:
> Did you get the half way send stream?

Luckily, yes! 
 
> If the send stream has something, please use "--no-data" option to send the 
> subvolume again to get the metadata only dump, and upload it for debug.

Also the metadata-only dump fails with the same ioctl error (-2: No such file 
or directory). 
So I could only upload the stream up to the occurrence of that failure... 

> 
> Also, please paste "btrfs-debug-tree -t " output for debug.
> WARN: above "btrfs-debug-tree" command will contain file names.
> You could use the following sed to wipe filename:
> 
> "btrfs-debug-tree  -t 5 /dev/sda6  | sed "s/name:.*//"

This indeed runs through without failure. 


It seems, though, that the "btrfs send --no-data" stream, which contains the 
full metadata anyway, contains all filenames (judging from a quick look with 
'strings'). 
I can probably not remove these without invalidating the stream... so I'd 
rather not upload this to some public location. 

However, you gave me an idea. I had a look at the file created by "btrfs send 
--no-data", piping it through "strings". 
This revealed the last files which btrfs send was able to treat before running 
into the ioctl failure. 
Indeed, this is my thunderbird profile directory, always a place with a lot of 
activity. 
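
For reference, the idea boils down to something like this (snapshot path and 
file names are just examples): 

btrfs send --no-data /mnt/@snapshots/home.20160904 > /tmp/home-nodata.stream
strings /tmp/home-nodata.stream | tail -n 40   # last names before the ioctl failure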

Now the interesting part begins: Since of course I have a backup of this 
directory, I decided to move that profile to another FS and back. 
Turns out I cannot run
rm -rf ~/.thunderbird
since it claims "directory not empty". The kernel log shows no BUG_ON, oops, or 
anything like that. 

That's reproducible not only in the snapshots, but also in my "home" subvolume 
for this folder. 

"stat -c %s" of the supposed-to-be-empty profile directory reveals indeed:
2482

So I guess I should refresh my backups soon and either run "btrfs check 
--repair" or, if that fails, redo the FS... 
Likely btrfs check --repair will fail for me since (due to duperemove usage) 
I'll for sure also be hit by https://bugzilla.kernel.org/show_bug.cgi?id=155791 
since I'm still using 4.7.1 so I'd like to update to 4.7.2 before trying out 
that repair strategy. 

I sadly can't do that in the next few days since I actively need the machine in 
question, so I'll rename that folder and restore just that from backup for now. 

Is the debug-information still of interest? If so, I can share it (but would 
not post it publicly to the list since many filenames are in there...). 
It weighs in at about 2 x 80 MiB after xz compression. 

Or is there anything else I can try safely? 

Thanks a lot in any case and cheers, 
Oliver

> 
> Thanks,
> Qu
> 


Re: btrfs send extremely slow (almost stuck)

2016-09-04 Thread Qu Wenruo



At 09/05/2016 05:41 AM, Oliver Freyermuth wrote:

On 30.08.2016 at 02:48, Qu Wenruo wrote:

Yes.
And more specifically, it doesn't even affect delta backup.

For shared extents caused by reflink/dedupe (out-of-band or even incoming 
in-band), they will be sent as individual files.

For contents, they are all the same, just more space usage.


For those interested, I have now actually tested the btrfs send / btrfs receive 
backup for several subvolumes after applying this patch.
The throughput is finally usable, almost hitting network / IO limits as 
expected - ideal so far!
Also delta seemed fine for the subvolumes for which things worked.

However, I now sadly get (for one of my subvolumes):

send ioctl failed with -2: No such file or directory

at some point during the transfer, it sadly seems to be reproducible.
I do not think it's related to this patch, but of course this makes "btrfs 
send" still unusable to me -
I guess it's not ready for general use just yet.
Is there any information I can easily extract / provide to allow the experts to 
fix this issue?


Did you get the half way send stream?

If the send stream has something, please use "--no-data" option to send 
the subvolume again to get the metadata only dump, and upload it for debug.


Also, please paste "btrfs-debug-tree -t " output for 
debug.

WARN: the above "btrfs-debug-tree" command output will contain file names.
You could use the following sed to wipe the filenames:

btrfs-debug-tree -t 5 /dev/sda6 | sed "s/name:.*//"

Thanks,
Qu


The kernel log shows nothing.

Thanks a lot,
Oliver







Re: btrfs send extremely slow (almost stuck)

2016-09-04 Thread Oliver Freyermuth
On 30.08.2016 at 02:48, Qu Wenruo wrote:
> Yes.
> And more specifically, it doesn't even affect delta backup.
> 
> For shared extents caused by reflink/dedupe (out-of-band or even incoming 
> in-band), they will be sent as individual files.
> 
> For contents, they are all the same, just more space usage.

For those interested, I have now actually tested the btrfs send / btrfs receive 
backup for several subvolumes after applying this patch. 
The throughput is finally usable, almost hitting network / IO limits as 
expected - ideal so far! 
Also delta seemed fine for the subvolumes for which things worked. 

However, I now sadly get (for one of my subvolumes): 

send ioctl failed with -2: No such file or directory

at some point during the transfer, it sadly seems to be reproducible. 
I do not think it's related to this patch, but of course this makes "btrfs 
send" still unusable to me - 
I guess it's not ready for general use just yet. 
Is there any information I can easily extract / provide to allow the experts to 
fix this issue?
The kernel log shows nothing. 

Thanks a lot, 
Oliver


Re: btrfs send extremely slow (almost stuck)

2016-08-30 Thread Qu Wenruo



At 08/31/2016 09:35 AM, Jeff Mahoney wrote:

On 8/28/16 10:12 PM, Qu Wenruo wrote:



At 08/29/2016 10:11 AM, Qu Wenruo wrote:



At 08/28/2016 11:38 AM, Oliver Freyermuth wrote:

Dear btrfs experts,

I just tried to make use of btrfs send / receive for incremental
backups (using btrbk to simplify the process).
It seems that on my two machines, btrfs send gets stuck after
transferring some GiB - it's not fully halted, but instead of making
full use of the available I/O, I get something < 500 kiB on average,
which are just some "full speed spikes" with many seconds / minutes of
no I/O in between.

During this "halting", btrfs send eats one full CPU core.
A "perf top" shows this is spent in "find_parent_nodes" and
"__merge_refs" inside the kernel.
I am using btrfs-progs 4.7 and kernel 4.7.0.


Unknown bug, while unfortunately no good idea to solve yet.


Sorry, known bug, not unknown


I'm working on a patch to replace the lists with a pair of trees that
get merged after filling in the missing parents.


Wow, nice.
I was planning to do it but didn't get started yet.

The list is really causing the problem.
Converting to an rb_tree should at least reduce the O(n^3)-O(n^4) behavior
to O(n^2 log n).


While calling the backref walk inside the loop that iterates over every file 
extent never seemed like a good idea to me, I'll still try a fix on the send 
side as an RFC patch too.


Thanks,
Qu


The reflink xfstests don't complete, ever.  btrfs/130 triggers soft
lockups but does complete eventually -- and that's only with ~4k list
elements.

-Jeff







Re: btrfs send extremely slow (almost stuck)

2016-08-30 Thread Jeff Mahoney
On 8/28/16 10:12 PM, Qu Wenruo wrote:
> 
> 
> At 08/29/2016 10:11 AM, Qu Wenruo wrote:
>>
>>
>> At 08/28/2016 11:38 AM, Oliver Freyermuth wrote:
>>> Dear btrfs experts,
>>>
>>> I just tried to make use of btrfs send / receive for incremental
>>> backups (using btrbk to simplify the process).
>>> It seems that on my two machines, btrfs send gets stuck after
>>> transferring some GiB - it's not fully halted, but instead of making
>>> full use of the available I/O, I get something < 500 kiB on average,
>>> which are just some "full speed spikes" with many seconds / minutes of
>>> no I/O in between.
>>>
>>> During this "halting", btrfs send eats one full CPU core.
>>> A "perf top" shows this is spent in "find_parent_nodes" and
>>> "__merge_refs" inside the kernel.
>>> I am using btrfs-progs 4.7 and kernel 4.7.0.
>>
>> Unknown bug, while unfortunately no good idea to solve yet.
> 
> Sorry, known bug, not unknown

I'm working on a patch to replace the lists with a pair of trees that
get merged after filling in the missing parents.

The reflink xfstests don't complete, ever.  btrfs/130 triggers soft
lockups but does complete eventually -- and that's only with ~4k list
elements.

-Jeff

-- 
Jeff Mahoney
SUSE Labs





Re: btrfs send extremely slow (almost stuck)

2016-08-29 Thread Qu Wenruo



At 08/29/2016 06:02 PM, Oliver Freyermuth wrote:

On 29.08.2016 at 04:11, Qu Wenruo wrote:

Unknown bug, while unfortunately no good idea to solve yet.

I sent a RFC patch to completely disable shared extent detection, while
got strong objection.

I also submitted some other ideas on fixing it, while still got strong
objection. Objection includes this is a performance problem, not a
function problem and we should focus on function problem first and
postpone such performance problem.

And further more, Btrfs from the beginning of its design, focuses on
fast snapshot creation, and takes backref walk as sacrifice.
So it's not an easy thing to fix.


As a user, I must say, thanks a lot for your work on this!



I don't expect there will be even an agreement on how to fix the problem
in v4.1x.

Fixes in send will lead to obvious speed improvement, while cause
incompatibility or super complex design.
Fixes in backref will lead to a backref rework, which normally comes
with new regression, and we are even unsure if it will really help.

If you just hate the super slow send, and can accept the extra space
usage, please try this RFC patch:

https://patchwork.kernel.org/patch/9245287/


This patch, just as its name, will completely stop same extent(reflink)
detection.
Which will cause more space usage, while it skipped the super time
consuming find_parent_nodes(), it should at least workaround your problem.


If I interpret the code correctly, this only affects "btrfs send", and
only causes "duplication" of previously shared extents, correct?


Yes.
And more specifically, it doesn't even affect delta backup.

For shared extents caused by reflink/dedupe (out-of-band or even incoming 
in-band), they will be sent as individual files.


For contents, they are all the same, just more space usage.



Then this is for me (as a user) perfectly fine - btrfs send should run
much faster (< 3 hours instead of unusable 80 hours for my root volume)
and I can just run duperemove on the readonly snapshots at the backup
location later without issues (it's of course some extra I/O on disk and
network, but at least it will be usable).


Nice to hear that.




I have another, less aggressive idea to fix it, but since there was
objection against it, I didn't code it further.

But, since there are *REAL* *WORLD* users reporting such problem, I
think I'd better restart the fix as an RFC.


Thanks a lot, as a user I would certainly appreciate work in this area.

I would not have expected that this really is a known issue,
since I would have thought that btrfs send was commonly used for backup
purposes, and offline deduplication on SSD drives especially on mobile
devices to gain significant amount of space did not seem like an exotic
usecase to me.

So in short, I'm really surprised to be one of the first / few to
complain about this as a user, I did not feel like my usecase was
special or exotic (at least, up to now).


Not the first, but still only a few.

There is an xfstest case submitted for it, and even before the test case, 
there were already reports from IRC.


Anyway, I'll add Cc for you after the new RFC patch is out.

Thanks,
Qu



Thanks a lot,
Oliver



Thanks,
Qu








Re: btrfs send extremely slow (almost stuck)

2016-08-29 Thread Kai Krakow
On Sun, 28 Aug 2016 17:41:22 -0400,
james harvey wrote:

> On Sun, Aug 28, 2016 at 12:15 PM, Oliver Freyermuth
>  wrote:
> > For me, this means I have to stay with rsync backups, which are
> > sadly incomplete since special FS attrs like "C" for nocow are not
> > backed up.  
> 
> Should be able to make a script that creates a textfile with lsattr
> for every file.  Then either just leave that file as part of the
> backup in case it's needed some day, or making a corresponding script
> on the backup machine to restore those.

The problem with this idea is that chattr +C will only work on empty
files, so it needs to be applied in the "middle", read: upon creating
the file and before filling it with content.

It would be possible to let a script first create empty files according
to this list and then use "rsync --no-whole-file --inplace" so it will
build upon the empty files instead of its usual behavior of creating
files temporarily and then renaming them into place. I'd recommend using
these options anyway when writing to btrfs snapshots, to take advantage
of shared extents. Apparently rsync cannot handle sparse files in this
mode (though there should be a patch to make this possible by using the
hole-punching feature of newer kernels, but it makes the rsync protocol
incompatible with unpatched versions, AFAIR).
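
A rough sketch of that idea, assuming the attribute list from the other mail
is a plain "attrs path" text file as produced by lsattr (untested, just to
illustrate the order of operations):

while read -r attrs path; do
    case "$attrs" in
    *C*)                                 # source file had the nocow attribute
        mkdir -p "/backup/$(dirname "$path")"
        : > "/backup/$path"              # create the file while it is still empty
        chattr +C "/backup/$path"        # +C only sticks on empty files
        ;;
    esac
done < attrs.txt
rsync -a --no-whole-file --inplace /source/ /backup/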

I think borgbackup suffers from the same problem. While the latest
version seems to support attrs, it applies them after filling the
files with content (as most programs do; attributes like mtime,
owner etc. are also applied after closing the written file, for obvious
reasons). This simply doesn't work for +C on btrfs.

-- 
Regards,
Kai

Replies to list-only preferred.



Re: btrfs send extremely slow (almost stuck)

2016-08-29 Thread Oliver Freyermuth
On 29.08.2016 at 04:11, Qu Wenruo wrote:
> Unknown bug, while unfortunately no good idea to solve yet.
> 
> I sent a RFC patch to completely disable shared extent detection, while
> got strong objection.
> 
> I also submitted some other ideas on fixing it, while still got strong
> objection. Objection includes this is a performance problem, not a
> function problem and we should focus on function problem first and
> postpone such performance problem.
> 
> And further more, Btrfs from the beginning of its design, focuses on
> fast snapshot creation, and takes backref walk as sacrifice.
> So it's not an easy thing to fix.

As a user, I must say, thanks a lot for your work on this!

> 
> I don't expect there will be even an agreement on how to fix the problem
> in v4.1x.
> 
> Fixes in send will lead to obvious speed improvement, while cause
> incompatibility or super complex design.
> Fixes in backref will lead to a backref rework, which normally comes
> with new regression, and we are even unsure if it will really help.
> 
> If you just hate the super slow send, and can accept the extra space
> usage, please try this RFC patch:
> 
> https://patchwork.kernel.org/patch/9245287/
> 
> 
> This patch, just as its name, will completely stop same extent(reflink)
> detection.
> Which will cause more space usage, while it skipped the super time
> consuming find_parent_nodes(), it should at least workaround your problem.

If I interpret the code correctly, this only affects "btrfs send", and
only causes "duplication" of previously shared extents, correct?

Then this is for me (as a user) perfectly fine - btrfs send should run
much faster (< 3 hours instead of unusable 80 hours for my root volume)
and I can just run duperemove on the readonly snapshots at the backup
location later without issues (it's of course some extra I/O on disk and
network, but at least it will be usable).

> I have another, less aggressive idea to fix it, but since there was
> objection against it, I didn't code it further.
> 
> But, since there are *REAL* *WORLD* users reporting such problem, I
> think I'd better restart the fix as an RFC.

Thanks a lot, as a user I would certainly appreciate work in this area.

I would not have expected that this really is a known issue,
since I would have thought that btrfs send was commonly used for backup
purposes, and offline deduplication on SSD drives especially on mobile
devices to gain significant amount of space did not seem like an exotic
usecase to me.

So in short, I'm really surprised to be one of the first / few to
complain about this as a user, I did not feel like my usecase was
special or exotic (at least, up to now).

Thanks a lot,
Oliver


> Thanks,
> Qu


Re: btrfs send extremely slow (almost stuck)

2016-08-28 Thread Qu Wenruo



At 08/29/2016 10:11 AM, Qu Wenruo wrote:



At 08/28/2016 11:38 AM, Oliver Freyermuth wrote:

Dear btrfs experts,

I just tried to make use of btrfs send / receive for incremental
backups (using btrbk to simplify the process).
It seems that on my two machines, btrfs send gets stuck after
transferring some GiB - it's not fully halted, but instead of making
full use of the available I/O, I get something < 500 kiB on average,
which are just some "full speed spikes" with many seconds / minutes of
no I/O in between.

During this "halting", btrfs send eats one full CPU core.
A "perf top" shows this is spent in "find_parent_nodes" and
"__merge_refs" inside the kernel.
I am using btrfs-progs 4.7 and kernel 4.7.0.


Unknown bug, while unfortunately no good idea to solve yet.


Sorry, known bug, not unknown

Thanks,
Qu


I sent a RFC patch to completely disable shared extent detection, while
got strong objection.

I also submitted some other ideas on fixing it, while still got strong
objection. Objection includes this is a performance problem, not a
function problem and we should focus on function problem first and
postpone such performance problem.

And further more, Btrfs from the beginning of its design, focuses on
fast snapshot creation, and takes backref walk as sacrifice.
So it's not an easy thing to fix.



I googled a bit and found related patchwork
(https://patchwork.kernel.org/patch/9238987/) which seems to
workaround high load in this area and mentions a real solution is
proposed but not yet there.

Since this affects two machines of mine and backupping my root volume
would take about 80 hours in case I can extrapolate the average rate,
this means btrfs send is unusable to me.

Can I assume this is a common issue which will be fixed in a later
kernel release (4.8, 4.9) or can I do something to my FS's to
workaround this issue?


I don't expect there will be even an agreement on how to fix the problem
in v4.1x.

Fixes in send will lead to obvious speed improvement, while cause
incompatibility or super complex design.
Fixes in backref will lead to a backref rework, which normally comes
with new regression, and we are even unsure if it will really help.

If you just hate the super slow send, and can accept the extra space
usage, please try this RFC patch:

https://patchwork.kernel.org/patch/9245287/


This patch, just as its name, will completely stop same extent(reflink)
detection.
Which will cause more space usage, while it skipped the super time
consuming find_parent_nodes(), it should at least workaround your problem.

I have another, less aggressive idea to fix it, but since there was
objection against it, I didn't code it further.

But, since there are *REAL* *WORLD* users reporting such problem, I
think I'd better restart the fix as an RFC.

Thanks,
Qu


One FS is only two weeks old, the other one now about 1 year. I did
some balancing at some points of time to have more unallocated space
for trimming,
and used duperemove regularly to free space. One FS has skinny
extents, the other has not.

Mount options are "rw,noatime,compress=zlib,ssd,space_cache,commit=120".

Apart from that: No RAID or any other special configuration involved.

Cheers and any help appreciated,
Oliver










Re: btrfs send extremely slow (almost stuck)

2016-08-28 Thread Qu Wenruo



At 08/28/2016 11:38 AM, Oliver Freyermuth wrote:

Dear btrfs experts,

I just tried to make use of btrfs send / receive for incremental backups (using 
btrbk to simplify the process).
It seems that on my two machines, btrfs send gets stuck after transferring some 
GiB - it's not fully halted, but instead of making full use of the available I/O, 
I get something < 500 kiB on average,
which are just some "full speed spikes" with many seconds / minutes of no I/O 
in between.

During this "halting", btrfs send eats one full CPU core.
A "perf top" shows this is spent in "find_parent_nodes" and "__merge_refs" 
inside the kernel.
I am using btrfs-progs 4.7 and kernel 4.7.0.


Unknown bug, while unfortunately no good idea to solve yet.

I sent an RFC patch to completely disable shared extent detection, but 
got strong objections.


I also submitted some other ideas for fixing it, but still got strong 
objections. The objections include that this is a performance problem, not a 
functional problem, and that we should focus on functional problems first and 
postpone such performance problems.


And furthermore, Btrfs, from the beginning of its design, focuses on 
fast snapshot creation and sacrifices backref walk performance.

So it's not an easy thing to fix.



I googled a bit and found related patchwork 
(https://patchwork.kernel.org/patch/9238987/) which seems to workaround high 
load in this area and mentions a real solution is proposed but not yet there.

Since this affects two machines of mine and backupping my root volume would 
take about 80 hours in case I can extrapolate the average rate, this means 
btrfs send is unusable to me.

Can I assume this is a common issue which will be fixed in a later kernel 
release (4.8, 4.9) or can I do something to my FS's to workaround this issue?


I don't expect there will even be an agreement on how to fix the problem 
in v4.1x.


Fixes in send will lead to an obvious speed improvement, but cause 
incompatibility or a super complex design.
Fixes in backref will lead to a backref rework, which normally comes 
with new regressions, and we are not even sure if it will really help.


If you just hate the super slow send, and can accept the extra space 
usage, please try this RFC patch:


https://patchwork.kernel.org/patch/9245287/


This patch, as its name says, will completely stop shared extent (reflink) 
detection.
This will cause more space usage, but since it skips the super time-consuming 
find_parent_nodes(), it should at least work around your problem.


I have another, less aggressive idea to fix it, but since there was 
objection against it, I didn't code it further.


But, since there are *REAL* *WORLD* users reporting such problem, I 
think I'd better restart the fix as an RFC.


Thanks,
Qu


One FS is only two weeks old, the other one now about 1 year. I did some 
balancing at some points of time to have more unallocated space for trimming,
and used duperemove regularly to free space. One FS has skinny extents, the 
other has not.

Mount options are "rw,noatime,compress=zlib,ssd,space_cache,commit=120".

Apart from that: No RAID or any other special configuration involved.

Cheers and any help appreciated,
Oliver







Re: btrfs send extremely slow (almost stuck)

2016-08-28 Thread james harvey
On Sun, Aug 28, 2016 at 12:15 PM, Oliver Freyermuth
 wrote:
> For me, this means I have to stay with rsync backups, which are sadly 
> incomplete since special FS attrs
> like "C" for nocow are not backed up.

Should be able to make a script that creates a textfile with lsattr
for every file.  Then either just leave that file as part of the
backup in case it's needed some day, or make a corresponding script
on the backup machine to restore those.
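
Something along these lines, for example (paths are placeholders, untested):

# On the machine being backed up: record the attributes of every file and dir.
find /home -xdev -exec lsattr -d {} + > /backup/home-attrs.txt 2>/dev/null
# On the backup machine, entries with "C" (nocow) in the attribute column can
# then be grepped out of that list and re-applied -- keeping in mind that
# chattr +C only works on files that are still empty.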


Re: btrfs send extremely slow (almost stuck)

2016-08-28 Thread Oliver Freyermuth
(sorry if my Message-ID header is missing, I am not subscribed to the mailing 
list, 
so I reply using mail-archive)

> So a workaround would be reducing your duperemove usage and possibly 
> rewriting (for instance via defrag) the deduped files to kill the 
> multiple reflinks.  Or simply delete the additional reflinked copies, if 
> your use-case allows it.

Sadly, I need the extra space (that's why I was using duperemove in the first 
place)
and cannot delete all the duplicated copies. These are mainly several checkouts 
of different repositories
with partially common (partially large binary) content. 

> And thin down your snapshot retention if you have many snapshots per 
> subvolume.  With the geometric scaling issues, thinning to under 300 per 
> subvolume should be quite reasonable in nearly all circumstances, and 
> thinning to under 100 per subvolume may be possible and should result in 
> dramatically reduced scaling issues.

In addition, I have only ~5 snapshots for both of those volumes, which should 
certainly not be too many. 


So in short, this just means btrfs send is (still) unusable
for filesystems which rely on the offline dedupe feature (in the past, 'btrfs 
send' got broken
after dedupe, which was fixed; now it is just extremely slow). 


For me, this means I have to stay with rsync backups, which are sadly 
incomplete since special FS attrs
like "C" for nocow are not backed up. 


Cheers and thanks for your reply, 
Oliver


Re: btrfs send extremely slow (almost stuck)

2016-08-28 Thread Duncan
Oliver Freyermuth posted on Sun, 28 Aug 2016 05:38:00 +0200 as excerpted:

> Dear btrfs experts,
> 
> I just tried to make use of btrfs send / receive for incremental backups
> (using btrbk to simplify the process).
> It seems that on my two machines, btrfs send gets stuck after
> transferring some GiB - it's not fully halted, but instead of making
> full use of the available I/O, I get something < 500 kiB on average,
> which are just some "full speed spikes" with many seconds / minutes of
> no I/O in between.
> 
> During this "halting", btrfs send eats one full CPU core.
> A "perf top" shows this is spent in "find_parent_nodes" and
> "__merge_refs" inside the kernel.
> I am using btrfs-progs 4.7 and kernel 4.7.0.
> 
> I googled a bit and found related patchwork
> (https://patchwork.kernel.org/patch/9238987/) which seems to workaround
> high load in this area and mentions a real solution is proposed but not
> yet there.
> 
> Since this affects two machines of mine and backupping my root volume
> would take about 80 hours in case I can extrapolate the average rate,
> this means btrfs send is unusable to me.
> 
> Can I assume this is a common issue which will be fixed in a later
> kernel release (4.8, 4.9) or can I do something to my FS's to workaround
> this issue?
> 
> One FS is only two weeks old, the other one now about 1 year. I did some
> balancing at some points of time to have more unallocated space for
> trimming,
> and used duperemove regularly to free space. One FS has skinny extents,
> the other has not.

The problem is, as the patch says, that multiple references per extent 
increase process time geometrically.

And duperemove works by doing just that, pointing multiple duplicates to 
the same extents, increasing the reference count per extent, thereby 
exacerbating the problem on your system, if duperemove is actually finding 
a reasonable number of duplicates to reflink to the same extents.

The other common multi-reflink usage is snapshots, since each snapshot 
creates another reflink to each extent it snapshots.  However, being just 
a list regular and btrfs user, not a dev, and using neither dedupe nor 
snapshots nor send/receive in my own use-case, I'm not absolutely sure 
whether other snapshot references affect send/receive or whether it's 
only multiple reflinks per sent snapshot.  Either way, over a few hundred 
snapshots per subvolume or a couple thousand snapshots per filesystem, 
they do seriously affect scaling of balance and fsck, even if they don't 
actually affect send/receive so badly.

So a workaround would be reducing your duperemove usage and possibly 
rewriting (for instance via defrag) the deduped files to kill the 
multiple reflinks.  Or simply delete the additional reflinked copies, if 
your use-case allows it.
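
For example (paths assumed, and assuming a btrfs-progs new enough to have "fi 
du"), something like this shows how heavily shared a directory's extents are, 
and rewriting it via defrag breaks those reflinks up again:

btrfs filesystem du -s /data/deduped-dir            # "Set shared" column = reflinked data
btrfs filesystem defragment -r /data/deduped-dir    # rewrites extents, killing reflinks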

And thin down your snapshot retention if you have many snapshots per 
subvolume.  With the geometric scaling issues, thinning to under 300 per 
subvolume should be quite reasonable in nearly all circumstances, and 
thinning to under 100 per subvolume may be possible and should result in 
dramatically reduced scaling issues.

Note that the current patch doesn't really work around the geometric 
scaling issues or extreme cpu usage bottlenecking send/receive, but 
rather, addresses the soft lockups problem due to not scheduling often 
enough to give other threads time to process.  You didn't mention 
problems with soft lockups, so it's likely to be of limited help for the 
send/receive problem.

As for the longer term, yes, it should be fixed, eventually, but keep in 
mind that btrfs isn't considered fully stable and mature yet, so this 
sort of problem isn't unexpected and indeed scaling issues like this are 
known to still be an issue, and while I haven't been tracking that red/
black tree work, in general it can be noted that btrfs fixes for this 
sort of problem often take rather longer than might be expected, so a fix 
may be more like a year or two out than a kernel cycle or two out.

Unless of course you see otherwise from someone working on this problem 
specifically, and even then, sometimes the first fix doesn't get it quite 
right, and the problem may remain for some time as more is learned about 
the ultimate issue via multiple attempts to fix it.  This has happened to 
the quota code a number of times for instance, as it as turned out to be 
a /really/ hard problem, with multiple rewrites necessary, such that even 
now, the practical recommendation is often to either just turn off quotas 
and not worry about them if you don't need them, or use a more mature 
filesystem where the quota code is known to be stable and mature, if your 
use-case depends on them.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
