Hello. So I split the job into two tasks as per your suggestion: I create the differential snapshot stream with btrfs send and save it on the SSD - so far this is very efficient and the send runs at almost full SSD speed.
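(For reference, the split now looks roughly like this - the .stream file name is just a placeholder I made up:)

  # step 1: dump the incremental stream to a file on the SSD (fast)
  btrfs send -f /ssd/snapshot-X.stream -p /ssd/previously_synced_snapshot /ssd/snapshot-X

  # step 2: replay the stream onto the HDD (this is the slow part)
  btrfs receive -f /ssd/snapshot-X.stream /hdd/snapshots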
When I try to "receive" the snapshot on the HDD, the speed is just as low as before (as when I ran the ionice'd pipe). No ionice is used. The HDD's raw speed, according to hdparm, is:

 Timing cached reads:   15848 MB in  2.00 seconds = 7928.82 MB/sec
 Timing buffered disk reads:  310 MB in  3.01 seconds = 103.02 MB/sec

And I have more than 100 GB of free space on it, but the speed is still low. So, as you mentioned, I might be dealing with a very fragmented filesystem. Now there are some conclusions and questions:

1. The btrfs send is out of the question - it works great with or without ionice.
2. The receive is slow no matter what I do, even if run alone. (As for what kind of data is being sent: I sent snapshots of / and /home, and both are slow for btrfs receive.)
3. How do I check how fragmented the filesystem is? (I.e. I want to know if this is the real cause.)
4. How do I defragment all those read-only snapshots without breaking compatibility with differential btrfs send? (If I understand it correctly, the parent snapshot must be the same on source and destination - is this correct?)
5. Will making those snapshots writable, defragmenting them and re-snapshotting them as read-only break compatibility with btrfs differential send? E.g. will I still be able to "btrfs receive" a differential snapshot after defragmentation?

Also, about your suggestion to do it during a break - I would have done that, but it sometimes takes hours to sync; that's why I tried to ionice it so I can work while it runs.

Thank you a lot for your explanations and effort!

On 15.12.2014 10:49, Robert White wrote:
> On 12/14/2014 11:41 PM, Nick Dimov wrote:
>> Hi, thanks for the answer, I will answer between the lines.
>>
>> On 15.12.2014 08:45, Robert White wrote:
>>> On 12/14/2014 08:50 PM, Nick Dimov wrote:
>>>> Hello everyone!
>>>>
>>>> First, thanks for the amazing work on the btrfs filesystem!
>>>>
>>>> Now the problem: I use an SSD as my system drive (/dev/sda2) and take daily snapshots on it. Then, from time to time, I sync those to the HDD (/dev/sdb4) using btrfs send / receive like this:
>>>>
>>>> ionice -c3 btrfs send -p /ssd/previously_synced_snapshot /ssd/snapshot-X | pv | btrfs receive /hdd/snapshots
>>>>
>>>> I use pv to measure the speed and I get ridiculous speeds like 5-200 KiB/s! (Rarely does it go over 1 MiB/s.) However, if I replace the btrfs receive with cat >/dev/null, the speed is 400-500 MiB/s (almost full SSD speed), so I understand that the problem is the fs on the HDD... Do you have any idea of how to trace this problem down?
>>>
>>> You have _lots_ of problems with that above...
>>>
>>> (1) Your ionice is causing the SSD to stall the send every time the receiver does _anything_.
>> I will try to remove ionice completely - but then my system becomes unresponsive :(
>
> Yep, see below.
>
> Then again, if it only goes bad for a minute or two, then just launch the backup right as you go for a break.
>
>>> (1a) The ionice doesn't apply to the pipeline, it only applies to the command it precedes. So it's "ionice -c3 btrfs send..." then the pipeline, then "btrfs receive" at the default I/O scheduling class. You need to specify it twice, or wrap it all in a script.
>>>
>>> ionice -c 3 btrfs send -p /ssd/parent /ssd/snapshot-X |
>>> ionice -c 3 btrfs receive /hdd/snapshots
>> This is usually what I do, but I wanted to show that there is no throttle on the receiver. (I tested it with and without - the result is the same.)
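(For what it's worth, I read "wrap it all in a script" as something like the sketch below - the script name is mine, the paths are the ones from my original mail; as far as I understand, a single ionice on the script then covers all three commands, since the I/O class is inherited by child processes:)

  #!/bin/sh
  # sync-snap.sh - the whole send | pv | receive pipeline in one place,
  # so that "ionice -c3 ./sync-snap.sh" applies the idle class to all of it
  btrfs send -p /ssd/previously_synced_snapshot /ssd/snapshot-X \
      | pv \
      | btrfs receive /hdd/snapshots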
>>>
>>> (1b) Your comparison case is flawed, because cat >/dev/null results in no actual IO (e.g. writing to /dev/null doesn't transfer any data anywhere, it just gets rubber-stamped okay at the kernel method level).
>> This was only intended to show that the sender itself is OK.
>
> I understood why you did it, I was just trying to point out that since there was no other IO competing with the btrfs send, it would give you a really outrageously false positive. Particularly if you always used ionice.
>
>>> (2) You probably get negative-to-no value from using ionice on the sending side, particularly since SSDs don't have physical heads to seek around.
>> Yeah, in theory it should be like this, but in practice on my system, when I use no ionice my system becomes very unresponsive (Ubuntu 14.10).
>
> What all is in the snapshot? Is it your whole system or just /home or what? E.g. what are your subvolume boundaries, if any?
>
> btrfs send is very efficient, but that efficiency means that it can rifle through a heck of a lot of the parent snapshot and decide it doesn't need sending, and it can do so very fast, and that can be a huge hit on other activities. If most of your system doesn't change between snapshots, the send will plow through your disk yelling "nope" and "skip this" like a shopper in a Black Friday riot.
>
>>> (2a) The value of nicing your IO is trivial on the actual SATA bus; the real value is only realized on rotating media, where the cost of interrupting other-normal-process is very high because you have to seek the heads way over there---> when other-normal-process needs them right here.
>>>
>>> (2b) Any pipeline will naturally throttle a more-left-end process to wait for a more-right-end process to read the data. The default buffer is really small, living in the neighborhood of "MAX_PIPE", so like 5k last time I looked. If you want to throttle this sort of transfer, just throttle the writer. So..
>>>
>>> btrfs send -p /ssd/parent /ssd/snapshot-X |
>>> ionice -c 3 btrfs receive /hdd/snapshots
>>>
>>> (3) You need to find out if you are treating things nicely on /hdd/snapshots. If you've hoarded a lot of snapshots on there, or your snapshot history is really deep, or you are using the drive for other purposes that are unfriendly to large allocations, or you've laid out a multi-drive array poorly, then it may need some maintenance.
>> Yes, this is what I suspect too, that the filesystem is too fragmented. I have about 15 snapshots now, but new snapshots are created and older ones are deleted - is it possible that this caused the problem? Is there a way to tell how badly the filesystem is fragmented?
>
> Fifteen snapshots is fine. There have been some people on here taking snapshots every hour and keeping them for months. That gets excessive. As long as there is a reasonable amount of free space and you don't have weeks of hourly snapshots hanging around, this shouldn't be an issue.
>
>>> (3a) Test the raw receive throughput if you have the space to spare. To do this, save the output of btrfs send to a file with -f some_file. Then run the receive with -f some_file. Ideally some_file will be on yet a third media, but it's okay if it's on /hdd/snapshots somewhere. Watch the output of iotop or your favorite graphical monitoring daemon. If the write throughput is suspiciously low, you may be dealing with a fragmented filesystem.
>> Great idea. Will try this.
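(Re my question 3 above, as a first stab I guess I could spot-check some of the big files on the HDD with filefrag and look at the extent counts - assuming extent count is a fair proxy for fragmentation, which I'm not sure holds for compressed files; the paths below are just examples:)

  # extent map of one large file inside a synced snapshot
  filefrag -v /hdd/snapshots/snapshot-X/home/nick/some-big-file

  # or a rough survey of every file over 100 MB in that snapshot
  find /hdd/snapshots/snapshot-X -xdev -type f -size +100M -exec filefrag {} +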
>>>
>>> (3b) If /hdd/snapshots is a multi-device filesystem and you are using "single" for data extents, try switching to RAID0. It's just as _unsafe_ as "single", but its distributed write layout will speed up your storage.
>> It's a single device.
>>>
>>> (4) If your system is busy enough that you really need the ionice, you likely just need to really re-think your storage layouts and whatnot.
>> Well, it's a laptop :) and I'm not doing anything when the sync happens. I do ionice because without it the system becomes unresponsive and I can't even browse the internet (it just freezes for 20 seconds or so). But this has to do with the sender somehow... (probably it saturates the SATA throughput at 500 MB/s?)
>
> On a laptop, yeah, that's probably gonna happen way more than on a server of some sort.
>
> There are a lot of interactions between the "hot" parts of programs, the way programs are brought into memory with mmap(), and how the disk cache works. When you start hoovering things up off your hard disk, particularly while send is comparing the snapshots and finding what it's _not_ going to send (such as all of /bin, /usr/bin, /lib, /usr/lib, etc.), that high-performance drive with that high-performance bus will just muscle aside significant parts of your browser and all the other "user facing" stuff.
>
> Then you have to get back in line to re-fetch the stuff that just got dropped from memory when you go back to your browser or whatever.
>
> With a fast SSD and a fast SATA bus you _are_ going to feel the send if it runs at full speed. Your system is going to be _busy_.
>
> But that's a separate thing from your question about effective throughput. The pipeline you gave, with both ends ioniced, will turn into a dainty little tea-party with each actor walking in lock-step and repeatedly saying "no, after you" and then waiting for the other to finish.
>
> Check out the ionice man page's description of the Idle scheduler...
>
> A program running with idle I/O priority will only get disk time when no other program has asked for disk I/O for a defined *grace* *period*. The impact of idle I/O processes on normal system activity should be zero. This scheduling class does not take a priority argument. Presently, this scheduling class is permitted for an ordinary user (since kernel 2.6.25).
>
> Grace period. Think about those two words...
>
> So btrfs send goes out and tells receive to create a file named XXX; as soon as receive acts on that message, btrfs send _stops_ _dead_, waits for it to finish, waits for the grace period, _then_ does its next thing.
>
> If they are _both_ niced, then they _both_ wait for each other with a little grace period before the other gets to go.
>
> That will take _all_ the time... So much so that you're darn right, the send/receive will _not_ get the chance to affect your browser.
>
> Oh, and every cookie your browser writes will barge through and stop them both.
>
> So yeah... _slow_... really, really slow with two ionice processes in a pipeline.
>
> I'd just leave off the ionice and schedule my snapshot resends for right before I take a bathroom break... 8-)
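(If leaving ionice off entirely turns out to be too painful on the desktop, one alternative I might experiment with is capping the stream with pv's rate limit instead of the idle class, so the pipeline never runs flat out - just an idea, and the 30M below is a number I'd have to tune:)

  # hard-cap the stream at ~30 MiB/s instead of using ionice
  btrfs send -p /ssd/previously_synced_snapshot /ssd/snapshot-X \
      | pv -L 30M \
      | btrfs receive /hdd/snapshots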
>>> (5) Remember to evaluate the secondary system effects. For example, if /hdd/snapshots is really a USB-attached external storage unit, make sure it's USB3 and so is the port you're plugging it into. Make sure you aren't paging/swapping in the general sense (or just on the cusp of doing so), as both of the programs are going to start competing with your system for buffer cache and whatnot. A "nearly busy" system can be pushed over the edge into thrashing by adding two large-IO tasks like these.
>>>
>>> (5b) Give /hdd/snapshots the once-over with smartmontools, especially if it's old, to make sure it's not starting to have read/write retry delays. Old disks can get "slower" before they get all "failed".
>>>
>>> And remember, you may not be able to do squat about your results. If you are I/O bound, and you just _can't_ bear to part with some subset of your snapshot hoard for any reason (e.g. I know some government programs with hellish retention policies), then you might just be living the practical outcome of your policies and available hardware.
>>>
>> Thanks again for the answer, I will try to do the tests described here and get back.
>> Cheers!