A small update... Original (long) message: https://www.spinics.net/lists/linux-btrfs/msg64446.html
On 04/08/2017 10:19 PM, Hans van Kranenburg wrote:
> [...]
>
> == But! The Meta Mummy returns! ==
>
> After changing to nossd, another thing happened. The expiry process,
> which normally takes about 1.5 hours to remove ~2500 subvolumes
> (keeping up to 100 orphans queued all the time), suddenly took the
> entire rest of the day, not finishing before the nightly backups had
> to start again at 10PM...
>
> And the only thing it seemed to do was write, write, write, 100MB/s
> all day long.

This behaviour was observed with a 4.7.5 Linux kernel. Running 4.9.25
now with -o nossd, this weird behaviour is gone. I have no idea which
change between 4.7 and 4.9 is responsible, but it's good.

> == So, what do we want? ssd? nossd? ==
>
> Well, neither works for me. I want my expensive NetApp disk space to
> be filled up, without requiring me to clean up after it all the time
> using painful balance actions, and I want to quickly get rid of old
> snapshots.
>
> So currently, there are two mount -o remount statements, before and
> after doing the expiries...

With 4.9+ now, it stays on nossd for sure, everywhere. :)

I keep making daily btrfs-heatmap pictures; here's a nice timelapse of
Feb 22 until May 26, one picture per day:

https://syrinx.knorrie.org/~knorrie/btrfs/keep/2017-05-28-btrfs-nossd-whoa.mp4

These images use --sort virtual, so the block groups jump around a bit
because of the free-space-fragmentation-level-score-based btrfs balance
that I did for a few weeks. Total fs size is close to 40TiB.

At 17 seconds into the movie, I switched over to -o nossd. The effect
is very clearly visible: suddenly the filesystem starts filling up all
empty space, starting at the beginning of the virtual address space.

In the last few months the amount of allocated but unused space went
down from about 6 TiB to a bit more than 2 TiB now, and it's still
decreasing every day.
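For the curious, the remount dance around the expiries looked roughly
like this. This is a sketch, not the actual script: MNT and the expiry
step are placeholders, the ssd-then-nossd direction is inferred from the
nossd expiry slowness described above, and with DRYRUN=1 (the default
here) the commands are only printed, not run.

```shell
# Sketch of toggling ssd/nossd around snapshot expiry (hypothetical;
# MNT is a placeholder mount point, DRYRUN=1 only prints commands).
MNT=${MNT:-/srv/backup}
DRYRUN=${DRYRUN:-1}

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRYRUN" = 1 ]; then echo "$@"; else "$@"; fi
}

# ssd allocation behavior while deleting ~2500 old subvolumes...
run mount -o remount,ssd "$MNT"
run echo "... snapshot expiry runs here ..."
# ...then back to nossd so new writes fill up existing free space
# instead of leaving allocated-but-unused block groups behind.
run mount -o remount,nossd "$MNT"
```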
\o/

This actually means that forcing -o nossd solved the main headache and
cause of babysitting requirements that I have been experiencing with
btrfs from the very beginning of trying it...

By the way, being able to use only nossd is also a big improvement for
the few dozen smaller filesystems that we use with replication for DR
purposes (yay, btrbk). We no longer have to chase alerts all the time
to see which filesystem is choking itself to death today and then
rescue it with btrfs balance, and the snapshot and send/receive
schedule and expiry no longer causes abnormal write IO. \o/

> [...]
>
> == Work to do ==
>
> The next big change on this system will be to move from the 4.7
> kernel to the 4.9 LTS kernel and Debian Stretch.

After starting to upgrade other btrfs filesystems to kernel 4.9 in the
last few weeks (including the smaller backup servers), I did the
biggest one today. It's running 4.9.25 now, or Debian 4.9.25-1~bpo8+1
to be exact. Currently it's working its way through the nightlies;
looking good.

> Note that our metadata is still DUP, and it doesn't have skinny
> extent tree metadata yet. It was originally created with btrfs-progs
> 3.17, and when we realized we should have single it was too late. I
> want to change that and see if I can convert on a NetApp clone. This
> should reduce extent tree metadata size by maybe more than 60%, and
> who knows what will happen to the abhorrent write traffic.

Yeah, blabla... Converting metadata from DUP to single is a big no-go
with btrfs balance; that much I have clearly figured out by now.

> Before switching over to the clone as live backup server, all missing
> snapshots can be rsynced over from the live backup server.

Using the snapshot/clone functionality of our NetApp storage, I did
the move from 4.7 to 4.9 in the last two days.
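For reference, the in-place conversion that turned out to be a no-go is
a metadata balance with a convert filter. A sketch, not a
recommendation: the mount point is a placeholder, and with DRYRUN=1
(the default here) the command is only printed.

```shell
# Hypothetical sketch of an in-place DUP -> single metadata conversion;
# MNT is a placeholder, DRYRUN=1 only prints the command.
MNT=${MNT:-/srv/backup}
DRYRUN=${DRYRUN:-1}

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRYRUN" = 1 ]; then echo "$@"; else "$@"; fi
}

# -mconvert=single rewrites every metadata block group in the target
# profile; the soft filter skips chunks already converted, which matters
# when the balance has to be restarted.
run btrfs balance start -mconvert=single,soft "$MNT"
```

On a 40TiB filesystem with an extent tree this big, rewriting all
metadata block groups is exactly the kind of massive write traffic
described above, which is why it's a no-go here.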
Since mounting with 4.9 requires a rebuild of the free space tree (and
since I didn't feel like hacking the feature bit in instead), this
wasn't going to be a quick maintenance action.

Two days ago I cloned the LUNs that make up the (now) 40TiB
filesystem, did the skinny-metadata and free space tree changes, and
also cleaned out the free space cache v1 (byebye...):

-# time btrfsck --clear-space-cache v2 /dev/xvdb
Clear free space cache v2
free space cache v2 cleared

real    10m47.854s
user    0m17.200s
sys     0m11.040s

-# time btrfsck --clear-space-cache v1 /dev/xvdb
Clearing free space cache
Free space cache cleared

real    195m8.970s
user    161m32.380s
sys     24m23.476s

^^ notice the cpu usage...

-# time btrfstune -x /dev/xvdb

real    17m4.647s
user    0m16.856s
sys     0m3.944s

-# time mount -o noatime,nossd,space_cache=v2 /dev/xvdb /srv/backup

real    289m55.671s
user    0m0.000s
sys     1m11.156s

Yeah, random read IO sucks... :|

In the two days after, I ran the same expiries the production backup
server was doing, and synced new backup data to the clone. Tonight,
just before the nightly run, I swapped the production LUNs and the
clones so the real backup server could quickly continue using the
prepared filesystem.

-- 
Hans van Kranenburg