Ok, so this is what I did (rough commands below):

1. Copied the sparse 315GB file (with 302GB allocated inside) to another server
2. Re-formatted the btrfs partition
3. Ran chattr +C on the parent dir
4. Copied the 315GB file back to the btrfs partition (the file is no longer sparse due to the copying)
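The steps map to roughly these commands (the backup host and paths are
illustrative, not the exact ones; note that chattr +C on btrfs only takes
effect for files created after the flag is set, which is why step 3 has to
happen before the image is copied back):

  rsync -a /opt/drives/ssd/disk_208.img backup:/storage/   # 1. copy off; rsync without --sparse writes the holes out
  umount /opt/drives/ssd
  mkfs.btrfs -f /dev/md3                                   # 2. re-format
  mount /dev/md3 /opt/drives/ssd
  chattr +C /opt/drives/ssd                                # 3. NOCOW on the parent dir, inherited by new files
  rsync -a backup:/storage/disk_208.img /opt/drives/ssd/   # 4. copy back as a freshly created, non-sparse file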
This is the end result:

root@s4 /opt/drives/ssd # ls -alhs
total 316G
 16K drwxr-xr-x 1 libvirt-qemu libvirt-qemu   42 Dec 20 07:00 .
4.0K drwxr-xr-x 4 libvirt-qemu libvirt-qemu 4.0K Dec 18 14:31 ..
315G -rw-r--r-- 1 libvirt-qemu libvirt-qemu 315G Dec 20 09:11 disk_208.img
   0 drwxr-xr-x 1 libvirt-qemu libvirt-qemu    0 Dec 20 06:53 snapshots

root@s4 /opt/drives/ssd # du -h
0       ./snapshots
316G    .

root@s4 /opt/drives/ssd # df -h
/dev/md3        411G  316G   94G  78% /opt/drives/ssd

root@s4 /opt/drives/ssd # btrfs filesystem df /opt/drives/ssd
Data, single: total=323.01GiB, used=315.08GiB
System, DUP: total=8.00MiB, used=64.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=1.00GiB, used=880.00KiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=16.00MiB, used=0.00

root@s4 /opt/drives/ssd # lsattr
---------------- ./snapshots
---------------C ./disk_208.img

As you can see, it looks much better now. The file takes as much space as it
should, and the metadata is only 880KB. I will do some writes inside the VM
and see if the file grows on the "outside". If everything is OK, it should not.

2014-12-20 5:17 GMT+08:00 Josef Bacik <jba...@fb.com>:
> On 12/19/2014 04:10 PM, Josef Bacik wrote:
>>
>> On 12/18/2014 09:59 AM, Daniele Testa wrote:
>>>
>>> Hey,
>>>
>>> I am hoping you guys can shed some light on my issue. I know that it's
>>> a common question that people see differences in the "disk used" when
>>> running different calculations, but I still think that my issue is
>>> weird.
>>>
>>> root@s4 / # mount
>>> /dev/md3 on /opt/drives/ssd type btrfs
>>> (rw,noatime,compress=zlib,discard,nospace_cache)
>>>
>>> root@s4 / # btrfs filesystem df /opt/drives/ssd
>>> Data: total=407.97GB, used=404.08GB
>>> System, DUP: total=8.00MB, used=52.00KB
>>> System: total=4.00MB, used=0.00
>>> Metadata, DUP: total=1.25GB, used=672.21MB
>>> Metadata: total=8.00MB, used=0.00
>>>
>>> root@s4 /opt/drives/ssd # ls -alhs
>>> total 302G
>>> 4.0K drwxr-xr-x 1 root         root           42 Dec 18 14:34 .
>>> 4.0K drwxr-xr-x 4 libvirt-qemu libvirt-qemu 4.0K Dec 18 14:31 ..
>>> 302G -rw-r--r-- 1 libvirt-qemu libvirt-qemu 315G Dec 18 14:49 disk_208.img
>>>    0 drwxr-xr-x 1 libvirt-qemu libvirt-qemu    0 Dec 18 10:08 snapshots
>>>
>>> root@s4 /opt/drives/ssd # du -h
>>> 0       ./snapshots
>>> 302G    .
>>>
>>> As seen above, I have a 410GB SSD mounted at "/opt/drives/ssd". On
>>> that partition, I have one single sparse file, taking 302GB of space
>>> (max 315GB). The snapshots directory is completely empty.
>>>
>>> However, for some weird reason, btrfs seems to think it takes 404GB.
>>> The big file is a disk that I use in a virtual server, and when I write
>>> stuff inside that virtual server, the disk usage of the btrfs
>>> partition on the host keeps increasing even though the sparse file is
>>> constant at 302GB. I even have 100GB of "free" disk space inside that
>>> virtual disk file. Writing 1GB inside the virtual disk file seems to
>>> increase the usage by about 4-5GB on the "outside".
>>>
>>> Does anyone have a clue about what is going on? How can the difference
>>> and behaviour be like this when I have just one single file? Is it
>>> also normal to have 672MB of metadata for a single file?
>>>
>>
>> Hello and welcome to the wonderful world of btrfs, where COW can really
>> suck hard without being super clear why! It's 4pm on a Friday right
>> before I'm gone for 2 weeks, so I'm a bit happy and drunk, so I'm going
>> to use pretty pictures.
>> You have this case to start with:
>>
>> file offset 0                                                offset 302g
>> [-------------------------prealloced 302g extent----------------------]
>>
>> (man it's impressive I got all that lined up right)
>>
>> On disk you have 2 things. First, your file, which has file extents
>> that say
>>
>> inode 256, file offset 0, size 302g, offset 0, disk bytenr 123, disklen 302g
>>
>> and then the extent tree, which keeps track of actual allocated space,
>> has this
>>
>> extent bytenr 123, len 302g, refs 1
>>
>> Now say you boot up your virt image and it writes one 4k block to offset
>> 0. Now you have this
>>
>> [4k][--------------------302g-4k--------------------------------------]
>>
>> And for your inode you now have this
>>
>> inode 256, file offset 0, size 4k, offset 0, disk bytenr (123+302g), disklen 4k
>> inode 256, file offset 4k, size 302g-4k, offset 4k, disk bytenr 123, disklen 302g
>>
>> and in your extent tree you have
>>
>> extent bytenr 123, len 302g, refs 1
>> extent bytenr whatever, len 4k, refs 1
>>
>> See that? Your file is still the same size, it is still 302g. If you
>> cp'ed it right now it would copy 302g of information. But what have you
>> actually allocated on disk? Well, that's now 302g + 4k. Now let's say
>> your virt thing decides to write to the middle, say at offset 12k. Now
>> you have this
>>
>> inode 256, file offset 0, size 4k, offset 0, disk bytenr (123+302g), disklen 4k
>> inode 256, file offset 4k, size 8k, offset 4k, disk bytenr 123, disklen 302g
>> inode 256, file offset 12k, size 4k, offset 0, disk bytenr whatever, disklen 4k
>> inode 256, file offset 16k, size 302g-16k, offset 16k, disk bytenr 123, disklen 302g
>>
>> and in the extent tree you have this
>>
>> extent bytenr 123, len 302g, refs 2
>> extent bytenr whatever, len 4k, refs 1
>> extent bytenr notimportant, len 4k, refs 1
>>
>> See that "refs 2" change? We split the original extent, so we have 2 file
>> extents pointing to the same physical extent, so we bumped the ref
>> count. This will happen over and over again until we have completely
>> overwritten the original extent, at which point your space usage will go
>> back down to ~302g.
>>
>> We split big extents with COW, so unless you've got lots of space to
>> spare or are going to use nodatacow you should probably not pre-allocate
>> virt images. Thanks,
>>
>
> Sorry, should have added a
>
> tl;dr: COW means you can in the worst case end up using 2 * filesize -
> blocksize of data on disk, while the file will appear to be filesize. Thanks,
>
> Josef
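For anyone who wants to see the splitting Josef describes, something like
this shows it on a scratch btrfs mount (mount point and sizes are
illustrative, and the mount should not use compression):

  fallocate -l 1G /mnt/scratch/test.img      # preallocate one big extent
  sync
  filefrag -v /mnt/scratch/test.img          # one extent covering the whole file
  btrfs filesystem df /mnt/scratch           # Data used already counts the full 1G

  # overwrite 100M in the middle of the preallocated file
  dd if=/dev/zero of=/mnt/scratch/test.img bs=1M count=100 seek=100 conv=notrunc,fsync
  filefrag -v /mnt/scratch/test.img          # the 1G extent is now split around the overwritten range
  btrfs filesystem df /mnt/scratch           # Data used is ~1.1G: the old extent is still fully referenced, plus 100M of new extents

That is the tl;dr in action: worst case 2 * filesize - blocksize, so a 315GB
image could consume close to 630GB on disk until every block of the original
preallocated extent has been overwritten.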