2016-07-26 5:49 GMT+09:00 Chris Murphy <li...@colorremedies.com>:
> On Mon, Jul 25, 2016 at 1:25 AM, Kurt Seo <tiger.anam.mana...@gmail.com> wrote:
>> Hi all
>>
>> I am currently running a project to build servers on btrfs. The
>> servers export disk images through iSCSI targets, and the disk
>> images are generated from btrfs subvolume snapshots.
>
> How is the disk image generated from a Btrfs subvolume snapshot?
>
> On what file system is the disk image stored?
When I create the empty original disk image on btrfs, I do:

  btrfs sub create /mnt/test/test_disk
  chattr -R +C /mnt/test/test_disk
  fallocate -l 50G /mnt/test/test_disk/master.img

and then partition the image with fdisk. The file system inside the
disk image is NTFS; all clients are Windows. I create snapshots from
the original subvolume with 'btrfs sub snap' when clients boot up.
The reason I store the disk image in a subvolume is that snapshotting
a subvolume is faster than 'cp --reflink', and I needed to disable
CoW, which made 'cp --reflink' unavailable anyway.

>> Maximum number of clients is 500, and each client uses two snapshots
>> of disk images. The first disk image is about 50GB and the second is
>> about 1.5TB.
>> The important thing is that the original 1.5TB disk image is mounted
>> via a loop device and modified in real time - e.g. continuously
>> downloading torrents into it.
>> Snapshots are made when clients boot up and deleted when they turn
>> off.
>>
>> So the server has two original disk images and about a thousand
>> snapshots in total.
>> I made a list of factors that affect the server's performance and
>> stability:
>>
>> 1. Raid configuration - mdadm raid vs btrfs raid, and the
>> configuration and options for them.
>> 2. How to format btrfs - nodesize, features.
>> 3. Mount options - nodatacow and compression.
>> 4. Kernel parameter tuning.
>> 5. Hardware specification.
>>
>> My current setup is:
>>
>> 1. mdadm raid10 with a 1024k chunk and 12 disks of 512GB SSD.
>> 2. nodesize 32k and nothing else.
>> 3. nodatacow, noatime, nodiratime, nospace_cache, ssd, compress=lzo
>> 4. Ubuntu with a 4.1.27 kernel, without additional configuration.
>> 5. CPU: Xeon E3-1225v2 quad core 3.2GHz
>>    RAM: 2 x DDR3 8GB ECC (16GB total)
>>    NIC: 2 x 10GbE
>>
>> The results of testing so far:
>>
>> 1. btrfs-transaction and btrfs-cleaner consume CPU regularly.
>> 2. When the CPU is busy with those processes, creating snapshots
>> takes a long time.
>> 3. Performance degrades as time goes by.
>>
>> So if there are any wrong or missing configurations, can you suggest
>> some? For instance, whether I need to increase physical memory.
>>
>> Any idea would help me a lot.

> Offhand, it sounds like you have a file system inside a disk image
> which is itself stored on a file system. So there are two file
> systems. And somehow you have to create the disk image from a
> subvolume, which isn't going to be very fast. Also, something I read
> recently on the XFS list makes me wonder whether loop devices are
> production worthy.
>
> I'd reconsider the layout for any one of these reasons alone.
>
> 1. mdadm raid10 + LVM thinp + either XFS or Btrfs. The first LV you
> create is the one the host is constantly updating. You can use XFS
> freeze to freeze the file system, take the snapshot, and then release
> the freeze. You now have the original LV, which is still being
> updated by the host, plus a second LV that can itself be exported as
> an iSCSI target to a client system. There's no need to create a disk
> image, so creating the snapshot and the iSCSI target is much faster.
>
> 2. Similar to the above, but you could make the second LV (the
> snapshot) a Btrfs seed device that all of the clients share, with
> each client pointed to its own additional LV used as the Btrfs sprout
> device.
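If I understand option 1 correctly, the per-client flow would be
roughly as follows. This is an untested sketch; vg0, the LV names,
the mount point, and the iqn are placeholders:

  # One-time setup: a thin pool plus the master LV the host keeps updating.
  lvcreate --type thin-pool -L 400G -n pool vg0
  lvcreate -V 1.5T --thinpool vg0/pool -n master
  mkfs.xfs /dev/vg0/master
  mount /dev/vg0/master /mnt/master

  # Per client, at boot: freeze, snapshot, unfreeze, then export.
  xfs_freeze -f /mnt/master
  lvcreate -s -n client1 vg0/master    # thin snapshot, no data copied
  xfs_freeze -u /mnt/master
  lvchange -ay -K vg0/client1          # thin snapshots skip activation by default

  # Export the snapshot LV as an iSCSI LUN (targetcli/LIO;
  # portal and ACL setup omitted for brevity).
  targetcli /backstores/block create client1 /dev/vg0/client1
  targetcli /iscsi create iqn.2016-07.example:client1
  targetcli /iscsi/iqn.2016-07.example:client1/tpg1/luns create /backstores/block/client1

  # At client shutdown: tear down the target and drop the snapshot.
  targetcli /backstores/block delete client1
  lvremove -y vg0/client1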
> The issue I had a year ago with LVM thin provisioning is that when
> the metadata pool gets full, the entire VG implodes very badly, and I
> didn't get any sufficient warning in advance that the setup was
> suboptimal or that it was about to run out of metadata space, and it
> wasn't repairable. But it was just a test; I haven't substantially
> played with LVM thinp with more than a dozen snapshots. But LVM being
> like emacs, you've got a lot of levers to adjust things depending on
> the workload, whereas Btrfs has very few.
>
> Therefore, the plus of the second option is that you're only using a
> handful of LVM thinp snapshots. And you're not really using Btrfs
> snapshots either; you're using the union-like fs feature of the
> seed-sprout capability of Btrfs. The other nice thing is that when
> the clients quit, the LVs are removed at the LVM extent level. There
> is no need for file system cleanup processes to decrement reference
> counts on all the affected extents. So it'd be fast.
>
> --
> Chris Murphy

Thanks for your answer. Actually, I have been trying almost every
approach for this project; an LVM thin pool is one of them, and I
tried ZFS on Linux, too. As you mentioned, when the metadata is full,
the entire LVM pool becomes unrepairable. So I increased the size of
the thin pool's metadata LV to 1 percent of the thin pool, and that
problem went away.

Anyway, if LVM is a better option than btrfs for my purpose, what
about ZFS? ZFS supports zvols and raid options, and furthermore I
would not need a loopback device to mount a specific partition inside
the images.

So you're saying I need to reconsider using btrfs and look at other
options like an LVM thin pool. I think that makes sense. I have two
more questions:

1. If I move from btrfs to LVM, what about the mdadm chunk size? I am
still not sure what the best chunk size is for numerous cloned disks.
And do you recommend any options for LVM thin?

2. What about ZFS on Linux? I think ZoL is similar to LVM in some
ways.

It's weird that btrfs is not the best choice for me and I am still
asking on the btrfs mailing list, but your advice is helpful.

Thank you.

Seo
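P.S. For reference, the metadata resize I mentioned was along these
lines; a rough sketch, assuming a thin pool named vg0/pool of about
1TB:

  # Grow the pool's metadata LV to roughly 1% of the pool size.
  lvextend --poolmetadatasize +10G vg0/pool
  # Watch data and metadata usage afterwards:
  lvs -a -o name,size,data_percent,metadata_percent vg0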