2016-07-26 5:49 GMT+09:00 Chris Murphy <li...@colorremedies.com>:
> On Mon, Jul 25, 2016 at 1:25 AM, Kurt Seo <tiger.anam.mana...@gmail.com>
> wrote:
>> Hi all
>>
>>
>> I am currently running a project for building servers with btrfs.
>> Purposes of servers are exporting disk images through iscsi targets
>> and disk images are generated from btrfs subvolume snapshot.
>
> How is the disk image generated from Btrfs subvolume snapshot?
>
> On what file system is the disk image stored?
>
>
When I create an empty original disk image on btrfs, I do:
btrfs sub create /mnt/test/test_disk
chattr -R +C /mnt/test/test_disk
fallocate -l 50G /mnt/test/test_disk/master.img
then partition the image with fdisk.
The file system inside the disk image is NTFS; all clients are Windows.
I create snapshots from the original subvolume with 'btrfs sub snap'
when clients boot up.
The reason I store the disk image in a subvolume is that snapshotting a
subvolume is faster than 'cp --reflink', and since I needed to disable
CoW, 'cp --reflink' was unavailable anyway.
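To make the per-client lifecycle concrete, it roughly amounts to the following (a sketch assuming the layout above; the client and snapshot names are placeholders):

```shell
# Hypothetical per-client lifecycle for the setup described above.
# /mnt/test/test_disk holds master.img; CLIENT is a placeholder name.
CLIENT=client01

# On client boot: snapshot the master subvolume for this client.
btrfs subvolume snapshot /mnt/test/test_disk /mnt/test/snap_$CLIENT

# The client's iSCSI target is then backed by
# /mnt/test/snap_$CLIENT/master.img.

# On client shutdown: drop the snapshot. btrfs-cleaner reclaims the
# space asynchronously, which is where the background CPU load
# mentioned later in the thread comes from.
btrfs subvolume delete /mnt/test/snap_$CLIENT
```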
>> Maximum number of clients is 500 and each client uses two snapshots of
>> disk images. the first disk image's size is about 50GB and second one
>> is about 1.5TB.
>> The important thing is that the original 1.5TB disk image is mounted
>> via a loop device and modified in real time - e.g. by continuously
>> downloading torrents into it.
>> Snapshots are made when clients boot up and deleted when they are
>> turned off.
>>
>> So the server has two original disk images and about a thousand
>> snapshots in total.
>> I made a list of factors affecting the server's performance and
>> stability:
>>
>> 1. Raid Configuration - Mdadm raid vs btrfs raid, configuration and
>> options for them.
>> 2. How to format btrfs - nodesize, features
>> 3. Mount options - nodatacow and compression things.
>> 4. Kernel parameter tuning.
>> 5. Hardware specification.
>>
>>
>> My current setups are
>>
>> 1. mdadm raid10 with 1024k chunk and 12 disks of 512GB ssd.
>> 2. nodesize 32k and nothing else.
>> 3. nodatacow, noatime, nodiratime, nospace_cache, ssd, compress=lzo
>> 4. Ubuntu with 4.1.27 kernel without additional configurations.
>> 5.
>> CPU : Xeon E3- 1225v2 Quad Core 3.2Ghz
>> RAM : 2 x DDR3 8GB ECC ( total 16GB)
>> NIC : 2 x 10Gbe
>>
>>
>> The result of test so far is
>>
>> 1. btrfs-transaction and btrfs-cleaner consume CPU regularly.
>> 2. When the CPU is busy with those processes, creating snapshots
>> takes a long time.
>> 3. Performance degrades as time goes by.
>>
>>
>> So if there are any wrong or missing configurations, can you suggest
>> some? e.g. whether I need to increase physical memory.
>>
>> Any idea would help me a lot.
>
> Off hand it sounds like you have a file system inside a disk image
> which itself is stored on a file system. So there's two file systems.
> And somehow you have to create the disk image from a subvolume, which
> isn't going to be very fast. And also something I read recently on the
> XFS list makes me wonder if loop devices are production worthy.
>
> I'd reconsider the layout for any one of these reasons alone.
>
>
> 1. mdadm raid10 + LVM thinp + either XFS or Btrfs. The first LV you
> create is the one the host is constantly updating. You can use XFS
> freeze to freeze the file system, take the snapshot, and then release
> the freeze. You now have the original LV which is still being updated
> by the host, but you have a 2nd LV that itself can be exported as an
> iSCSI target to a client system. There's no need to create a disk
> image, so the creation of the snapshot and iSCSI target is much
> faster.
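
For reference, that freeze/snapshot/export sequence could look roughly like this (a sketch only; the VG, LV, and backstore names are made up, and /dev/md0 is assumed to be the mdadm RAID10):

```shell
# Thin pool and origin LV on top of the mdadm RAID10 (/dev/md0):
pvcreate /dev/md0
vgcreate vg0 /dev/md0
lvcreate -L 1.6T --thinpool pool0 vg0
lvcreate -V 1.5T --thin vg0/pool0 -n master
mkfs.xfs /dev/vg0/master
mkdir -p /srv/master
mount /dev/vg0/master /srv/master

# Freeze, snapshot, thaw -- the snapshot is crash-consistent:
xfs_freeze -f /srv/master
lvcreate -s -n client01 vg0/master
xfs_freeze -u /srv/master

# Thin snapshots are skipped from activation by default:
lvchange -ay -K vg0/client01

# Export the snapshot LV as an iSCSI backstore (targetcli/LIO);
# the IQN and LUN are then created as usual:
targetcli /backstores/block create client01 /dev/vg0/client01
```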
>
> 2. Similar to the above, but you could make the 2nd LV (the snapshot)
> a Btrfs seed device that all of the clients share, and they are each
> pointed to their own additional LV used for the Btrfs sprout device.
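
The seed/sprout mechanics for that option would look roughly like this (again a sketch; LV and mount-point names are placeholders):

```shell
# Mark the snapshot LV's btrfs as a seed device (read-only, shareable
# by many sprouts):
btrfstune -S 1 /dev/vg0/client-template

# Per client: mount the seed (it mounts read-only), add that client's
# own LV as the writable sprout, then remount read-write. All new
# writes land on the sprout device.
mkdir -p /mnt/sprout01
mount /dev/vg0/client-template /mnt/sprout01
btrfs device add /dev/vg0/sprout01 /mnt/sprout01
mount -o remount,rw /mnt/sprout01
```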
>
> The issue I had a year ago with LVM thin provisioning is when the
> metadata pool gets full, the entire VG implodes very badly and I
> didn't get any sufficient warnings in advance that the setup was
> suboptimal, or that it was about to run out of metadata space, and it
> wasn't repairable. But it was just a test. I haven't substantially
> played with LVM thinp with more than a dozen snapshots. But LVM being
> like emacs, you've got a lot of levers to adjust things depending on
> the workload whereas Btrfs has very few.
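
Two of those levers are worth naming for the metadata-exhaustion problem above (names assumed from the earlier sketch; the thresholds are illustrative):

```shell
# Watch thin pool data/metadata usage so the pool never fills silently:
lvs -o lv_name,data_percent,metadata_percent vg0

# dmeventd can auto-extend the pool before it fills; in lvm.conf
# (activation section):
#   thin_pool_autoextend_threshold = 80
#   thin_pool_autoextend_percent   = 20
```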
>
> Therefore, the plus of the 2nd option is you're only using a handful
> of LVM thinp snapshots. And you're also not really using Btrfs
> snapshots either, you're using the union-like fs feature of the
> seed-sprout capability.