On Tue, Apr 10, 2018 at 3:08 PM, Niels de Vos <nde...@redhat.com> wrote:

> Recently I have been implementing "volume clone" support in Heketi. This
> uses the snapshot+clone functionality from Gluster. In order to create
> snapshots and clone them, it is required to use LVM thin-pools on the
> bricks. This is where my current problem originates....
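
For context, a thin-pool backed brick is typically set up along these
lines; the VG/LV names and sizes below are made-up examples, not what
Heketi actually uses:

    # thin-pool inside volume group "vg_bricks" (names are examples)
    lvcreate --type thin-pool -L 100G -n tp_brick1 vg_bricks
    # over-provisioned thin LV for the brick, carved out of the pool
    lvcreate --type thin -V 500G --thinpool vg_bricks/tp_brick1 -n brick1
    mkfs.xfs /dev/vg_bricks/brick1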
>
> When there are cloned volumes, the bricks of these volumes use the same
> thin-pool as the original bricks. This makes sense, and allows cloning
> to be really fast! There is no need to copy data from one brick to a new
> one, the thin-pool provides copy-on-write semantics.
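
The clone itself is just a thin snapshot inside the same pool, roughly
(again with example names):

    # the snapshot shares all blocks with the origin until they diverge
    lvcreate -s -n brick1_clone vg_bricks/brick1
    # thin snapshots carry the "skip activation" flag by default
    lvchange -ay -K vg_bricks/brick1_clone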
>
> Unfortunately it can be rather difficult to estimate how large the
> thin-pool should be when the initial Gluster Volume is created.
> Over-allocation is likely needed, but by how much? It may not be clear
> how many clones will be made, nor what percentage of the data will
> change on each of the clones.
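
A rough back-of-the-envelope estimate would be pool size = brick size +
(number of clones x expected change ratio x brick size); e.g. a 100G
brick with 5 clones each expected to rewrite 20% needs about 200G of
pool. But both factors are guesses at creation time, so the actual
consumption has to be watched, for example with:

    lvs -o lv_name,lv_size,data_percent,metadata_percent vg_bricks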
>
> A wrong estimate can easily cause the thin-pool to become full. When
> that happens, the filesystem on the bricks will go read-only. Mounting
> the filesystem read-write again may not be possible at all. I've even
> seen /dev entries for the LV getting removed. This makes for a horrible
> Gluster experience, and it can be tricky to recover from.
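
Recovery usually means growing the pool and remounting, roughly like
this (assuming the VG still has free extents, and with example names):

    # grow the data and, if needed, the metadata part of the pool
    lvextend -L +20G vg_bricks/tp_brick1
    lvextend --poolmetadatasize +1G vg_bricks/tp_brick1
    # then try to get the brick filesystem writable again
    mount -o remount,rw /bricks/brick1

though an XFS that already shut down may need a full unmount and
xfs_repair before it mounts again.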
>
> In order to make thin-provisioning more stable in Gluster, I would like
> to see integrated monitoring of (thin) LVs and some way of acting on
> critical events. One idea would be to make the Gluster Volume read-only
> when it detects that a brick is almost out of space. This is close to
> what local filesystems do when their block device has issues.
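
A minimal sketch of such a watchdog, assuming the pool and volume names
below and using the existing read-only (features.read-only) volume
option:

    #!/bin/sh
    # make the Gluster volume read-only once the brick's thin-pool
    # crosses 90% data usage (names below are examples)
    POOL=vg_bricks/tp_brick1
    VOL=myvol
    USED=$(lvs --noheadings -o data_percent "$POOL" | tr -d ' ')
    if [ "${USED%.*}" -ge 90 ]; then
        gluster volume set "$VOL" read-only on
    fi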
>
> The 'dmeventd' process already monitors LVM, and by default writes to
> 'dmesg'. Checking dmesg for warnings is not really a nice solution, so
> maybe we should write a plugin for dmeventd. Possibly something already
> exists that we can use, or take inspiration from.
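
dmeventd does already have hooks for this: its thin plugin can
auto-extend the pool, and newer LVM (2.02.169+) can run an arbitrary
command instead, so a custom script may be enough without writing a new
plugin. The relevant lvm.conf bits look roughly like this (the script
path is hypothetical):

    activation {
        # auto-extend the pool by 20% whenever usage crosses 80%
        thin_pool_autoextend_threshold = 80
        thin_pool_autoextend_percent = 20
    }
    dmeventd {
        # or run our own handler; dmeventd invokes it once the pool
        # passes 50% usage and at each 5% step after that, see
        # lvmthin(7) for the exact calling convention
        thin_command = "/usr/local/sbin/gluster-thin-watch"
    }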
>
> Please provide ideas, thoughts and any other comments. Thanks!
>

For the oVirt-Gluster integration, where Gluster volumes are managed and
consumed as a VM image store by oVirt, a feature was added to monitor and
report the guaranteed capacity of bricks, as opposed to the advertised
size, when they are created on thin-provisioned LVs/VDO devices. The
feature page provides some details:
https://ovirt.org/develop/release-management/features/gluster/gluster-multiple-bricks-per-storage/
Also adding Denis, the feature owner.


> Niels
