From: Sarthak Kukreti <sarthakkukr...@chromium.org>

Hi,

This patch series is an RFC of a mechanism to pass through provision requests 
on stacked thinly provisioned storage devices/filesystems.

The Linux kernel provides several mechanisms to set up thinly provisioned block
storage abstractions (e.g. dm-thin, loop devices over sparse files), either
directly as block devices or as backing storage for filesystems. Currently, short
of writing data to either the device or filesystem, there is no way for users 
to pre-allocate space for use in such storage setups. Consider the following 
use-cases:

1) Suspend-to-disk and resume from a dm-thin device: In order to ensure that 
the underlying thinpool metadata is not modified during the suspend mechanism, 
the dm-thin device needs to be fully provisioned.
2) If a filesystem uses a loop device over a sparse file, fallocate() on the
filesystem will allocate blocks for files but the underlying sparse file will
remain sparse: no space is reserved in it.
3) A virtual machine using a sparse file/dm-thin as a storage device: by
default, allocations within the VM boundaries will not affect the host.
4) Several storage standards support mechanisms for thin provisioning on real 
hardware devices. For example:
  a. The NVMe spec 1.0b section 2.1.1 loosely talks about thin provisioning: 
"When the THINP bit in the NSFEAT field of the Identify Namespace data 
structure is set to ‘1’, the controller ... shall track the number of allocated 
blocks in the Namespace Utilization field"
  b. The SCSI Block Commands - 4 reference refers to "Thin provisioned
logical units",
  c. UFS 3.0 spec section 13.3.3 references "Thin provisioning".

In all of the above situations, currently the only way to pre-allocate space
is to issue writes (or use WRITE_ZEROES/WRITE_SAME). However, that does not
scale well with larger pre-allocation sizes.
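
As a rough userspace illustration of the write-based approach (a sketch only,
not part of this series; the helper names are made up for this example), the
following Python snippet creates a sparse file, reports how much of it is
actually backed by storage, and then forces allocation by writing zeroes over
the whole range:

```python
import os
import tempfile


def make_sparse(path, size):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)


def allocated_bytes(path):
    """Bytes actually backed by storage (st_blocks counts 512-byte units)."""
    return os.stat(path).st_blocks * 512


def preallocate_by_writing(path, size, chunk=1 << 20):
    """Force allocation the slow way: write zeroes over the whole range."""
    zeroes = bytes(chunk)
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeroes[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


if __name__ == "__main__":
    size = 16 << 20
    path = os.path.join(tempfile.mkdtemp(), "backing.img")
    make_sparse(path, size)
    print("apparent:", os.path.getsize(path), "allocated:", allocated_bytes(path))
    preallocate_by_writing(path, size)
    print("apparent:", os.path.getsize(path), "allocated:", allocated_bytes(path))
```

The cost of preallocate_by_writing() grows with the size of the range, since
every byte has to be written; the exact numbers reported depend on the
filesystem in use.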

This patchset introduces primitives to support block-level provisioning
requests across filesystems and block devices (note: the term 'provisioning' is
used to avoid overloading the terms 'allocation'/'pre-allocation').
This allows fallocate() and file creation requests to reserve space across 
stacked layers of block devices and filesystems. Currently, the patchset covers 
a prototype on the device-mapper targets, loop device and ext4, but the same 
mechanism can be extended to other filesystems/block devices as well as 
extended for use with devices in 4 a-c.

Patch 1 introduces REQ_OP_PROVISION as a new request type. The provision 
request acts like the inverse of a discard request; instead of notifying lower 
layers that the block range will no longer be used, provision acts as a request 
to lower layers to provision disk space for the given block range. Real 
hardware storage devices will currently disable the provisioning capability but
for the standards listed in 4 a-c, REQ_OP_PROVISION can be overloaded for use
as the provisioning primitive for future devices.
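
To make the "inverse of discard" semantics concrete, here is a toy Python model
of stacked thin layers. It is purely illustrative and is not how the kernel
code is structured: the ThinLayer class, its set of mapped blocks, and the
identity block mapping between layers are all assumptions made for this sketch.

```python
class ThinLayer:
    """Toy model of one thinly provisioned layer (not kernel code)."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower      # next layer down the stack, if any
        self.mapped = set()     # block numbers that have backing space

    def provision(self, start, count):
        """Reserve backing space for the range and pass the request down."""
        self.mapped.update(range(start, start + count))
        if self.lower is not None:
            self.lower.provision(start, count)

    def discard(self, start, count):
        """Release backing space for the range and pass the request down."""
        self.mapped.difference_update(range(start, start + count))
        if self.lower is not None:
            self.lower.discard(start, count)


if __name__ == "__main__":
    pool = ThinLayer("thin-pool")
    loop = ThinLayer("loop", lower=pool)
    loop.provision(0, 8)
    print(sorted(pool.mapped))   # blocks 0..7 are now backed in the pool
    loop.discard(2, 4)
    print(sorted(pool.mapped))   # blocks 2..5 have been released again
```

In this model a provision request issued at the top of the stack ends up
reserving space in every layer below it, which is the behaviour that plain
fallocate() on a stacked setup does not give today.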

Patch 2 implements REQ_OP_PROVISION handling for some of the device-mapper 
targets. This additionally adds support for pre-allocating space for thinly 
provisioned logical volumes via fallocate().

Patch 3 implements the handling for virtio-blk.

Patch 4 introduces an fallocate() mode (FALLOC_FL_PROVISION) that sends a 
provision request to the underlying block device (and beyond). This acts as the 
primary mechanism for file-level provisioning.
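
From userspace, using the new mode could look roughly like the Python sketch
below. It calls fallocate(2) through ctypes and assumes 64-bit Linux. The
numeric value of FALLOC_FL_PROVISION is deliberately not hard-coded: it has to
come from the patched kernel's uapi headers, so it is passed in as a parameter.
The provision() helper and its fallback to a plain mode-0 fallocate() are
choices made for this example, not something the series mandates.

```python
import ctypes
import errno
import os

# fallocate(int fd, int mode, off_t offset, off_t len); off_t is assumed to
# be 64 bits wide (64-bit Linux).
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def fallocate(fd, mode, offset, length):
    """Thin wrapper around fallocate(2) that raises OSError on failure."""
    if _libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def provision(fd, offset, length, provision_flag):
    """Try the provisioning mode; fall back to plain fallocate() if the
    kernel or filesystem rejects the flag.

    provision_flag is the FALLOC_FL_PROVISION value taken from the patched
    kernel's headers (an input to this sketch, not defined here).
    """
    try:
        fallocate(fd, provision_flag, offset, length)
        return "provisioned"
    except OSError as e:
        if e.errno not in (errno.EOPNOTSUPP, errno.EINVAL):
            raise
    # Fallback: allocates in the filesystem only, not in the layers below.
    fallocate(fd, 0, offset, length)
    return "fallback"
```

A caller would open the file, then call provision(fd, 0, size, flag) with the
flag value from the patched headers. On an unpatched kernel the call is
expected to take the fallback path.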

Patch 5 wires up the loop device handling of REQ_OP_PROVISION.

Patches 6-8 cover a prototype implementation for ext4, which includes wiring up
the fallocate() implementation, introducing a filesystem-level option (called
'provision') to control the default allocation behaviour, and finally a
file-level override to retain current handling, even on filesystems mounted
with 'provision'.

Testing:
--------
- A backport of this patch series was tested on ChromiumOS using a 5.10 kernel.
- File on ext4 on a thin logical volume: fallocate(FALLOC_FL_PROVISION): 4.6s,
dd if=/dev/zero of=...: 6 mins.

TODOs:
------
1) The stacked block devices (dm-*, loop etc.) currently unconditionally pass
through provision requests. Add configurable provision handling, similar to how
discard handling is set up (with options to disable, passdown or passthrough
requests).
2) Blktests and Xfstests for validating provisioning.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
