On Thu, Sep 15, 2022 at 09:48:18AM -0700, Sarthak Kukreti wrote:
> From: Sarthak Kukreti <sarthakkukr...@chromium.org>
> 
> Hi,
> 
> This patch series is an RFC of a mechanism to pass through provision requests 
> on stacked thinly provisioned storage devices/filesystems.
> 
> The Linux kernel provides several mechanisms to set up thinly provisioned 
> block storage abstractions (e.g. dm-thin, loop devices over sparse files), 
> either directly as block devices or backing storage for filesystems. 
> Currently, short of writing data to either the device or filesystem, there is 
> no way for users to pre-allocate space for use in such storage setups. 
> Consider the following use-cases:
> 
> 1) Suspend-to-disk and resume from a dm-thin device: In order to ensure that 
> the underlying thinpool metadata is not modified during the suspend 
> mechanism, the dm-thin device needs to be fully provisioned.
> 2) If a filesystem uses a loop device over a sparse file, fallocate() on the 
> filesystem will allocate blocks for files but the underlying sparse file will 
> remain intact.
> 3) Another example is a virtual machine using a sparse file/dm-thin as a 
> storage device; by default, allocations within the VM boundaries will not 
> affect the host.
> 4) Several storage standards support mechanisms for thin provisioning on real 
> hardware devices. For example:
>   a. The NVMe spec 1.0b section 2.1.1 loosely talks about thin provisioning: 
> "When the THINP bit in the NSFEAT field of the Identify Namespace data 
> structure is set to ‘1’, the controller ... shall track the number of 
> allocated blocks in the Namespace Utilization field"
>   b. The SCSI Block Commands reference - 4 section references "Thin 
> provisioned logical units",
>   c. UFS 3.0 spec section 13.3.3 references "Thin provisioning".

When REQ_OP_PROVISION is sent on an already-allocated range of blocks,
are those blocks zeroed? NVMe Write Zeroes with Deallocate=0 works this
way, for example. That behavior would be counterintuitive here, since the
operation name suggests it only affects the logical blocks' provisioning
state, not their contents.
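
As a userspace analogy for the semantics I would expect (this is a
sketch on a regular file, not the proposed block-layer path, and the
function name is hypothetical): posix_fallocate() over a range that
already holds data reserves space but leaves the data alone.

```python
import os
import tempfile


def provision_preserves_data(payload: bytes = b"already-written data") -> bool:
    """Return True if preallocating over written data leaves it unchanged."""
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        # Preallocate a range that covers the written bytes and extends
        # 4 KiB past them. Only the allocation state should change.
        os.posix_fallocate(f.fileno(), 0, len(payload) + 4096)
        f.seek(0)
        return f.read(len(payload)) == payload
```

If REQ_OP_PROVISION zeroed allocated blocks instead, the equivalent
check at the block level would fail.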

> In all of the above situations, currently the only way for pre-allocating 
> space is to issue writes (or use WRITE_ZEROES/WRITE_SAME). However, that does 
> not scale well with larger pre-allocation sizes. 

What exactly is the issue with WRITE_ZEROES scalability? Are you
referring to cases where the device doesn't support an efficient
WRITE_ZEROES command and actually writes blocks filled with zeroes
instead of updating internal allocation metadata cheaply?
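
To make the distinction concrete, here is a minimal userspace sketch on
regular files (an analogy only; the helper names are made up and nothing
here touches WRITE_ZEROES or REQ_OP_PROVISION). One path pushes zeroes
through the data path, the other asks the filesystem to reserve space:

```python
import os
import tempfile

CHUNK = 1 << 20  # 1 MiB of zeroes per write() call


def preallocate_by_writing(fd: int, size: int) -> None:
    """Data path: actually write `size` bytes of zeroes."""
    zeroes = bytes(CHUNK)
    os.lseek(fd, 0, os.SEEK_SET)
    remaining = size
    while remaining > 0:
        written = os.write(fd, zeroes[:min(CHUNK, remaining)])
        remaining -= written


def preallocate_by_fallocate(fd: int, size: int) -> None:
    """Metadata path: ask the filesystem to reserve `size` bytes."""
    os.posix_fallocate(fd, 0, size)


def demo(size: int = 4 * CHUNK) -> dict:
    """Return {method: (logical size, allocated bytes)} for both paths."""
    results = {}
    methods = (("write", preallocate_by_writing),
               ("fallocate", preallocate_by_fallocate))
    for name, method in methods:
        with tempfile.TemporaryFile() as f:
            method(f.fileno(), size)
            st = os.fstat(f.fileno())
            # st_blocks is counted in 512-byte units on Linux.
            results[name] = (st.st_size, st.st_blocks * 512)
    return results
```

Both paths end with a file of the requested size. How much cheaper the
second one is depends on the filesystem, which is the same question I
am asking about the device.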

Stefan


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
