On Thu, Sep 15, 2022 at 09:48:18AM -0700, Sarthak Kukreti wrote:
> From: Sarthak Kukreti <sarthakkukr...@chromium.org>
>
> Hi,
>
> This patch series is an RFC of a mechanism to pass through provision requests
> on stacked thinly provisioned storage devices/filesystems.
>
> The linux kernel provides several mechanisms to set up thinly provisioned
> block storage abstractions (eg. dm-thin, loop devices over sparse files),
> either directly as block devices or backing storage for filesystems.
> Currently, short of writing data to either the device or filesystem, there is
> no way for users to pre-allocate space for use in such storage setups.
> Consider the following use-cases:
>
> 1) Suspend-to-disk and resume from a dm-thin device: In order to ensure that
>    the underlying thinpool metadata is not modified during the suspend
>    mechanism, the dm-thin device needs to be fully provisioned.
> 2) If a filesystem uses a loop device over a sparse file, fallocate() on the
>    filesystem will allocate blocks for files but the underlying sparse file will
>    remain intact.
> 3) Another example is virtual machine using a sparse file/dm-thin as a
>    storage device; by default, allocations within the VM boundaries will not
>    affect the host.
> 4) Several storage standards support mechanisms for thin provisioning on real
>    hardware devices. For example:
>    a. The NVMe spec 1.0b section 2.1.1 loosely talks about thin provisioning:
>       "When the THINP bit in the NSFEAT field of the Identify Namespace data
>       structure is set to ‘1’, the controller ... shall track the number of
>       allocated blocks in the Namespace Utilization field"
>    b. The SCSi Block Commands reference - 4 section references "Thin
>       provisioned logical units",
>    c. UFS 3.0 spec section 13.3.3 references "Thin provisioning".

When REQ_OP_PROVISION is sent on an already-allocated range of blocks, are
those blocks zeroed? NVMe Write Zeroes with Deallocate=0 works this way,
for example. That behavior is counterintuitive since the operation name
suggests it just affects the logical block's provisioning state, not the
contents of the blocks.

> In all of the above situations, currently the only way for pre-allocating
> space is to issue writes (or use WRITE_ZEROES/WRITE_SAME). However, that does
> not scale well with larger pre-allocation sizes.

What exactly is the issue with WRITE_ZEROES scalability? Are you
referring to cases where the device doesn't support an efficient
WRITE_ZEROES command and actually writes blocks filled with zeroes
instead of updating internal allocation metadata cheaply?

Stefan
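
P.S. To make the first question concrete, here is a toy model of the two
possible semantics. It is only a sketch and not kernel code: the ThinDevice
class, its methods, and the zero_allocated flag are all hypothetical names
invented for illustration, and nothing here reflects how the patch series
actually implements REQ_OP_PROVISION.

```python
class ThinDevice:
    """Toy thin-provisioned device: only allocated blocks take up space."""

    def __init__(self, num_blocks, block_size=4096):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.blocks = {}  # block index -> bytes; a missing key means unallocated

    def _check(self, index):
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index out of range")

    def write(self, index, data):
        self._check(index)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = bytes(data)

    def read(self, index):
        self._check(index)
        # Unallocated blocks read back as zeroes.
        return self.blocks.get(index, bytes(self.block_size))

    def provision(self, start, count, zero_allocated=False):
        """Allocate every block in [start, start + count).

        zero_allocated=False: already-allocated blocks keep their contents
        (hypothetical "provisioning state only" semantics).
        zero_allocated=True: already-allocated blocks are overwritten with
        zeroes (the behavior of a zero-writing command on allocated blocks).
        """
        for index in range(start, start + count):
            self._check(index)
            if zero_allocated or index not in self.blocks:
                self.blocks[index] = bytes(self.block_size)

    def allocated(self):
        return len(self.blocks)
```

With the default, writing b"data" to block 1 of a device with 4-byte blocks
and then calling provision(0, 4) leaves block 1 intact while allocating the
other three blocks. With zero_allocated=True the same call also clobbers
block 1. Which of the two does REQ_OP_PROVISION promise?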
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel