Re: [dm-devel] [PATCH] dm-zoned: free dmz->ddev array in dmz_put_zoned_device
On 9/20/23 12:51, Fedor Pchelkin wrote: Commit 4dba12881f88 ("dm zoned: support arbitrary number of devices") made the pointers to additional zoned devices to be stored in a dynamically allocated dmz->ddev array. However, this array is not freed. Free it when cleaning up zoned device information inside dmz_put_zoned_device(). Assigning NULL to dmz->ddev elements doesn't make sense there as they are not supposed to be reused later and the whole dmz target structure is being cleaned anyway. Found by Linux Verification Center (linuxtesting.org). Fixes: 4dba12881f88 ("dm zoned: support arbitrary number of devices") Cc: sta...@vger.kernel.org Signed-off-by: Fedor Pchelkin --- drivers/md/dm-zoned-target.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c index ad8e670a2f9b..e25cd9db6275 100644 --- a/drivers/md/dm-zoned-target.c +++ b/drivers/md/dm-zoned-target.c @@ -753,12 +753,10 @@ static void dmz_put_zoned_device(struct dm_target *ti) struct dmz_target *dmz = ti->private; int i; - for (i = 0; i < dmz->nr_ddevs; i++) { - if (dmz->ddev[i]) { + for (i = 0; i < dmz->nr_ddevs; i++) + if (dmz->ddev[i]) dm_put_device(ti, dmz->ddev[i]); - dmz->ddev[i] = NULL; - } - } + kfree(dmz->ddev); } static int dmz_fixup_devices(struct dm_target *ti) Hmm. I'm not that happy with it; dmz_put_zoned_device() is using dm_target as an argument, whereas all of the functions surrounding the call sites is using the dmz_target directly. Mind to modify the function to use 'struct dmz_target' as an argument? Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v15 09/12] dm: Add support for copy offload
On 9/11/23 09:07, Nitesh Shetty wrote: On Fri, Sep 08, 2023 at 08:13:37AM +0200, Hannes Reinecke wrote: On 9/6/23 18:38, Nitesh Shetty wrote: Before enabling copy for dm target, check if underlying devices and dm target support copy. Avoid split happening inside dm target. Fail early if the request needs split, currently splitting copy request is not supported. And here is where I would have expected the emulation to take place; didn't you have it in one of the earlier iterations? No, but it was the other way round. In dm-kcopyd we used device offload, if that was possible, before using default dm-mapper copy. It was dropped in the current series, to streamline the patches and make the series easier to review. After all, device-mapper already has the infrastructure for copying data between devices, so adding a copy-offload emulation for device-mapper should be trivial. I did not understand this, can you please elaborate ? Please see my comments to patch 04. We should only implement copy-offload if there is a dedicated infrastructure in place. But we should not have a 'generic' copy-offload emulation. Problem is that 'real' copy-offload functionalities (ie for NVMe or SCSI) are riddled with corner-cases where copy-offload does _not_ work, and where commands might fail if particular conditions are not met. Falling back to a generic implementation will cause applications to assume that copy-offload worked, and that it gained performance as the application just had to issue a single command. Whereas in fact the opposite is true; it wasn't a single command, and the application might have performed better by issuing the commands itself. Returning -EOPNOTSUPP in these cases will inform the application that the attempt didn't work, and that it will have to fall back to the 'normal' copy. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v15 04/12] block: add emulation for copy
On 9/11/23 09:09, Nitesh Shetty wrote: On Fri, Sep 08, 2023 at 08:06:38AM +0200, Hannes Reinecke wrote: On 9/6/23 18:38, Nitesh Shetty wrote: For the devices which does not support copy, copy emulation is added. It is required for in-kernel users like fabrics, where file descriptor is not available and hence they can't use copy_file_range. Copy-emulation is implemented by reading from source into memory and writing to the corresponding destination. Also emulation can be used, if copy offload fails or partially completes. At present in kernel user of emulation is NVMe fabrics. Leave out the last sentence; I really would like to see it enabled for SCSI, too (we do have copy offload commands for SCSI ...). Sure, will do that And it raises all the questions which have bogged us down right from the start: where is the point in calling copy offload if copy offload is not implemented or slower than copying it by hand? And how can the caller differentiate whether copy offload bring a benefit to him? IOW: wouldn't it be better to return -EOPNOTSUPP if copy offload is not available? Present approach treats copy as a background operation and the idea is to maximize the chances of achieving copy by falling back to emulation. Having said that, it should be possible to return -EOPNOTSUPP, in case of offload IO failure or device not supporting offload. We will update this in next version. That is also what I meant with my comments to patch 09/12: I don't see it as a benefit to _always_ fall back to a generic copy-offload emulation. After all, that hardly brings any benefit. Where I do see a benefit is to tie in the generic copy-offload _infrastructure_ to existing mechanisms (like dm-kcopyd). But if there is no copy-offload infrastructure available then we really should return -EOPNOTSUPP as it really is not supported. In the end, copy offload is not a command which 'always works'. It's a command which _might_ deliver benefits (ie better performance) if dedicated implementations are available and certain parameters are met. If not then copy offload is not the best choice, and applications will need to be made aware of that. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v15 12/12] null_blk: add support for copy offload
On 9/6/23 18:38, Nitesh Shetty wrote: Implementation is based on existing read and write infrastructure. copy_max_bytes: A new configfs and module parameter is introduced, which can be used to set hardware/driver supported maximum copy limit. Only request based queue mode will support for copy offload. Added tracefs support to copy IO tracing. Suggested-by: Damien Le Moal Signed-off-by: Anuj Gupta Signed-off-by: Nitesh Shetty Signed-off-by: Vincent Fu --- Documentation/block/null_blk.rst | 5 ++ drivers/block/null_blk/main.c | 97 ++- drivers/block/null_blk/null_blk.h | 1 + drivers/block/null_blk/trace.h| 23 4 files changed, 123 insertions(+), 3 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v15 11/12] null: Enable trace capability for null block
On 9/6/23 18:38, Nitesh Shetty wrote: This is a prep patch to enable copy trace capability. At present only zoned null_block is using trace, so we decoupled trace and zoned dependency to make it usable in null_blk driver also. Signed-off-by: Nitesh Shetty Signed-off-by: Anuj Gupta --- drivers/block/null_blk/Makefile | 2 -- drivers/block/null_blk/main.c | 3 +++ drivers/block/null_blk/trace.h | 2 ++ drivers/block/null_blk/zoned.c | 1 - 4 files changed, 5 insertions(+), 3 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v15 10/12] dm: Enable copy offload for dm-linear target
On 9/6/23 18:38, Nitesh Shetty wrote: Setting copy_offload_supported flag to enable offload. Signed-off-by: Nitesh Shetty --- drivers/md/dm-linear.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c index f4448d520ee9..1d1ee30bbefb 100644 --- a/drivers/md/dm-linear.c +++ b/drivers/md/dm-linear.c @@ -62,6 +62,7 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv) ti->num_discard_bios = 1; ti->num_secure_erase_bios = 1; ti->num_write_zeroes_bios = 1; + ti->copy_offload_supported = 1; ti->private = lc; return 0; Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v15 09/12] dm: Add support for copy offload
On 9/6/23 18:38, Nitesh Shetty wrote: Before enabling copy for dm target, check if underlying devices and dm target support copy. Avoid split happening inside dm target. Fail early if the request needs split, currently splitting copy request is not supported. And here is where I would have expected the emulation to take place; didn't you have it in one of the earlier iterations? After all, device-mapper already has the infrastructure for copying data between devices, so adding a copy-offload emulation for device-mapper should be trivial. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v15 08/12] nvmet: add copy command support for bdev and file ns
On 9/6/23 18:38, Nitesh Shetty wrote: Add support for handling nvme_cmd_copy command on target. For bdev-ns if backing device supports copy offload we call device copy offload (blkdev_copy_offload). In case of partial completion from above or absence of device copy offload capability, we fallback to copy emulation (blkdev_copy_emulation) For file-ns we call vfs_copy_file_range to service our request. Currently target always shows copy capability by setting NVME_CTRL_ONCS_COPY in controller ONCS. loop target has copy support, which can be used to test copy offload. trace event support for nvme_cmd_copy. Signed-off-by: Nitesh Shetty Signed-off-by: Anuj Gupta --- drivers/nvme/target/admin-cmd.c | 9 ++- drivers/nvme/target/io-cmd-bdev.c | 97 +++ drivers/nvme/target/io-cmd-file.c | 50 drivers/nvme/target/nvmet.h | 4 ++ drivers/nvme/target/trace.c | 19 ++ 5 files changed, 177 insertions(+), 2 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v15 07/12] nvme: add copy offload support
On 9/6/23 18:38, Nitesh Shetty wrote: Current design only supports single source range. We receive a request with REQ_OP_COPY_SRC. Parse this request which consists of src(1st) and dst(2nd) bios. Form a copy command (TP 4065) trace event support for nvme_copy_cmd. Set the device copy limits to queue limits. Signed-off-by: Kanchan Joshi Signed-off-by: Nitesh Shetty Signed-off-by: Javier González Signed-off-by: Anuj Gupta --- drivers/nvme/host/constants.c | 1 + drivers/nvme/host/core.c | 79 +++ drivers/nvme/host/trace.c | 19 + include/linux/blkdev.h| 1 + include/linux/nvme.h | 43 +-- 5 files changed, 140 insertions(+), 3 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v15 06/12] fs, block: copy_file_range for def_blk_ops for direct block device
On 9/6/23 18:38, Nitesh Shetty wrote: For direct block device opened with O_DIRECT, use copy_file_range to issue device copy offload, and fallback to generic_copy_file_range incase device copy offload capability is absent or the device files are not open with O_DIRECT. Signed-off-by: Anuj Gupta Signed-off-by: Nitesh Shetty --- block/fops.c | 25 + 1 file changed, 25 insertions(+) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v15 05/12] fs/read_write: Enable copy_file_range for block device.
On 9/6/23 18:38, Nitesh Shetty wrote: From: Anuj Gupta This is a prep patch. Allow copy_file_range to work for block devices. Relaxing generic_copy_file_checks allows us to reuse the existing infra, instead of adding a new user interface for block copy offload. Change generic_copy_file_checks to use ->f_mapping->host for both inode_in and inode_out. Allow block device in generic_file_rw_checks. Signed-off-by: Anuj Gupta Signed-off-by: Nitesh Shetty --- fs/read_write.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v15 04/12] block: add emulation for copy
On 9/6/23 18:38, Nitesh Shetty wrote: For the devices which does not support copy, copy emulation is added. It is required for in-kernel users like fabrics, where file descriptor is not available and hence they can't use copy_file_range. Copy-emulation is implemented by reading from source into memory and writing to the corresponding destination. Also emulation can be used, if copy offload fails or partially completes. At present in kernel user of emulation is NVMe fabrics. Leave out the last sentence; I really would like to see it enabled for SCSI, too (we do have copy offload commands for SCSI ...). And it raises all the questions which have bogged us down right from the start: where is the point in calling copy offload if copy offload is not implemented or slower than copying it by hand? And how can the caller differentiate whether copy offload bring a benefit to him? IOW: wouldn't it be better to return -EOPNOTSUPP if copy offload is not available? Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v15 03/12] block: add copy offload support
On 9/7/23 09:16, Nitesh Shetty wrote: On 07/09/23 07:49AM, Hannes Reinecke wrote: On 9/6/23 18:38, Nitesh Shetty wrote: Hmm. That looks a bit odd. Why do you have to use wait_for_completion? wait_for_completion is waiting for all the copy IOs to complete, when caller does not pass endio handler. Copy IO submissions are still async, as in previous revisions. Can't you submit the 'src' bio, and then submit the 'dst' bio from the endio handler of the 'src' bio? We can't do this with the current bio merging approach. 'src' bio waits for the 'dst' bio to arrive in request layer. Note that both bio's should be present in request reaching the driver, to form the copy-cmd. Hmm. I guess it would be possible, but in the end we can always change it later once the infrastructure is in. So: Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v15 03/12] block: add copy offload support
nce this bio reaches request layer and find a request with previously + * sent source info we merge the destination bio and return. + * 3. Release the plug and request is sent to driver + * This design works only for drivers with request queue. + */ +ssize_t blkdev_copy_offload(struct block_device *bdev, loff_t pos_in, + loff_t pos_out, size_t len, + void (*endio)(void *, int, ssize_t), + void *private, gfp_t gfp) +{ + struct blkdev_copy_io *cio; + struct blkdev_copy_offload_io *offload_io; + struct bio *src_bio, *dst_bio; + ssize_t rem, chunk, ret; + ssize_t max_copy_bytes = bdev_max_copy_sectors(bdev) << SECTOR_SHIFT; + struct blk_plug plug; + + if (!max_copy_bytes) + return -EINVAL; + + ret = blkdev_copy_sanity_check(bdev, pos_in, bdev, pos_out, len); + if (ret) + return ret; + + cio = kzalloc(sizeof(*cio), GFP_KERNEL); + if (!cio) + return -ENOMEM; + atomic_set(>refcount, 1); + cio->waiter = current; + cio->endio = endio; + cio->private = private; + + /* +* If there is a error, copied will be set to least successfully +* completed copied length +*/ + cio->copied = len; + for (rem = len; rem > 0; rem -= chunk) { + chunk = min(rem, max_copy_bytes); + + offload_io = kzalloc(sizeof(*offload_io), GFP_KERNEL); + if (!offload_io) + goto err_free_cio; + offload_io->cio = cio; + /* +* For partial completion, we use offload_io->offset to truncate +* successful copy length +*/ + offload_io->offset = len - rem; + + src_bio = bio_alloc(bdev, 0, REQ_OP_COPY_SRC, gfp); + if (!src_bio) + goto err_free_offload_io; + src_bio->bi_iter.bi_size = chunk; + src_bio->bi_iter.bi_sector = pos_in >> SECTOR_SHIFT; + + blk_start_plug(); + dst_bio = blk_next_bio(src_bio, bdev, 0, REQ_OP_COPY_DST, gfp); + if (!dst_bio) + goto err_free_src_bio; + dst_bio->bi_iter.bi_size = chunk; + dst_bio->bi_iter.bi_sector = pos_out >> SECTOR_SHIFT; + dst_bio->bi_end_io = blkdev_copy_offload_dst_endio; + dst_bio->bi_private = offload_io; + + atomic_inc(>refcount); + submit_bio(dst_bio); + blk_finish_plug(); + pos_in += chunk; + pos_out += chunk; + } + + if (atomic_dec_and_test(>refcount)) + blkdev_copy_endio(cio); + if (cio->endio) + return -EIOCBQUEUED; + + return blkdev_copy_wait_io_completion(cio); + +err_free_src_bio: + bio_put(src_bio); +err_free_offload_io: + kfree(offload_io); +err_free_cio: + cio->copied = min_t(ssize_t, cio->copied, (len - rem)); + cio->status = -ENOMEM; + if (rem == len) { + kfree(cio); + return cio->status; + } + if (cio->endio) + return cio->status; + + return blkdev_copy_wait_io_completion(cio); +} +EXPORT_SYMBOL_GPL(blkdev_copy_offload); Hmm. That looks a bit odd. Why do you have to use wait_for_completion? Can't you submit the 'src' bio, and then submit the 'dst' bio from the endio handler of the 'src' bio? Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v15 02/12] Add infrastructure for copy offload in block and request layer.
On 9/6/23 18:38, Nitesh Shetty wrote: We add two new opcode REQ_OP_COPY_SRC, REQ_OP_COPY_DST. Since copy is a composite operation involving src and dst sectors/lba, each needs to be represented by a separate bio to make it compatible with device mapper. We expect caller to take a plug and send bio with source information, followed by bio with destination information. Once the src bio arrives we form a request and wait for destination bio. Upon arrival of destination we merge these two bio's and send corresponding request down to device driver. Signed-off-by: Nitesh Shetty Signed-off-by: Anuj Gupta --- block/blk-core.c | 7 +++ block/blk-merge.c | 41 +++ block/blk.h | 16 +++ block/elevator.h | 1 + include/linux/bio.h | 6 +- include/linux/blk_types.h | 10 ++ 6 files changed, 76 insertions(+), 5 deletions(-) Having two separate bios is okay, and what one would expect. What is slightly strange is the merging functionality; That could do with some more explanation why this approach was taken. And also some checks in the merging code to avoid merging non-copy offload bios. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 29/30] block: store the holder in file->private_data
On 6/8/23 13:02, Christoph Hellwig wrote: Store the file struct used as the holder in file->private_data as an indicator that this file descriptor was opened exclusively to remove the last use of FMODE_EXCL. Signed-off-by: Christoph Hellwig --- block/fops.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 27/30] block: replace fmode_t with a block-specific type for block open flags
On 6/8/23 13:02, Christoph Hellwig wrote: The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig Acked-by: Jack Wang [rnbd] --- arch/um/drivers/ubd_kern.c | 8 +++--- arch/xtensa/platforms/iss/simdisk.c | 2 +- block/bdev.c| 32 +++--- block/blk-zoned.c | 8 +++--- block/blk.h | 11 block/fops.c| 32 +- block/genhd.c | 8 +++--- block/ioctl.c | 42 + drivers/block/amiflop.c | 12 - drivers/block/aoe/aoeblk.c | 4 +-- drivers/block/ataflop.c | 25 + drivers/block/drbd/drbd_main.c | 7 ++--- drivers/block/drbd/drbd_nl.c| 2 +- drivers/block/floppy.c | 28 +-- drivers/block/loop.c| 22 +++ drivers/block/mtip32xx/mtip32xx.c | 4 +-- drivers/block/nbd.c | 4 +-- drivers/block/pktcdvd.c | 17 ++-- drivers/block/rbd.c | 2 +- drivers/block/rnbd/rnbd-clt.c | 4 +-- drivers/block/rnbd/rnbd-srv.c | 4 +-- drivers/block/sunvdc.c | 2 +- drivers/block/swim.c| 16 +-- drivers/block/swim3.c | 24 - drivers/block/ublk_drv.c| 2 +- drivers/block/xen-blkback/xenbus.c | 2 +- drivers/block/xen-blkfront.c| 2 +- drivers/block/z2ram.c | 2 +- drivers/block/zram/zram_drv.c | 6 ++--- drivers/cdrom/cdrom.c | 6 ++--- drivers/cdrom/gdrom.c | 4 +-- drivers/md/bcache/bcache.h | 2 +- drivers/md/bcache/request.c | 4 +-- drivers/md/bcache/super.c | 6 ++--- drivers/md/dm-cache-target.c| 12 - drivers/md/dm-clone-target.c| 10 +++ drivers/md/dm-core.h| 7 +++-- drivers/md/dm-era-target.c | 6 +++-- drivers/md/dm-ioctl.c | 10 +++ drivers/md/dm-snap.c| 4 +-- drivers/md/dm-table.c | 11 drivers/md/dm-thin.c| 9 --- drivers/md/dm-verity-fec.c | 2 +- drivers/md/dm-verity-target.c | 6 ++--- drivers/md/dm.c | 10 +++ drivers/md/dm.h | 2 +- drivers/md/md.c | 8 +++--- drivers/mmc/core/block.c| 8 +++--- drivers/mtd/devices/block2mtd.c | 4 +-- drivers/mtd/mtd_blkdevs.c | 4 +-- drivers/mtd/ubi/block.c | 5 ++-- drivers/nvme/host/core.c| 2 +- drivers/nvme/host/ioctl.c | 8 +++--- drivers/nvme/host/multipath.c | 2 +- drivers/nvme/host/nvme.h| 4 +-- drivers/nvme/target/io-cmd-bdev.c | 2 +- drivers/s390/block/dasd.c | 6 ++--- drivers/s390/block/dasd_genhd.c | 3 ++- drivers/s390/block/dasd_int.h | 3 ++- drivers/s390/block/dasd_ioctl.c | 2 +- drivers/s390/block/dcssblk.c| 4 +-- drivers/scsi/sd.c | 19 ++--- drivers/scsi/sr.c | 10 +++ drivers/target/target_core_iblock.c | 5 ++-- drivers/target/target_core_pscsi.c | 4 +-- fs/btrfs/dev-replace.c | 2 +- fs/btrfs/super.c| 8 +++--- fs/btrfs/volumes.c | 16 +-- fs/btrfs/volumes.h | 4 +-- fs/erofs/super.c| 2 +- fs/ext4/super.c | 2 +- fs/f2fs/super.c | 2 +- fs/jfs/jfs_logmgr.c | 2 +- fs/nfs/blocklayout/dev.c| 5 ++-- fs/ocfs2/cluster/heartbeat.c| 3 ++- fs/reiserfs/journal.c | 4 +-- fs/xfs/xfs_super.c | 2 +- include/linux/blkdev.h | 30 - include/linux/cdrom.h | 3 ++- include/linux/device-mapper.h | 8 +++--- kernel/power/swap.c | 6 ++--- mm/swapfile.c | 2 +- 82 files changed, 334 insertions(+), 315 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 27/30] block: replace fmode_t with a block-specific type for block open flags
On 6/8/23 13:02, Christoph Hellwig wrote: The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig Acked-by: Jack Wang [rnbd] --- arch/um/drivers/ubd_kern.c | 8 +++--- arch/xtensa/platforms/iss/simdisk.c | 2 +- block/bdev.c| 32 +++--- block/blk-zoned.c | 8 +++--- block/blk.h | 11 block/fops.c| 32 +- block/genhd.c | 8 +++--- block/ioctl.c | 42 + drivers/block/amiflop.c | 12 - drivers/block/aoe/aoeblk.c | 4 +-- drivers/block/ataflop.c | 25 + drivers/block/drbd/drbd_main.c | 7 ++--- drivers/block/drbd/drbd_nl.c| 2 +- drivers/block/floppy.c | 28 +-- drivers/block/loop.c| 22 +++ drivers/block/mtip32xx/mtip32xx.c | 4 +-- drivers/block/nbd.c | 4 +-- drivers/block/pktcdvd.c | 17 ++-- drivers/block/rbd.c | 2 +- drivers/block/rnbd/rnbd-clt.c | 4 +-- drivers/block/rnbd/rnbd-srv.c | 4 +-- drivers/block/sunvdc.c | 2 +- drivers/block/swim.c| 16 +-- drivers/block/swim3.c | 24 - drivers/block/ublk_drv.c| 2 +- drivers/block/xen-blkback/xenbus.c | 2 +- drivers/block/xen-blkfront.c| 2 +- drivers/block/z2ram.c | 2 +- drivers/block/zram/zram_drv.c | 6 ++--- drivers/cdrom/cdrom.c | 6 ++--- drivers/cdrom/gdrom.c | 4 +-- drivers/md/bcache/bcache.h | 2 +- drivers/md/bcache/request.c | 4 +-- drivers/md/bcache/super.c | 6 ++--- drivers/md/dm-cache-target.c| 12 - drivers/md/dm-clone-target.c| 10 +++ drivers/md/dm-core.h| 7 +++-- drivers/md/dm-era-target.c | 6 +++-- drivers/md/dm-ioctl.c | 10 +++ drivers/md/dm-snap.c| 4 +-- drivers/md/dm-table.c | 11 drivers/md/dm-thin.c| 9 --- drivers/md/dm-verity-fec.c | 2 +- drivers/md/dm-verity-target.c | 6 ++--- drivers/md/dm.c | 10 +++ drivers/md/dm.h | 2 +- drivers/md/md.c | 8 +++--- drivers/mmc/core/block.c| 8 +++--- drivers/mtd/devices/block2mtd.c | 4 +-- drivers/mtd/mtd_blkdevs.c | 4 +-- drivers/mtd/ubi/block.c | 5 ++-- drivers/nvme/host/core.c| 2 +- drivers/nvme/host/ioctl.c | 8 +++--- drivers/nvme/host/multipath.c | 2 +- drivers/nvme/host/nvme.h| 4 +-- drivers/nvme/target/io-cmd-bdev.c | 2 +- drivers/s390/block/dasd.c | 6 ++--- drivers/s390/block/dasd_genhd.c | 3 ++- drivers/s390/block/dasd_int.h | 3 ++- drivers/s390/block/dasd_ioctl.c | 2 +- drivers/s390/block/dcssblk.c| 4 +-- drivers/scsi/sd.c | 19 ++--- drivers/scsi/sr.c | 10 +++ drivers/target/target_core_iblock.c | 5 ++-- drivers/target/target_core_pscsi.c | 4 +-- fs/btrfs/dev-replace.c | 2 +- fs/btrfs/super.c| 8 +++--- fs/btrfs/volumes.c | 16 +-- fs/btrfs/volumes.h | 4 +-- fs/erofs/super.c| 2 +- fs/ext4/super.c | 2 +- fs/f2fs/super.c | 2 +- fs/jfs/jfs_logmgr.c | 2 +- fs/nfs/blocklayout/dev.c| 5 ++-- fs/ocfs2/cluster/heartbeat.c| 3 ++- fs/reiserfs/journal.c | 4 +-- fs/xfs/xfs_super.c | 2 +- include/linux/blkdev.h | 30 - include/linux/cdrom.h | 3 ++- include/linux/device-mapper.h | 8 +++--- kernel/power/swap.c | 6 ++--- mm/swapfile.c | 2 +- 82 files changed, 334 insertions(+), 315 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 05/30] cdrom: track if a cdrom_device_info was opened for data
On 6/8/23 13:02, Christoph Hellwig wrote: Set a flag when a cdrom_device_info is opened for writing, instead of trying to figure out this at release time. This will allow to eventually remove the mode argument to the ->release block_device_operation as nothing but the CDROM drivers uses that argument. Signed-off-by: Christoph Hellwig Reviewed-by: Phillip Potter Acked-by: Christian Brauner --- drivers/cdrom/cdrom.c | 12 +--- include/linux/cdrom.h | 1 + 2 files changed, 6 insertions(+), 7 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 30/31] block: store the holder in file->private_data
On 6/6/23 09:39, Christoph Hellwig wrote: Store the file struct used as the holder in file->private_data as an indicator that this file descriptor was opened exclusively to remove the last use of FMODE_EXCL. Signed-off-by: Christoph Hellwig --- block/fops.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/block/fops.c b/block/fops.c index c40b9f978e3bc7..915e0ef7560993 100644 --- a/block/fops.c +++ b/block/fops.c @@ -478,7 +478,7 @@ blk_mode_t file_to_blk_mode(struct file *file) mode |= BLK_OPEN_READ; if (file->f_mode & FMODE_WRITE) mode |= BLK_OPEN_WRITE; - if (file->f_mode & FMODE_EXCL) + if (file->private_data) mode |= BLK_OPEN_EXCL; if ((file->f_flags & O_ACCMODE) == 3) mode |= BLK_OPEN_WRITE_IOCTL; @@ -501,12 +501,15 @@ static int blkdev_open(struct inode *inode, struct file *filp) filp->f_flags |= O_LARGEFILE; filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC; + /* +* Use the file private data to store the holder, file_to_blk_mode +* relies on this. +*/ Can you update the comment to reflect that we're only use the ->private_data field for exclusive open? I know it's indicated by the condition, but we really should be clarify this usage. if (filp->f_flags & O_EXCL) - filp->f_mode |= FMODE_EXCL; + filp->private_data = filp; bdev = blkdev_get_by_dev(inode->i_rdev, file_to_blk_mode(filp), -(filp->f_mode & FMODE_EXCL) ? filp : NULL, -NULL); +filp->private_data, NULL); if (IS_ERR(bdev)) return PTR_ERR(bdev); @@ -517,8 +520,7 @@ static int blkdev_open(struct inode *inode, struct file *filp) static int blkdev_release(struct inode *inode, struct file *filp) { - blkdev_put(I_BDEV(filp->f_mapping->host), - (filp->f_mode & FMODE_EXCL) ? filp : NULL); + blkdev_put(I_BDEV(filp->f_mapping->host), filp->private_data); return 0; } Other than that: Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 31/31] fs: remove the now unused FMODE_* flags
On 6/6/23 09:39, Christoph Hellwig wrote: FMODE_NDELAY, FMODE_EXCL and FMODE_WRITE_IOCTL were only used for block internal purposed and are now entirely unused, so remove them. Signed-off-by: Christoph Hellwig --- include/linux/fs.h | 7 --- 1 file changed, 7 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 25/31] ubd: remove commented out code in ubd_open
On 6/6/23 09:39, Christoph Hellwig wrote: This code has been dead forever, make sure it doesn't show up in code searches. Signed-off-by: Christoph Hellwig --- arch/um/drivers/ubd_kern.c | 7 --- 1 file changed, 7 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 08/31] block: share code between disk_check_media_change and disk_force_media_change
On 6/7/23 14:21, Christoph Hellwig wrote: On Wed, Jun 07, 2023 at 02:19:00PM +0200, Hannes Reinecke wrote: - return true; + return __disk_check_media_change(disk, + disk_clear_events(disk, DISK_EVENT_MEDIA_CHANGE | + DISK_EVENT_EJECT_REQUEST)); Can you move the call to disk_clear_events() out of the call to __disk_check_media_change()? I find this pattern hard to read. I suspect you've not done enough functional programming in your youth :) That's why I said 'I find'; purely personal preference. If you're happy with: Reviewed-by: Hannes Reinecke Cheers, Hannes (In my youth? One is tempted to quote Falco: "If you remember the '90s you haven't experienced them...") -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 29/31] block: always use I_BDEV on file->f_mapping->host to find the bdev
On 6/6/23 09:39, Christoph Hellwig wrote: Always use I_BDEV(file->f_mapping->host) to find the bdev for a file to free up file->private_data for other uses. Signed-off-by: Christoph Hellwig --- block/fops.c | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 27/31] block: remove unused fmode_t arguments from ioctl handlers
On 6/6/23 09:39, Christoph Hellwig wrote: A few ioctl handlers have fmode_t arguments that are entirely unused, remove them. Signed-off-by: Christoph Hellwig --- block/blk-zoned.c | 4 ++-- block/blk.h | 6 +++--- block/ioctl.c | 14 +++--- 3 files changed, 12 insertions(+), 12 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 26/31] block: move a few internal definitions out of blkdev.h
On 6/6/23 09:39, Christoph Hellwig wrote: All these helpers are only used in core block code, so move them out of the public header. Signed-off-by: Christoph Hellwig --- block/blk.h| 23 +-- include/linux/blkdev.h | 27 --- 2 files changed, 21 insertions(+), 29 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 24/31] rnbd-srv: replace sess->open_flags with a "bool readonly"
On 6/6/23 09:39, Christoph Hellwig wrote: Stop passing the fmode_t around and just use a simple bool to track if an export is read-only. Signed-off-by: Christoph Hellwig --- drivers/block/rnbd/rnbd-srv-sysfs.c | 3 +-- drivers/block/rnbd/rnbd-srv.c | 15 +++ drivers/block/rnbd/rnbd-srv.h | 2 +- 3 files changed, 9 insertions(+), 11 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 23/31] mtd: block: use a simple bool to track open for write
On 6/6/23 09:39, Christoph Hellwig wrote: Instead of propagating the fmode_t, just use a bool to track if a mtd block device was opened for writing. Signed-off-by: Christoph Hellwig --- drivers/mtd/mtd_blkdevs.c| 2 +- drivers/mtd/mtdblock.c | 2 +- include/linux/mtd/blktrans.h | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 22/31] nvme: replace the fmode_t argument to the nvme ioctl handlers with a simple bool
On 6/6/23 09:39, Christoph Hellwig wrote: Instead of passing a fmode_t and only checking it fo0r FMODE_WRITE, pass a bool open_for_write to prepare for callers that won't have the fmode_t. Signed-off-by: Christoph Hellwig --- drivers/nvme/host/ioctl.c | 62 +-- 1 file changed, 34 insertions(+), 28 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 21/31] scsi: replace the fmode_t argument to ->sg_io_fn with a simple bool
On 6/6/23 09:39, Christoph Hellwig wrote: Instead of passing a fmode_t and only checking it for FMODE_WRITE, pass a bool open_for_write to prepare for callers that won't have the fmode_t. Signed-off-by: Christoph Hellwig --- block/bsg-lib.c | 2 +- block/bsg.c | 8 +--- drivers/scsi/scsi_bsg.c | 4 ++-- include/linux/bsg.h | 2 +- 4 files changed, 9 insertions(+), 7 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 20/31] scsi: replace the fmode_t argument to scsi_ioctl with a simple bool
On 6/6/23 09:39, Christoph Hellwig wrote: Instead of passing a fmode_t and only checking it for FMODE_WRITE, pass a bool open_for_write to prepare for callers that won't have the fmode_t. Signed-off-by: Christoph Hellwig --- drivers/scsi/ch.c | 3 ++- drivers/scsi/scsi_ioctl.c | 34 +- drivers/scsi/sd.c | 2 +- drivers/scsi/sg.c | 5 +++-- drivers/scsi/sr.c | 2 +- drivers/scsi/st.c | 2 +- include/scsi/scsi_ioctl.h | 2 +- 7 files changed, 26 insertions(+), 24 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 19/31] scsi: replace the fmode_t argument to scsi_cmd_allowed with a simple bool
On 6/6/23 09:39, Christoph Hellwig wrote: Instead of passing a fmode_t and only checking it for FMODE_WRITE, pass a bool open_for_write to prepare for callers that won't have the fmode_t. Signed-off-by: Christoph Hellwig --- drivers/scsi/scsi_bsg.c | 2 +- drivers/scsi/scsi_ioctl.c | 8 drivers/scsi/sg.c | 2 +- include/scsi/scsi_ioctl.h | 2 +- 4 files changed, 7 insertions(+), 7 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 18/31] fs: remove sb->s_mode
On 6/6/23 09:39, Christoph Hellwig wrote: There is no real need to store the open mode in the super_block now. It is only used by f2fs, which can easily recalculate it. Signed-off-by: Christoph Hellwig --- fs/f2fs/super.c| 10 ++ fs/nilfs2/super.c | 1 - fs/super.c | 2 -- include/linux/fs.h | 1 - 4 files changed, 6 insertions(+), 8 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 17/31] block: add a sb_open_mode helper
On 6/6/23 09:39, Christoph Hellwig wrote: Add a helper to return the open flags for blkdev_get_by* for passed in super block flags instead of open coding the logic in many places. Signed-off-by: Christoph Hellwig --- fs/btrfs/super.c | 5 + fs/nilfs2/super.c | 7 ++- fs/super.c | 15 --- include/linux/blkdev.h | 7 +++ 4 files changed, 14 insertions(+), 20 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 16/31] block: use the holder as indication for exclusive opens
On 6/6/23 09:39, Christoph Hellwig wrote: The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key off the exclusive open off a non-NULL holder. For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument. Signed-off-by: Christoph Hellwig --- block/bdev.c| 37 block/fops.c| 6 +++-- block/genhd.c | 5 ++-- block/ioctl.c | 5 ++-- drivers/block/drbd/drbd_nl.c| 23 ++--- drivers/block/pktcdvd.c | 13 +- drivers/block/rnbd/rnbd-srv.c | 4 +-- drivers/block/xen-blkback/xenbus.c | 2 +- drivers/block/zram/zram_drv.c | 8 +++--- drivers/md/bcache/super.c | 15 ++-- drivers/md/dm.c | 6 ++--- drivers/md/md.c | 38 +++-- drivers/mtd/devices/block2mtd.c | 4 +-- drivers/nvme/target/io-cmd-bdev.c | 2 +- drivers/s390/block/dasd_genhd.c | 2 +- drivers/target/target_core_iblock.c | 6 ++--- drivers/target/target_core_pscsi.c | 8 +++--- fs/btrfs/dev-replace.c | 6 ++--- fs/btrfs/ioctl.c| 12 - fs/btrfs/volumes.c | 28 ++--- fs/btrfs/volumes.h | 6 ++--- fs/erofs/super.c| 7 +++--- fs/ext4/super.c | 11 +++-- fs/f2fs/super.c | 2 +- fs/jfs/jfs_logmgr.c | 6 ++--- fs/nfs/blocklayout/dev.c| 4 +-- fs/nilfs2/super.c | 6 ++--- fs/ocfs2/cluster/heartbeat.c| 4 +-- fs/reiserfs/journal.c | 19 +++ fs/reiserfs/reiserfs.h | 1 - fs/super.c | 20 +++ fs/xfs/xfs_super.c | 15 ++-- include/linux/blkdev.h | 2 +- kernel/power/hibernate.c| 12 +++-- kernel/power/power.h| 2 +- kernel/power/swap.c | 21 +++- mm/swapfile.c | 7 +++--- 37 files changed, 183 insertions(+), 192 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 12/31] swsusp: don't pass a stack address to blkdev_get_by_path
On 6/6/23 09:39, Christoph Hellwig wrote: holder is just an on-stack pointer that can easily be reused by other calls, replace it with a static variable that doesn't change. Signed-off-by: Christoph Hellwig --- kernel/power/swap.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 11/31] block: rename blkdev_close to blkdev_release
On 6/6/23 09:39, Christoph Hellwig wrote: Make the function name match the method name. Signed-off-by: Christoph Hellwig --- block/fops.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 15/31] btrfs: don't pass a holder for non-exclusive blkdev_get_by_path
On 6/6/23 09:39, Christoph Hellwig wrote: Passing a holder to blkdev_get_by_path when FMODE_EXCL isn't set doesn't make sense, so pass NULL instead and remove the holder argument from the call chains the only end up in non-FMODE_EXCL blkdev_get_by_path calls. Signed-off-by: Christoph Hellwig --- fs/btrfs/super.c | 16 ++-- fs/btrfs/volumes.c | 17 - fs/btrfs/volumes.h | 3 +-- 3 files changed, 15 insertions(+), 21 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 13/31] bcache: don't pass a stack address to blkdev_get_by_path
On 6/6/23 09:39, Christoph Hellwig wrote: sb is just an on-stack pointer that can easily be reused by other calls. Switch to use the bcache-wide bcache_kobj instead as there is no need to claim per-bcache device anyway. Signed-off-by: Christoph Hellwig --- drivers/md/bcache/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 14/31] rnbd-srv: don't pass a holder for non-exclusive blkdev_get_by_path
On 6/6/23 09:39, Christoph Hellwig wrote: Passing a holder to blkdev_get_by_path when FMODE_EXCL isn't set doesn't make sense, so pass NULL instead. Signed-off-by: Christoph Hellwig --- drivers/block/rnbd/rnbd-srv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 09/31] block: pass a gendisk to ->open
On 6/6/23 09:39, Christoph Hellwig wrote: ->open is only called on the whole device. Make that explicit by passing a gendisk instead of the block_device. Signed-off-by: Christoph Hellwig --- arch/um/drivers/ubd_kern.c | 5 ++--- arch/xtensa/platforms/iss/simdisk.c | 4 ++-- block/bdev.c| 2 +- drivers/block/amiflop.c | 8 drivers/block/aoe/aoeblk.c | 4 ++-- drivers/block/ataflop.c | 16 +++ drivers/block/drbd/drbd_main.c | 6 +++--- drivers/block/floppy.c | 30 +++-- drivers/block/nbd.c | 8 drivers/block/pktcdvd.c | 6 +++--- drivers/block/rbd.c | 4 ++-- drivers/block/rnbd/rnbd-clt.c | 4 ++-- drivers/block/swim.c| 10 +- drivers/block/swim3.c | 10 +- drivers/block/ublk_drv.c| 4 ++-- drivers/block/z2ram.c | 6 ++ drivers/block/zram/zram_drv.c | 13 + drivers/cdrom/gdrom.c | 4 ++-- drivers/md/bcache/super.c | 4 ++-- drivers/md/dm.c | 4 ++-- drivers/md/md.c | 6 +++--- drivers/mmc/core/block.c| 4 ++-- drivers/mtd/mtd_blkdevs.c | 4 ++-- drivers/mtd/ubi/block.c | 4 ++-- drivers/nvme/host/core.c| 4 ++-- drivers/nvme/host/multipath.c | 4 ++-- drivers/s390/block/dasd.c | 4 ++-- drivers/s390/block/dcssblk.c| 7 +++ drivers/scsi/sd.c | 12 ++-- drivers/scsi/sr.c | 6 +++--- include/linux/blkdev.h | 2 +- 31 files changed, 102 insertions(+), 107 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 10/31] block: remove the unused mode argument to ->release
On 6/6/23 09:39, Christoph Hellwig wrote: The mode argument to the ->release block_device_operation is never used, so remove it. Signed-off-by: Christoph Hellwig --- arch/um/drivers/ubd_kern.c | 4 ++-- arch/xtensa/platforms/iss/simdisk.c | 2 +- block/bdev.c| 14 +++--- drivers/block/amiflop.c | 2 +- drivers/block/aoe/aoeblk.c | 2 +- drivers/block/ataflop.c | 4 ++-- drivers/block/drbd/drbd_main.c | 4 ++-- drivers/block/floppy.c | 2 +- drivers/block/loop.c| 2 +- drivers/block/nbd.c | 2 +- drivers/block/pktcdvd.c | 4 ++-- drivers/block/rbd.c | 2 +- drivers/block/rnbd/rnbd-clt.c | 2 +- drivers/block/swim.c| 2 +- drivers/block/swim3.c | 3 +-- drivers/block/z2ram.c | 2 +- drivers/cdrom/gdrom.c | 2 +- drivers/md/bcache/super.c | 2 +- drivers/md/dm.c | 2 +- drivers/md/md.c | 2 +- drivers/mmc/core/block.c| 2 +- drivers/mtd/mtd_blkdevs.c | 2 +- drivers/mtd/ubi/block.c | 2 +- drivers/nvme/host/core.c| 2 +- drivers/nvme/host/multipath.c | 2 +- drivers/s390/block/dasd.c | 2 +- drivers/s390/block/dcssblk.c| 4 ++-- drivers/scsi/sd.c | 3 +-- drivers/scsi/sr.c | 2 +- include/linux/blkdev.h | 2 +- 30 files changed, 41 insertions(+), 43 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 07/31] block: pass a gendisk on bdev_check_media_change
On 6/6/23 09:39, Christoph Hellwig wrote: bdev_check_media_change should only ever be called for the whole device. Pass a gendisk to make that explicit and rename the function to disk_check_media_change. Signed-off-by: Christoph Hellwig --- block/disk-events.c | 18 +- drivers/block/amiflop.c | 2 +- drivers/block/ataflop.c | 6 +++--- drivers/block/floppy.c | 16 drivers/block/swim.c| 2 +- drivers/block/swim3.c | 2 +- drivers/cdrom/gdrom.c | 2 +- drivers/md/md.c | 2 +- drivers/scsi/sd.c | 9 - drivers/scsi/sr.c | 2 +- include/linux/blkdev.h | 2 +- 11 files changed, 31 insertions(+), 32 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 05/31] cdrom: track if a cdrom_device_info was opened for data
On 6/6/23 09:39, Christoph Hellwig wrote: Set a flag when a cdrom_device_info is opened for writing, instead of trying to figure out this at release time. This will allow to eventually remove the mode argument to the ->release block_device_operation as nothing but the CDROM drivers uses that argument. Signed-off-by: Christoph Hellwig --- drivers/cdrom/cdrom.c | 12 +--- include/linux/cdrom.h | 1 + 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c index 08abf1ffede002..adebac1bd210d9 100644 --- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -1172,6 +1172,7 @@ int cdrom_open(struct cdrom_device_info *cdi, fmode_t mode) ret = 0; cdi->media_written = 0; } + cdi->opened_for_data = true; } if (ret) @@ -1252,7 +1253,6 @@ static int check_for_audio_disc(struct cdrom_device_info *cdi, void cdrom_release(struct cdrom_device_info *cdi, fmode_t mode) { const struct cdrom_device_ops *cdo = cdi->ops; - int opened_for_data; cd_dbg(CD_CLOSE, "entering cdrom_release\n"); @@ -1270,14 +1270,12 @@ void cdrom_release(struct cdrom_device_info *cdi, fmode_t mode) } } - opened_for_data = !(cdi->options & CDO_USE_FFLAGS) || - !(mode & FMODE_NDELAY); - cdo->release(cdi); - if (cdi->use_count == 0) { /* last process that closes dev*/ - if (opened_for_data && - cdi->options & CDO_AUTO_EJECT && CDROM_CAN(CDC_OPEN_TRAY)) + + if (cdi->use_count == 0 && cdi->opened_for_data) { + if (cdi->options & CDO_AUTO_EJECT && CDROM_CAN(CDC_OPEN_TRAY)) cdo->tray_move(cdi, 1); + cdi->opened_for_data = false; } } EXPORT_SYMBOL(cdrom_release); diff --git a/include/linux/cdrom.h b/include/linux/cdrom.h index 0a5db0b0c958a1..385e94732b2cf1 100644 --- a/include/linux/cdrom.h +++ b/include/linux/cdrom.h @@ -64,6 +64,7 @@ struct cdrom_device_info { int (*exit)(struct cdrom_device_info *); int mrw_mode_page; __s64 last_media_change_ms; + bool opened_for_data; }; struct cdrom_device_ops { Do we care about alignment here? integer followed by a 64 bit value followed by a bool seems like an automatic padding to me ... Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 03/31] cdrom: remove the unused mode argument to cdrom_ioctl
On 6/6/23 09:39, Christoph Hellwig wrote: Signed-off-by: Christoph Hellwig --- drivers/cdrom/cdrom.c | 2 +- drivers/cdrom/gdrom.c | 2 +- drivers/scsi/sr.c | 2 +- include/linux/cdrom.h | 4 ++-- 4 files changed, 5 insertions(+), 5 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 08/31] block: share code between disk_check_media_change and disk_force_media_change
On 6/6/23 09:39, Christoph Hellwig wrote: Factor the common logic between disk_check_media_change and disk_force_media_change into a helper. Signed-off-by: Christoph Hellwig --- block/disk-events.c | 37 - 1 file changed, 16 insertions(+), 21 deletions(-) diff --git a/block/disk-events.c b/block/disk-events.c index 8b1b63225738f8..06f325662b3494 100644 --- a/block/disk-events.c +++ b/block/disk-events.c @@ -262,6 +262,18 @@ static unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask) return pending; } +static bool __disk_check_media_change(struct gendisk *disk, unsigned int events) +{ + if (!(events & DISK_EVENT_MEDIA_CHANGE)) + return false; + + if (__invalidate_device(disk->part0, true)) + pr_warn("VFS: busy inodes on changed media %s\n", + disk->disk_name); + set_bit(GD_NEED_PART_SCAN, >state); + return true; +} + /** * disk_check_media_change - check if a removable media has been changed * @disk: gendisk to check @@ -274,18 +286,9 @@ static unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask) */ bool disk_check_media_change(struct gendisk *disk) { - unsigned int events; - - events = disk_clear_events(disk, DISK_EVENT_MEDIA_CHANGE | - DISK_EVENT_EJECT_REQUEST); - if (!(events & DISK_EVENT_MEDIA_CHANGE)) - return false; - - if (__invalidate_device(disk->part0, true)) - pr_warn("VFS: busy inodes on changed media %s\n", - disk->disk_name); - set_bit(GD_NEED_PART_SCAN, >state); - return true; + return __disk_check_media_change(disk, + disk_clear_events(disk, DISK_EVENT_MEDIA_CHANGE | + DISK_EVENT_EJECT_REQUEST)); Can you move the call to disk_clear_events() out of the call to __disk_check_media_change()? I find this pattern hard to read. Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 02/31] cdrom: remove the unused bdev argument to cdrom_open
On 6/6/23 09:39, Christoph Hellwig wrote: Signed-off-by: Christoph Hellwig --- drivers/cdrom/cdrom.c | 3 +-- drivers/cdrom/gdrom.c | 2 +- drivers/scsi/sr.c | 2 +- include/linux/cdrom.h | 3 +-- 4 files changed, 4 insertions(+), 6 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 04/31] cdrom: remove the unused cdrom_close_write release code
On 6/6/23 09:39, Christoph Hellwig wrote: cdrom_close_write is empty, and the for_data flag it is keyed off is never set. Remove all this clutter. Signed-off-by: Christoph Hellwig --- drivers/cdrom/cdrom.c | 15 --- include/linux/cdrom.h | 1 - 2 files changed, 16 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 06/31] cdrom: remove the unused mode argument to cdrom_release
On 6/6/23 09:39, Christoph Hellwig wrote: Signed-off-by: Christoph Hellwig --- drivers/cdrom/cdrom.c | 2 +- drivers/cdrom/gdrom.c | 2 +- drivers/scsi/sr.c | 2 +- include/linux/cdrom.h | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 01/31] block: also call ->open for incremental partition opens
On 6/6/23 09:39, Christoph Hellwig wrote: For whole devices ->open is called for each open, but for partitions it is only called on the first open of a partition. This is problematic as various block drivers look at open flags and might not do all setup for ioctl only or NDELAY opens. Signed-off-by: Christoph Hellwig --- block/bdev.c | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] Does dm-zoned support buffered write?
On 5/12/23 19:41, Ming Lin wrote: On Thu, May 11, 2023 at 11:56 AM Hannes Reinecke wrote: On 5/11/23 20:41, Ming Lin wrote: Hi list, I have an application that needs to use buffered_io to access SMR disk for good performance. From "ZBD Support Restrictions" at https://zonedstorage.io/docs/linux/overview " Direct IO Writes The kernel page cache does not guarantee that cached dirty pages will be flushed to a block device in sequential sector order. This can lead to unaligned write errors if an application uses buffered writes to write to the sequential write required zones of a device. To avoid this pitfall, applications that directly use a zoned block device without a file system should always use direct I/O operations to write to the sequential write required zones of a host-managed disk (that is, they should issue write() system calls with a block device "file open" that uses the O_DIRECT flag). " Raw zbd disk only supports direct_io. Does dm-zoned support buffered io (without O_DIRECT)? Yes. But I _think_ the above paragraph is ever so slightly outdated, as we've spent quite a lot of time fixing sequential writes (cf blk-zoned etc). So while dm-zoned is using bufferet writes there won't be any sequential write issues. At least, I have not uncovered any of those during testing. Hi Hannes, I use 5.10.90 kernel and smr disk capacity is 24T. I followed the below guide to create dm_zone device on top of smr disk. https://zonedstorage.io/docs/linux/dm Then mkfs.ext4 /dev/dm-0, but it seems hung. Any ideas? [37552.217472] device-mapper: uevent: version 1.0.3 [37552.217549] device-mapper: ioctl: 4.43.0-ioctl (2020-10-01) initialised: dm-devel@redhat.com [37575.608500] device-mapper: zoned metadata: (dmz-5000cca2bfc0db21): DM-Zoned metadata version 2 [37575.608502] device-mapper: zoned metadata: (sdx): Host-managed zoned block device [37575.608503] device-mapper: zoned metadata: (sdx): 50782535680 512-byte logical sectors (offset 0) [37575.608503] device-mapper: zoned metadata: (sdx): 96860 zones of 524288 512-byte logical sectors (offset 0) [37575.608504] device-mapper: zoned metadata: (dmz-5000cca2bfc0db21): 96860 zones of 524288 512-byte logical sectors [37575.609204] device-mapper: zoned: (dmz-5000cca2bfc0db21): Target device: 50771001344 512-byte logical sectors (6346375168 blocks) [38101.543353] INFO: task mkfs.ext4:1411791 blocked for more than 122 seconds. [38101.543380] Tainted: G OE 5.10.90.bm.1-amd64+ #2 [38101.543395] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [38101.543411] task:mkfs.ext4 state:D stack:0 pid:1411791 ppid:1388660 flags:0x4000 [38101.543415] Call Trace: [38101.543422] __schedule+0x3fd/0x760 [38101.543425] schedule+0x46/0xb0 [38101.543426] io_schedule+0x12/0x40 [38101.543429] wait_on_page_bit+0x133/0x270 [38101.543431] ? __page_cache_alloc+0xa0/0xa0 [38101.543432] wait_on_page_writeback+0x25/0x70 [38101.543434] __filemap_fdatawait_range+0x86/0xf0 [38101.543435] file_write_and_wait_range+0x74/0xb0 [38101.543438] blkdev_fsync+0x16/0x40 [38101.543441] do_fsync+0x38/0x60 [38101.543442] __x64_sys_fsync+0x10/0x20 [38101.543445] do_syscall_64+0x2d/0x70 [38101.543446] entry_SYSCALL_64_after_hwframe+0x44/0xa9 === Below are the steps I did: root@smr_dev:~# blkzone reset /dev/sdx root@smr_dev:~# dmzadm --format /dev/sdx /dev/sdx: 50782535680 512-byte sectors (24215 GiB) Host-managed device 96860 zones, offset 0 96860 zones of 524288 512-byte sectors (256 MiB) 65536 4KB data blocks per zone Resetting sequential zones Writing primary metadata set Writing mapping table Writing bitmap blocks Writing super block to sdx block 0 Writing secondary metadata set Writing mapping table Writing bitmap blocks Writing super block to sdx block 196608 Syncing disk Done. Hmm. I don't actually see how many CMR zones the drive has. root@smr_dev:~# dmzadm --start /dev/sdx /dev/sdx: 50782535680 512-byte sectors (24215 GiB) Host-managed device 96860 zones, offset 0 96860 zones of 524288 512-byte sectors (256 MiB) 65536 4KB data blocks per zone sdx: starting dmz-5000cca2bfc0db21, metadata ver. 2, uuid 7495e21a-23d9-49f4-832a-76b32136078b root@smr_dev:~# mkfs.ext4 /dev/dm-0 mke2fs 1.44.5 (15-Dec-2018) Discarding device blocks: done Creating filesystem with 6346375168 4k blocks and 396648448 inodes Filesystem UUID: c47de06d-6cf6-4a85-9502-7830ca2f4526 Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424, 2048, 23887872, 71663616, 78675968, 10240, 214990848, 51200, 550731776, 644972544, 1934917632, 256000, 3855122432, 5804752896 Allocating group tables: done Writing inode tables: done Creating journal (262144 blocks): done Writing superblocks and filesystem accounting information: === At anoth
Re: [dm-devel] Does dm-zoned support buffered write?
On 5/11/23 20:41, Ming Lin wrote: Hi list, I have an application that needs to use buffered_io to access SMR disk for good performance. From "ZBD Support Restrictions" at https://zonedstorage.io/docs/linux/overview " Direct IO Writes The kernel page cache does not guarantee that cached dirty pages will be flushed to a block device in sequential sector order. This can lead to unaligned write errors if an application uses buffered writes to write to the sequential write required zones of a device. To avoid this pitfall, applications that directly use a zoned block device without a file system should always use direct I/O operations to write to the sequential write required zones of a host-managed disk (that is, they should issue write() system calls with a block device "file open" that uses the O_DIRECT flag). " Raw zbd disk only supports direct_io. Does dm-zoned support buffered io (without O_DIRECT)? Yes. But I _think_ the above paragraph is ever so slightly outdated, as we've spent quite a lot of time fixing sequential writes (cf blk-zoned etc). So while dm-zoned is using bufferet writes there won't be any sequential write issues. At least, I have not uncovered any of those during testing. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 18/18] scsi: target: Add block PR support to iblock
On 4/7/23 22:05, Mike Christie wrote: This adds support for the block PR callouts to target_core_iblock. This patch doesn't attempt to implement the entire spec because there's no way support it all like SPEC_I_PT and ALL_TG_PT. This only supports exporting the iblock device from one path on the local target. Signed-off-by: Mike Christie --- drivers/target/target_core_iblock.c | 271 +++- 1 file changed, 266 insertions(+), 5 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 17/18] scsi: target: Report and detect unsupported PR commands
On 4/7/23 22:05, Mike Christie wrote: The backend modules don't know about ports and I_T nexuses and the pr_ops callouts the modules will use don't support the old RESERVE/RELEASE commands. This patch has us report we don't support those types of commands and fail them. Signed-off-by: Mike Christie --- drivers/target/target_core_pr.c | 17 drivers/target/target_core_spc.c | 75 +++- 2 files changed, 72 insertions(+), 20 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 16/18] scsi: target: Pass struct target_opcode_descriptor to enabled
On 4/7/23 22:05, Mike Christie wrote: The iblock pr_ops support does not support commands that require port or I_T Nexus info. This adds a struct target_opcode_descriptor as an argument to the enabled callout so we can still have the common tcm_is_pr_enabled and tcm_is_scsi2_reservations_enabled functions and also determine if the command is supported based on the command and service action and device settings. Signed-off-by: Mike Christie --- drivers/target/target_core_spc.c | 40 +++ include/target/target_core_base.h | 3 ++- 2 files changed, 27 insertions(+), 16 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 15/18] scsi: target: Allow backends to hook into PR handling
On 4/7/23 22:05, Mike Christie wrote: For the cases where you want to export a device to a VM via a single I_T nexus and want to passthrough the PR handling to the physical/real device you have to use pscsi or tcmu. Both are good for specific uses however for the case where you want good performance, and are not using SCSI devices directly (using DM/MD RAID or multipath devices) then we are out of luck. The following patches allow iblock to mimimally hook into the LIO PR code and then pass the PR handling to the physical device. Note that like with the tcmu an pscsi cases it's only supported when you export the device via one I_T nexus. This patch adds the initial LIO callouts. The next patch will modify iblock. Signed-off-by: Mike Christie Reviewed-by: Christoph Hellwig --- drivers/target/target_core_pr.c | 62 +++- include/target/target_core_backend.h | 4 ++ 2 files changed, 65 insertions(+), 1 deletion(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 14/18] scsi: target: Rename sbc_ops to exec_cmd_ops
On 4/7/23 22:05, Mike Christie wrote: The next patches allow us to call the block layer's pr_ops from the backends. This will require allowing the backends to hook into the cmd processing for SPC commands, so this renames sbc_ops to a more generic exec_cmd_ops. Signed-off-by: Mike Christie Reviewed-by: Christoph Hellwig --- drivers/target/target_core_file.c| 4 ++-- drivers/target/target_core_iblock.c | 4 ++-- drivers/target/target_core_rd.c | 4 ++-- drivers/target/target_core_sbc.c | 13 +++-- drivers/target/target_core_spc.c | 4 ++-- include/target/target_core_backend.h | 4 ++-- 6 files changed, 17 insertions(+), 16 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 13/18] nvme: Add pr_ops read_reservation support
On 4/7/23 22:05, Mike Christie wrote: This patch adds support for the pr_ops read_reservation callout by calling the NVMe Reservation Report helper. It then parses that info to detect if there is a reservation and if there is then convert the returned info to a pr_ops pr_held_reservation struct. Signed-off-by: Mike Christie Reviewed-by: Chaitanya Kulkarni --- drivers/nvme/host/pr.c | 83 ++ 1 file changed, 83 insertions(+) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 12/18] nvme: Add a nvme_pr_type enum
On 4/7/23 22:05, Mike Christie wrote: The next patch adds support to report the reservation type, so we need to be able to convert from the NVMe PR value we get from the device to the linux block layer PR value that will be returned to callers. To prepare for that, this patch adds a nvme_pr_type enum and renames the nvme_pr_type function. Signed-off-by: Mike Christie Reviewed-by: Chaitanya Kulkarni Reviewed-by: Christoph Hellwig --- drivers/nvme/host/pr.c | 24 include/linux/nvme.h | 9 + 2 files changed, 21 insertions(+), 12 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 11/18] nvme: Add pr_ops read_keys support
On 4/7/23 22:05, Mike Christie wrote: This patch adds support for the pr_ops read_keys callout by calling the NVMe Reservation Report helper, then parsing that info to get the controller's registered keys. Because the callout is only used in the kernel where the callers, like LIO, do not know about controller/host IDs, the callout just returns the registered keys which is required by the SCSI PR in READ KEYS command. Signed-off-by: Mike Christie --- drivers/nvme/host/pr.c | 69 ++ include/linux/nvme.h | 4 +++ 2 files changed, 73 insertions(+) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 10/18] nvme: Add helper to send pr command
On 4/7/23 22:05, Mike Christie wrote: Move the code that checks for multipath support and sends the pr command to a new helper so it can be used by the reservation report support added in the next patches. Signed-off-by: Mike Christie Reviewed-by: Christoph Hellwig --- drivers/nvme/host/pr.c | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 01/18] block: Add PR callouts for read keys and reservation
On 4/7/23 22:05, Mike Christie wrote: Add callouts for reading keys and reservations. This allows LIO to support the READ_KEYS and READ_RESERVATION commands so it can export devices to VMs for software like windows clustering. Signed-off-by: Mike Christie Reviewed-by: Chaitanya Kulkarni Reviewed-by: Christoph Hellwig Reviewed-by: Bart Van Assche --- include/linux/pr.h | 25 + 1 file changed, 25 insertions(+) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 09/18] nvme: Move pr code to it's own file
On 4/7/23 22:05, Mike Christie wrote: This patch moves the pr code to it's own file because I'm going to be adding more functions and core.c is getting bigger. Signed-off-by: Mike Christie Reviewed-by: Chaitanya Kulkarni Reviewed-by: Keith Busch --- drivers/nvme/host/Makefile | 2 +- drivers/nvme/host/core.c | 148 -- drivers/nvme/host/nvme.h | 2 + drivers/nvme/host/pr.c | 158 + 4 files changed, 161 insertions(+), 149 deletions(-) create mode 100644 drivers/nvme/host/pr.c Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 08/18] nvme: Don't hardcode the data len for pr commands
On 4/7/23 22:05, Mike Christie wrote: Reservation Report support needs to pass in a variable sized buffer, so this patch has the pr command helpers take a data length argument. Signed-off-by: Mike Christie Reviewed-by: Christoph Hellwig Reviewed-by: Keith Busch Reviewed-by: Chaitanya Kulkarni --- drivers/nvme/host/core.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 02/18] block: Rename BLK_STS_NEXUS to BLK_STS_RESV_CONFLICT
On 4/7/23 22:05, Mike Christie wrote: BLK_STS_NEXUS is used for NVMe/SCSI reservation conflicts and DASD's locking feature which works similar to NVMe/SCSI reservations where a host can get a lock on a device and when the lock is taken it will get failures. This patch renames BLK_STS_NEXUS so it better reflects this type of use. Signed-off-by: Mike Christie Acked-by: Stefan Haberland Reviewed-by: Bart Van Assche --- block/blk-core.c | 2 +- drivers/nvme/host/core.c | 2 +- drivers/s390/block/dasd.c | 7 ++- drivers/scsi/scsi_lib.c | 2 +- include/linux/blk_types.h | 4 ++-- 5 files changed, 11 insertions(+), 6 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 07/18] nvme: Fix reservation status related structs
On 4/7/23 22:05, Mike Christie wrote: This fixes the following issues with the reservation status structs: 1. resv10 is bytes 23:10 so it should be 14 bytes. 2. regctl_ds only supports 64 bit host IDs. These are not currently used, but will be in this patchset which adds support for the reservation report command. Signed-off-by: Mike Christie --- include/linux/nvme.h | 38 ++ 1 file changed, 30 insertions(+), 8 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 06/18] dm: Add support for block PR read keys/reservation
On 4/7/23 22:05, Mike Christie wrote: This adds support in dm for the block PR read keys and read reservation callouts. Signed-off-by: Mike Christie --- drivers/md/dm.c | 69 + 1 file changed, 69 insertions(+) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 05/18] scsi: Add support for block PR read keys/reservation
On 4/7/23 22:05, Mike Christie wrote: This adds support in sd.c for the block PR read keys and read reservation callouts, so upper layers like LIO can get the PR info that's been setup using the existing pr callouts and return it to initiators. Signed-off-by: Mike Christie Reviewed-by: Chaitanya Kulkarni Reviewed-by: Christoph Hellwig --- drivers/scsi/scsi_common.c | 21 + drivers/scsi/sd.c | 91 ++ include/scsi/scsi_common.h | 1 + include/scsi/scsi_proto.h | 5 +++ 4 files changed, 118 insertions(+) Unfortunately this ties our PR abstraction to the SCSI PR model. Let's hope that other protocols like NVMe will map onto that, too. But in the meantime: Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 04/18] scsi: Move sd_pr_type to scsi_common
On 4/7/23 22:05, Mike Christie wrote: LIO is going to want to do the same block to/from SCSI pr types as sd.c so this moves the sd_pr_type helper to scsi_common and renames it. The next patch will then also add a helper to go from the SCSI value to the block one for use with PERSISTENT_RESERVE_IN commands. Signed-off-by: Mike Christie Reviewed-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni --- drivers/scsi/scsi_common.c | 22 ++ drivers/scsi/sd.c | 33 - include/scsi/scsi_common.h | 12 3 files changed, 42 insertions(+), 25 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 03/18] scsi: Rename sd_pr_command
On 4/7/23 22:05, Mike Christie wrote: Rename sd_pr_command to sd_pr_out_command to match a sd_pr_in_command helper added in the next patches. Signed-off-by: Mike Christie Reviewed-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Reviewed-by: Bart Van Assche --- drivers/scsi/sd.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [LSF/MM/BPF TOPIC] Linux Security Summit cross-over?
On 2/24/23 22:37, Steve French wrote: I did one on network fs security at a security summit before that would still be relevant to both On Tue, Feb 21, 2023, 16:16 James Bottomley <mailto:james.bottom...@hansenpartnership.com>> wrote: On Tue, 2023-02-21 at 16:55 +, David Howells wrote: > > Since the first day of the LSS is the same as the final day of LSF > and in the same venue, are there any filesystem + security subjects > that would merit a common session? I've got one: Cryptographic material handling. Subtitle could be: making keyrings more usable. The broad problem is that the use of encryption within the kernel is growing (from the old dm-crypt to the newer fscrypt and beyond) yet pretty much all of our cryptographic key material handling violates the principle of least privilege. The latest one (which I happened to notice being interested in TPMs) is the systemd tpm2 cryptenroll. The specific violation is that key unwrapping should occur as close as possible to use: when the kernel uses a key, it should be the kernel unwrapping it not unwrapping in user space and handing the unwrapped key down to the kernel because that gives a way. We got here because in most of the old uses, the key is derived from a passphrase and the kernel can't prompt the user, so pieces of user space have to gather the precursor cryptographic material anyway. However, we're moving towards using cryptographic devices (like the TPM, key fobs and the like) to store keys we really should be passing the wrapped key into the kernel and letting it do the unwrap to preserve least privilege. dm-crypt has some support for using kernel based TPM keys (the trusted key subsystem), but this isn't propagated into systemd-cryptenroll and pretty much none of the other encryption systems make any attempt to use keyrings for unwrap handling, even if they use keyrings to store cryptographic material. Part of the reason seems to be that the keyrings subsystem itself is hard to use as a generic "unwrapper" since the consumer of the keys has to know exactly the key type to consume the protected payload. We could possibly fix this by adding a payload accessor function so the keyring consumer could access a payload from any key type and thus benefit from in-kernel unwrapping, but there are likely a host of other issues that need to be solved. So what I'd really like to discuss is: Given the security need for a generic in-kernel unwrapper, should we make keyrings do this and if so, how? That's one where I'd be interested in, too; for NVMe-over-TLS we'll be using keyrings, too, and have the same issue as to how and where keys should be decoded (eg for X.509 handling). Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 2/9] block: Add copy offload support infrastructure
On 1/12/23 12:58, Nitesh Shetty wrote: Introduce blkdev_issue_copy which supports source and destination bdevs, and an array of (source, destination and copy length) tuples. Introduce REQ_COPY copy offload operation flag. Create a read-write bio pair with a token as payload and submitted to the device in order. Read request populates token with source specific information which is then passed with write request. This design is courtesy Mikulas Patocka's token based copy Larger copy will be divided, based on max_copy_sectors limit. Signed-off-by: Nitesh Shetty Signed-off-by: Anuj Gupta --- block/blk-lib.c | 358 ++ block/blk.h | 2 + include/linux/blk_types.h | 44 + include/linux/blkdev.h| 3 + include/uapi/linux/fs.h | 15 ++ 5 files changed, 422 insertions(+) diff --git a/block/blk-lib.c b/block/blk-lib.c index e59c3069e835..2ce3c872ca49 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -115,6 +115,364 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, } EXPORT_SYMBOL(blkdev_issue_discard); +/* + * For synchronous copy offload/emulation, wait and process all in-flight BIOs. + * This must only be called once all bios have been issued so that the refcount + * can only decrease. This just waits for all bios to make it through + * bio_copy_*_write_end_io. IO errors are propagated through cio->io_error. + */ +static int cio_await_completion(struct cio *cio) +{ + int ret = 0; + + atomic_dec(>refcount); + + if (cio->endio) + return 0; + + if (atomic_read(>refcount)) { + __set_current_state(TASK_UNINTERRUPTIBLE); + blk_io_schedule(); + } + Wouldn't it be better to use 'atomic_dec_return()' to avoid a potential race condition between atomic_dec() and atomic_read()? + ret = cio->io_err; + kfree(cio); + + return ret; +} + +static void blk_copy_offload_write_end_io(struct bio *bio) +{ + struct copy_ctx *ctx = bio->bi_private; + struct cio *cio = ctx->cio; + sector_t clen; + int ri = ctx->range_idx; + + if (bio->bi_status) { + cio->io_err = blk_status_to_errno(bio->bi_status); + clen = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - + cio->ranges[ri].dst; + cio->ranges[ri].comp_len = min_t(sector_t, clen, + cio->ranges[ri].comp_len); + } + __free_page(bio->bi_io_vec[0].bv_page); + bio_put(bio); + + if (atomic_dec_and_test(>refcount)) + kfree(ctx); + if (atomic_dec_and_test(>refcount)) { _Two_ atomic_dec() in a row? Why? And if that really is required please add a comment. + if (cio->endio) { + cio->endio(cio->private, cio->io_err); + kfree(cio); + } else + blk_wake_io_task(cio->waiter); + } +} + +static void blk_copy_offload_read_end_io(struct bio *read_bio) +{ + struct copy_ctx *ctx = read_bio->bi_private; + struct cio *cio = ctx->cio; + sector_t clen; + int ri = ctx->range_idx; + unsigned long flags; + + if (read_bio->bi_status) { + cio->io_err = blk_status_to_errno(read_bio->bi_status); + goto err_rw_bio; + } + + /* For zoned device, we check if completed bio is first entry in linked +* list, +* if yes, we start the worker to submit write bios. +* if not, then we just update status of bio in ctx, +* once the worker gets scheduled, it will submit writes for all +* the consecutive REQ_COPY_READ_COMPLETE bios. +*/ + if (bdev_is_zoned(ctx->write_bio->bi_bdev)) { + spin_lock_irqsave(>list_lock, flags); + ctx->status = REQ_COPY_READ_COMPLETE; + if (ctx == list_first_entry(>list, + struct copy_ctx, list)) { + spin_unlock_irqrestore(>list_lock, flags); + schedule_work(>dispatch_work); + goto free_read_bio; + } + spin_unlock_irqrestore(>list_lock, flags); + } else + schedule_work(>dispatch_work); + +free_read_bio: + bio_put(read_bio); + + return; + +err_rw_bio: + clen = (read_bio->bi_iter.bi_sector << SECTOR_SHIFT) - + cio->ranges[ri].src; + cio->ranges[ri].comp_len = min_t(sector_t, clen, + cio->ranges[ri].comp_len); + __free_page(read_bio->bi_io_vec[0].bv_page); + bio_put(ctx->write_bio); + bio_put(read_bio); + if (atomic_dec_and_test(>refcount)) + kfree(ctx); + if (atomic_dec_and_test(>refcount)) { Same here. + if (cio->endio) { +
Re: [dm-devel] [PATCH v6 3/9] block: add emulation for copy
On 1/12/23 15:46, Hannes Reinecke wrote: On 1/12/23 12:58, Nitesh Shetty wrote: For the devices which does not support copy, copy emulation is added. Copy-emulation is implemented by reading from source ranges into memory and writing to the corresponding destination asynchronously. For zoned device we maintain a linked list of read submission and try to submit corresponding write in same order. Also emulation is used, if copy offload fails or partially completes. Signed-off-by: Nitesh Shetty Signed-off-by: Vincent Fu Signed-off-by: Anuj Gupta --- block/blk-lib.c | 241 - block/blk-map.c | 4 +- include/linux/blkdev.h | 3 + 3 files changed, 245 insertions(+), 3 deletions(-) I'm not sure if I agree with this one. You just submitted a patch for device-mapper to implement copy offload, which (to all intents and purposes) _is_ an emulation. So why do we need to implement it in the block layer as an emulation? Or, if we have to, why do we need the device-mapper emulation? This emulation will be doing the same thing, no? Sheesh. One should read the entire patchset. Disregard the above comment. Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v6 3/9] block: add emulation for copy
On 1/12/23 12:58, Nitesh Shetty wrote: For the devices which does not support copy, copy emulation is added. Copy-emulation is implemented by reading from source ranges into memory and writing to the corresponding destination asynchronously. For zoned device we maintain a linked list of read submission and try to submit corresponding write in same order. Also emulation is used, if copy offload fails or partially completes. Signed-off-by: Nitesh Shetty Signed-off-by: Vincent Fu Signed-off-by: Anuj Gupta --- block/blk-lib.c| 241 - block/blk-map.c| 4 +- include/linux/blkdev.h | 3 + 3 files changed, 245 insertions(+), 3 deletions(-) I'm not sure if I agree with this one. You just submitted a patch for device-mapper to implement copy offload, which (to all intents and purposes) _is_ an emulation. So why do we need to implement it in the block layer as an emulation? Or, if we have to, why do we need the device-mapper emulation? This emulation will be doing the same thing, no? Cheers, Hannes -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 09/11] block, nvme: Add error for reservation conflicts.
On 6/4/22 19:13, michael.chris...@oracle.com wrote: On 6/4/22 2:38 AM, Hannes Reinecke wrote: On 6/3/22 21:45, Keith Busch wrote: On Fri, Jun 03, 2022 at 01:55:34AM -0500, Mike Christie wrote: @@ -171,6 +171,7 @@ static const struct { /* zone device specific errors */ [BLK_STS_ZONE_OPEN_RESOURCE] = { -ETOOMANYREFS, "open zones exceeded" }, [BLK_STS_ZONE_ACTIVE_RESOURCE] = { -EOVERFLOW, "active zones exceeded" }, + [BLK_STS_RSV_CONFLICT] = { -EBADE, "resevation conflict" }, You misspelled "reservation". :) And since you want a different error, why reuse EBADE for the errno? That is already used for BLK_STS_NEXUS that you're trying to differentiate from, right? At least for nvme, this error code is returned when the host lacks sufficient rights, so something like EACCESS might make sense. Looks good otherwise. Welll ... BLK_STS_NEXUS _is_ the reservation error. I was not sure of xen/virtio scsi uses of BLK_STS_NEXUS/DID_NEXUS_FAILURE. The virtio spec's description for VIRTIO_SCSI_S_NEXUS_FAILURE: if the nexus is suffering a failure but retrying on other paths might yield a different result. looks like the description for DID_NEXUS_FAILURE in scsi_status.h. To me the the description sounded generic where it could used for other errors like the endpoint/port for the I_T is removed. However, the qemu code only uses VIRTIO_SCSI_S_NEXUS_FAILURE for reservation conflicts. If we are saying that is always the case in other virt implementations, I don't even need this patch :) and we can do what you requested and do more of a rename. Well ... we tried to find a generic error for reservation failure, as we thought that reservation failure was too SCSI specific. And we wanted the error to describe what the resulting handling should be, not what the cause was. Hence we ended up with BLK_STS_NEXUS. But turns out that our initial assumption wasn't valid, and that reservations are a general concept. So by all means, rename BLK_STS_NEXUS to BLK_STS_RSV_CONFLICT to make it clear what this error is about. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 09/11] block, nvme: Add error for reservation conflicts.
On 6/3/22 21:45, Keith Busch wrote: On Fri, Jun 03, 2022 at 01:55:34AM -0500, Mike Christie wrote: @@ -171,6 +171,7 @@ static const struct { /* zone device specific errors */ [BLK_STS_ZONE_OPEN_RESOURCE]= { -ETOOMANYREFS, "open zones exceeded" }, [BLK_STS_ZONE_ACTIVE_RESOURCE] = { -EOVERFLOW, "active zones exceeded" }, + [BLK_STS_RSV_CONFLICT] = { -EBADE, "resevation conflict" }, You misspelled "reservation". :) And since you want a different error, why reuse EBADE for the errno? That is already used for BLK_STS_NEXUS that you're trying to differentiate from, right? At least for nvme, this error code is returned when the host lacks sufficient rights, so something like EACCESS might make sense. Looks good otherwise. Welll ... BLK_STS_NEXUS _is_ the reservation error. I'd rather modify the existing code to return BLK_STS_RSV_CONFLICT instead of BLK_STS_NEXUS, but keep the 'EBADE' mapping. Userspace ABI and all that. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v5 2/7] block: allow blk-zoned devices to have non-power-of-2 zone size
On 5/24/22 09:10, Pankaj Raghav wrote: On 5/24/22 07:19, Hannes Reinecke wrote: diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index c4e4c7071b7b..f5c7a41032ba 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -676,6 +676,21 @@ static inline unsigned int blk_queue_zone_no(struct request_queue *q, return div64_u64(sector, zone_sectors); } +static inline bool blk_queue_is_zone_start(struct request_queue *q, sector_t sec) +{ + sector_t zone_sectors = blk_queue_zone_sectors(q); + u64 remainder = 0; + + if (!blk_queue_is_zoned(q)) + return false; + Not sure if we need this here; surely blk_queue_zone_sectors() will already barf, and none of the callers did this check. I totally agree with you but all the other blk_queue_* functions had this defensive check and I didn't want to break that pattern: Ah, fair enough. Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v5 6/7] null_blk: use zone_size_sects_shift for power of 2 zoned devices
On 5/23/22 18:16, Pankaj Raghav wrote: Instead of doing is_power_of_2 and ilog2 operation for every IO, cache the zone_size_sects_shift variable and use it for power of 2 zoned devices. This variable will be set to zero for non power of 2 zoned devices. Suggested-by: Damien Le Moal Signed-off-by: Pankaj Raghav --- drivers/block/null_blk/null_blk.h | 6 ++ drivers/block/null_blk/zoned.c| 10 -- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h index 78eb56b0ca55..3d6e41a9491f 100644 --- a/drivers/block/null_blk/null_blk.h +++ b/drivers/block/null_blk/null_blk.h @@ -74,6 +74,12 @@ struct nullb_device { unsigned int imp_close_zone_no; struct nullb_zone *zones; sector_t zone_size_sects; + /* +* zone_size_sects_shift is only useful when the zone size is +* power of 2. This variable is set to zero when zone size is non +* power of 2. +*/ + unsigned int zone_size_sects_shift; bool need_zone_res_mgmt; spinlock_t zone_res_lock; diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c index 00c34e65ef0a..806bef98ac83 100644 --- a/drivers/block/null_blk/zoned.c +++ b/drivers/block/null_blk/zoned.c @@ -13,8 +13,8 @@ static inline sector_t mb_to_sects(unsigned long mb) static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect) { - if (is_power_of_2(dev->zone_size_sects)) - return sect >> ilog2(dev->zone_size_sects); + if (dev->zone_size_sects_shift) + return sect >> dev->zone_size_sects_shift; return div64_u64(sect, dev->zone_size_sects); } @@ -82,6 +82,12 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q) zone_capacity_sects = mb_to_sects(dev->zone_capacity); dev_capacity_sects = mb_to_sects(dev->size); dev->zone_size_sects = mb_to_sects(dev->zone_size); + + if (is_power_of_2(dev->zone_size_sects)) + dev->zone_size_sects_shift = ilog2(dev->zone_size_sects); + else + dev->zone_size_sects_shift = 0; + dev->nr_zones = div64_u64(roundup(dev_capacity_sects, dev->zone_size_sects), dev->zone_size_sects); Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v5 3/7] nvme: zns: Allow ZNS drives that have non-power_of_2 zone size
On 5/23/22 18:15, Pankaj Raghav wrote: Remove the condition which disallows non-power_of_2 zone size ZNS drive to be updated and use generic method to calculate number of zones instead of relying on log and shift based calculation on zone size. The power_of_2 calculation has been replaced directly with generic calculation without special handling. Both modified functions are not used in hot paths, they are only used during initialization & revalidation of the ZNS device. As rounddown macro from math.h does not work for 32 bit architectures, round down operation is open coded. Reviewed-by: Luis Chamberlain Reviewed by: Adam Manzanares Signed-off-by: Pankaj Raghav --- drivers/nvme/host/zns.c | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c index 9f81beb4df4e..d92f937d5cb9 100644 --- a/drivers/nvme/host/zns.c +++ b/drivers/nvme/host/zns.c @@ -101,13 +101,6 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf) } ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze)); - if (!is_power_of_2(ns->zsze)) { - dev_warn(ns->ctrl->device, - "invalid zone size:%llu for namespace:%u\n", - ns->zsze, ns->head->ns_id); - status = -ENODEV; - goto free_data; - } blk_queue_set_zoned(ns->disk, BLK_ZONED_HM); blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q); @@ -128,8 +121,13 @@ static void *nvme_zns_alloc_report_buffer(struct nvme_ns *ns, const size_t min_bufsize = sizeof(struct nvme_zone_report) + sizeof(struct nvme_zone_descriptor); + /* +* Division is used to calculate nr_zones with no special handling +* for power of 2 zone sizes as this function is not invoked in a +* hot path +*/ nr_zones = min_t(unsigned int, nr_zones, -get_capacity(ns->disk) >> ilog2(ns->zsze)); +div64_u64(get_capacity(ns->disk), ns->zsze)); bufsize = sizeof(struct nvme_zone_report) + nr_zones * sizeof(struct nvme_zone_descriptor); @@ -182,6 +180,7 @@ int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector, int ret, zone_idx = 0; unsigned int nz, i; size_t buflen; + u64 remainder = 0; if (ns->head->ids.csi != NVME_CSI_ZNS) return -EINVAL; @@ -197,7 +196,11 @@ int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector, c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL; c.zmr.pr = NVME_REPORT_ZONE_PARTIAL; - sector &= ~(ns->zsze - 1); + /* +* Round down the sector value to the nearest zone start +*/ + div64_u64_rem(sector, ns->zsze, ); + sector -= remainder; while (zone_idx < nr_zones && sector < get_capacity(ns->disk)) { memset(report, 0, buflen); Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v5 2/7] block: allow blk-zoned devices to have non-power-of-2 zone size
n IS_ALIGNED(sec, zone_sectors); + + div64_u64_rem(sec, zone_sectors, ); + return remainder == 0; +} + static inline bool blk_queue_zone_is_seq(struct request_queue *q, sector_t sector) { @@ -722,6 +737,12 @@ static inline unsigned int blk_queue_zone_no(struct request_queue *q, { return 0; } + +static inline bool blk_queue_is_zone_start(struct request_queue *q, sector_t sec) +{ + return false; +} + static inline unsigned int queue_max_open_zones(const struct request_queue *q) { return 0; Otherwise looks good. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
On 5/19/22 20:47, Damien Le Moal wrote: On 5/19/22 16:34, Johannes Thumshirn wrote: On 19/05/2022 05:19, Damien Le Moal wrote: On 5/19/22 12:12, Luis Chamberlain wrote: On Thu, May 19, 2022 at 12:08:26PM +0900, Damien Le Moal wrote: On 5/18/22 00:34, Theodore Ts'o wrote: On Tue, May 17, 2022 at 10:10:48AM +0200, Christoph Hellwig wrote: I'm a little surprised about all this activity. I though the conclusion at LSF/MM was that for Linux itself there is very little benefit in supporting this scheme. It will massively fragment the supported based of devices and applications, while only having the benefit of supporting some Samsung legacy devices. FWIW, That wasn't my impression from that LSF/MM session, but once the videos become available, folks can decide for themselves. There was no real discussion about zone size constraint on the zone storage BoF. Many discussions happened in the hallway track though. Right so no direct clear blockers mentioned at all during the BoF. Nor any clear OK. So what about creating a device-mapper target, that's taking npo2 drives and makes them po2 drives for the FS layers? It will be very similar code to dm-linear. +1 This will simplify the support for FSes, at least for the initial drop (if accepted). And more importantly, this will also allow addressing any potential problem with user space breaking because of the non power of 2 zone size. Seconded (or maybe thirded). The changes to support npo2 in the block layer are pretty simple, and really I don't have an issue with those. Then adding a device-mapper target transforming npo2 drives in po2 block devices should be pretty trivial. And once that is in you can start arguing with the the FS folks on whether to implement it natively. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v4 03/10] block: Introduce a new ioctl for copy
On 4/26/22 12:12, Nitesh Shetty wrote: Add new BLKCOPY ioctl that offloads copying of one or more sources ranges to one or more destination in a device. COPY ioctl accepts a 'copy_range' structure that contains no of range, a reserved field , followed by an array of ranges. Each source range is represented by 'range_entry' that contains source start offset, destination start offset and length of source ranges (in bytes) MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle. Example code, to issue BLKCOPY: /* Sample example to copy three entries with [dest,src,len], * [32768, 0, 4096] [36864, 4096, 4096] [40960,8192,4096] on same device */ int main(void) { int i, ret, fd; unsigned long src = 0, dst = 32768, len = 4096; struct copy_range *cr; cr = (struct copy_range *)malloc(sizeof(*cr)+ (sizeof(struct range_entry)*3)); cr->nr_range = 3; cr->reserved = 0; for (i = 0; i< cr->nr_range; i++, src += len, dst += len) { cr->range_list[i].dst = dst; cr->range_list[i].src = src; cr->range_list[i].len = len; cr->range_list[i].comp_len = 0; } fd = open("/dev/nvme0n1", O_RDWR); if (fd < 0) return 1; ret = ioctl(fd, BLKCOPY, cr); if (ret != 0) printf("copy failed, ret= %d\n", ret); for (i=0; i< cr->nr_range; i++) if (cr->range_list[i].len != cr->range_list[i].comp_len) printf("Partial copy for entry %d: requested %llu, completed %llu\n", i, cr->range_list[i].len, cr->range_list[i].comp_len); close(fd); free(cr); return ret; } Signed-off-by: Nitesh Shetty Signed-off-by: Javier González Signed-off-by: Arnav Dawn --- block/ioctl.c | 32 include/uapi/linux/fs.h | 9 + 2 files changed, 41 insertions(+) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v4 01/10] block: Introduce queue limits for copy-offload support
On 4/26/22 12:12, Nitesh Shetty wrote: Add device limits as sysfs entries, - copy_offload (RW) - copy_max_bytes (RW) - copy_max_hw_bytes (RO) - copy_max_range_bytes (RW) - copy_max_range_hw_bytes (RO) - copy_max_nr_ranges (RW) - copy_max_nr_ranges_hw (RO) Above limits help to split the copy payload in block layer. copy_offload, used for setting copy offload(1) or emulation(0). copy_max_bytes: maximum total length of copy in single payload. copy_max_range_bytes: maximum length in a single entry. copy_max_nr_ranges: maximum number of entries in a payload. copy_max_*_hw_*: Reflects the device supported maximum limits. Signed-off-by: Nitesh Shetty Signed-off-by: Kanchan Joshi Signed-off-by: Arnav Dawn --- Documentation/ABI/stable/sysfs-block | 83 block/blk-settings.c | 59 block/blk-sysfs.c| 138 +++ include/linux/blkdev.h | 13 +++ 4 files changed, 293 insertions(+) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v4 02/10] block: Add copy offload support infrastructure
} + ctx->cio = cio; + ctx->range_idx = ri; + ctx->start_sec = dst_blk; + + read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE, + gfp_mask); + if (!read_bio) { + ret = -ENOMEM; + goto err_read_bio; + } + read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT; + __bio_add_page(read_bio, token, PAGE_SIZE, 0); + /*__bio_add_page increases bi_size by len, so overwrite it with copy len*/ + read_bio->bi_iter.bi_size = copy_len; + ret = submit_bio_wait(read_bio); + bio_put(read_bio); + if (ret) + goto err_read_bio; + + write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE, + gfp_mask); + if (!write_bio) { + ret = -ENOMEM; + goto err_read_bio; + } + write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT; + __bio_add_page(write_bio, token, PAGE_SIZE, 0); + /*__bio_add_page increases bi_size by len, so overwrite it with copy len*/ + write_bio->bi_iter.bi_size = copy_len; + write_bio->bi_end_io = bio_copy_end_io; + write_bio->bi_private = ctx; + + spin_lock_irqsave(>lock, flags); + ++cio->refcount; + spin_unlock_irqrestore(>lock, flags); + + submit_bio(write_bio); + src_blk += copy_len; + dst_blk += copy_len; + } + } + Hmm. I'm not sure if I like the copy loop. What I definitely would do is to allocate the write bio before reading data; after all, if we can't allocate the write bio reading is pretty much pointless. But the real issue I have with this is that it's doing synchronous reads, thereby limiting the performance. Can't you submit the write bio from the end_io function of the read bio? That would disentangle things, and we should be getting a better performance. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH] dm: fix race condition on Alpha
On 3/28/22 18:34, Mikulas Patocka wrote: Early alpha processors cannot write a single byte or short; they read 8 bytes, modify the value in registers and write back 8 bytes. This could cause race condition in the structure dm_io - if the fields flags and io_count are modified simultaneously. Fix this bug by using 32-bit flags if we are on Alpha and if we are compiling for a processor that doesn't have the byte-word-extension. Errm. Seeing that those alphas could only modify 64 bit atomically, what exactly do you gain by using 32-bit flags? Shouldn't you be using 64-bit flags? Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()
On 3/22/22 16:17, John Garry wrote: On 22/03/2022 14:03, Hannes Reinecke wrote: As mentioned in the cover letter response, it just seems best to keep the normal scsi_cmnd payload but have other means to add on the internal command data, like using host_scribble or scsi_cmnd priv data. Well; I found that most drivers I had been looking at the scsi command payload isn't used at all; the drivers primarily cared about the (driver-provided) payload, and were completely ignoring the scsi command payload. Similar for ATA/libsas: you basically never issue real scsi commands, but either 'raw' ATA requests or SCSI TMFs. None of which are scsi commands, so providing them is a bit of a waste. (And causes irritations, too, as a scsi command requires associated pointers like ->device etc to be set up. Which makes it tricky to use for the initial device setup.) A problem I see is that in scsi_mq_init_request() we allocate memories like sense_buffer and prot_sdb and store the pointers in the scsi_cmnd payload. If we then reuse a scsi_cmnd payload as an "internal" command payload then this data may be lost. It might be possible to reuse the scsi cmnd payload for the "internal", but I would rather not get hung up on it now if possible. Or, keep the payload as is, and use the 'internal' marker to indicate that the scsi payload is not valid. That would save us quite some checks during endio processing. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()
On 3/22/22 13:30, John Garry wrote: On 22/03/2022 12:16, Hannes Reinecke wrote: On 3/22/22 12:33, John Garry wrote: On 22/03/2022 11:18, Christoph Hellwig wrote: On Tue, Mar 22, 2022 at 06:39:35PM +0800, John Garry wrote: Add an API to allocate a request queue which accepts a custom set of blk_mq_ops for that request queue. The reason which we may want custom ops is for queuing requests which we don't want to go through the normal queuing path. Eww. I really do not think we should do separate ops per queue, as that is going to get us into a deep mess eventually. Yeah... so far (here) it works out quite nicely, as we don't need to change the SCSI blk mq ops nor allocate a scsi_device - everything is just separate. The other method mentioned previously was to add the request "reserved" flag and add new paths in scsi_queue_rq() et al to handle this, but that gets messy. Any other ideas ...? As outlined in the other mail, I think might be useful is to have a _third_ type of requests (in addition to the normal and the reserved ones). That one would be allocated from the normal I/O pool (and hence could fail if the pool is exhausted), but would be able to carry a different payload (type) than the normal requests. As mentioned in the cover letter response, it just seems best to keep the normal scsi_cmnd payload but have other means to add on the internal command data, like using host_scribble or scsi_cmnd priv data. Well; I found that most drivers I had been looking at the scsi command payload isn't used at all; the drivers primarily cared about the (driver-provided) payload, and were completely ignoring the scsi command payload. Similar for ATA/libsas: you basically never issue real scsi commands, but either 'raw' ATA requests or SCSI TMFs. None of which are scsi commands, so providing them is a bit of a waste. (And causes irritations, too, as a scsi command requires associated pointers like ->device etc to be set up. Which makes it tricky to use for the initial device setup.) And we could have a separate queue_rq for these requests, as we can differentiate them in the block layer. I don't know, let me think about it. Maybe we could add an "internal" blk flag, which uses a separate "internal" queue_rq callback. Yeah, that's what I had in mind. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()
On 3/22/22 12:33, John Garry wrote: On 22/03/2022 11:18, Christoph Hellwig wrote: On Tue, Mar 22, 2022 at 06:39:35PM +0800, John Garry wrote: Add an API to allocate a request queue which accepts a custom set of blk_mq_ops for that request queue. The reason which we may want custom ops is for queuing requests which we don't want to go through the normal queuing path. Eww. I really do not think we should do separate ops per queue, as that is going to get us into a deep mess eventually. Yeah... so far (here) it works out quite nicely, as we don't need to change the SCSI blk mq ops nor allocate a scsi_device - everything is just separate. The other method mentioned previously was to add the request "reserved" flag and add new paths in scsi_queue_rq() et al to handle this, but that gets messy. Any other ideas ...? As outlined in the other mail, I think might be useful is to have a _third_ type of requests (in addition to the normal and the reserved ones). That one would be allocated from the normal I/O pool (and hence could fail if the pool is exhausted), but would be able to carry a different payload (type) than the normal requests. And we could have a separate queue_rq for these requests, as we can differentiate them in the block layer. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH RFC 00/11] blk-mq/libata/scsi: SCSI driver tagging improvements
On 3/22/22 11:39, John Garry wrote: Currently SCSI low-level drivers are required to manage tags for commands which do not come via the block layer - libata internal commands would be an example of one of these. There was some work to provide "reserved commands" support in such series as https://lore.kernel.org/linux-scsi/20211125151048.103910-1-h...@suse.de/ This was based on allocating a request for the lifetime of the "internal" command. This series tries to solve that problem by not just allocating the request but also sending it through the block layer, that being the normal flow for a request. We need to do this as we may only poll completion of requests through the block layer, so would need to do this for poll queue support. There is still scope to allocate commands just to get a tag as token as that may suit some other scenarios, but it's not what we do here. This series extends blk-mq to support a request queue having a custom set of ops. In addition SCSI core code adds support for these type of requests. This series does not include SCSI core handling for enabling reserved tags per tagset, but that would be easy to add. Based on mkp-scsi 5.18/scsi-staging @ 66daf3e6b993 Please consider as an RFC for now. I think that the libata change has the largest scope for improvement... Grand seeing that someone is taking it up. Thanks for doing this! But: Allocating a queue for every request (eg in patch 3) is overkill. If we want to go that route we should be allocating the queue upfront (eg when creating the device itself), and then just referencing it. However, can't say I like this approach. I've been playing around with supporting internal commands, and really found two constraints really annoying: - The tagset supports only _one_ set of payload via blk_mq_rq_(to,from)_pdu(). This requires each request to be of the same type, and with that making it really hard for re-purposing the request for internal usage. In the end I settled by just keeping it and skipping the SCSI command field. If we could have a distinct PDU type for internal commands I guess things would be easier. - The number of reserved commands is static. With that it's getting really hard using reserved commands with low-queue depth devices like ATA; we only have 31 commands to work with, and setting one or two aside for TMF is really making a difference performance wise. It would be _awesome_ if we could allocate reserved commands dynamically (ie just marking a command as 'reserved' once allocated). Sure, it won't have the same guarantees as 'real' reserved commands, but in most cases we don't actually need that. Maybe these are some lines we could investigate? Hmm? Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 1/2] dm-zoned: remove the ->name field in struct dmz_dev
On 3/1/22 09:38, Christoph Hellwig wrote: Just use the %pg format specifier to print the block device name directly. Signed-off-by: Christoph Hellwig --- drivers/md/dm-zoned-metadata.c | 4 ++-- drivers/md/dm-zoned-target.c | 1 - drivers/md/dm-zoned.h | 9 - 3 files changed, 6 insertions(+), 8 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [RFC PATCH 3/3] nvme: add the "debug" host driver
On 2/4/22 10:58, Chaitanya Kulkarni wrote: On 2/4/22 12:24 AM, Javier González wrote: [ .. ] For a software-only solution, we have experimented with something similar to the nvme-debug code tha Mikulas is proposing. Adam pointed to the nvme-loop target as an alternative and this seems to work pretty nicely. I do not believe there should be many changes to support copy offload using this. If QEMU is so incompetent then we need to add every big feature into the NVMeOF test target so that we can test it better ? is that what you are proposing ? since if we implement one feature, it will be hard to nack any new features that ppl will come up with same rationale "with QEMU being slow and hard to test race conditions etc .." How would you use qemu for bare-metal testing? and if that is the case why we don't have ZNS NVMeOF target memory backed emulation ? Isn't that a bigger and more complicated feature than Simple Copy where controller states are involved with AENs ? ZNS kernel code testing is also done on QEMU, I've also fixed bugs in the ZNS kernel code which are discovered on QEMU and I've not seen any issues with that. Given that simple copy feature is way smaller than ZNS it will less likely to suffer from slowness and etc (listed above) in QEMU. my point is if we allow one, we will be opening floodgates and we need to be careful not to bloat the code unless it is _absolutely necessary_ which I don't think it is based on the simple copy specification. I do have a slightly different view on the nvme target code; it should provide the necessary means to test the nvme host code. And simple copy is on of these features, especially as it will operate as an exploiter of the new functionality. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
On 10/29/21 2:21 AM, Chaitanya Kulkarni wrote: On 10/7/21 11:49 PM, Javier González wrote: External email: Use caution opening links or attachments On 06.10.2021 10:33, Bart Van Assche wrote: On 10/6/21 3:05 AM, Javier González wrote: I agree that the topic is complex. However, we have not been able to find a clear path forward in the mailing list. Hmm ... really? At least Martin Petersen and I consider device mapper support essential. How about starting from Mikulas' patch series that supports the device mapper? See also https://lore.kernel.org/all/alpine.lrh.2.02.2108171630120.30...@file01.intranet.prod.int.rdu2.redhat.com/ When we add a new REQ_OP_XXX we need to make sure it will work with device mapper, so I agree with Bart and Martin. Starting with Mikulas patches is a right direction as of now.. Thanks for the pointers. We are looking into Mikulas' patch - I agree that it is a good start. What do you think about joining the call to talk very specific next steps to get a patchset that we can start reviewing in detail. I can do that. Thanks. I will wait until Chaitanya's reply on his questions. We will start suggesting some dates then. I think at this point we need to at least decide on having a first call focused on how to proceed forward with Mikulas approach ... Javier, can you please organize a call with people you listed in this thread earlier ? Also Keith presented his work on a simple zone-based remapping block device, which included an in-kernel copy offload facility. Idea is to lift that as a standalone patch such that we can use it a fallback (ie software) implementation if no other copy offload mechanism is available. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH RFC] multipath-tools: remove Hannes as IBM arrays maintainer
On 10/1/21 8:12 PM, Xose Vazquez Perez wrote: Cc: Hannes Reinecke Cc: Martin Wilck Cc: Benjamin Marzinski Cc: Christophe Varoqui Cc: DM-DEVEL ML Signed-off-by: Xose Vazquez Perez --- libmultipath/hwtable.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c index 763982cd..11282699 100644 --- a/libmultipath/hwtable.c +++ b/libmultipath/hwtable.c @@ -482,8 +482,6 @@ static struct hwentry default_hw[] = { }, /* * IBM -* -* Maintainer: Hannes Reinecke */ { /* ProFibre 4000R */ Hmm. You could've asked me ... Anyway. Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v3 2/8] scsi/sr: add error handling support for add_disk()
On 8/30/21 11:25 PM, Luis Chamberlain wrote: We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. Reviewed-by: Christoph Hellwig Signed-off-by: Luis Chamberlain --- drivers/scsi/sr.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v3 1/8] scsi/sd: add error handling support for add_disk()
On 8/30/21 11:25 PM, Luis Chamberlain wrote: We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. Reviewed-by: Christoph Hellwig Signed-off-by: Luis Chamberlain --- drivers/scsi/sd.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v3 8/8] nbd: add error handling support for add_disk()
On 8/30/21 11:25 PM, Luis Chamberlain wrote: We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. Reviewed-by: Christoph Hellwig Signed-off-by: Luis Chamberlain --- drivers/block/nbd.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v3 7/8] loop: add error handling support for add_disk()
On 8/30/21 11:25 PM, Luis Chamberlain wrote: We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. Reviewed-by: Christoph Hellwig Signed-off-by: Luis Chamberlain --- drivers/block/loop.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v3 6/8] dm: add add_disk() error handling
On 8/30/21 11:25 PM, Luis Chamberlain wrote: We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. There are two calls to dm_setup_md_queue() which can fail then, one on dm_early_create() and we can easily see that the error path there calls dm_destroy in the error path. The other use case is on the ioctl table_load case. If that fails userspace needs to call the DM_DEV_REMOVE_CMD to cleanup the state - similar to any other failure. Signed-off-by: Luis Chamberlain --- drivers/md/dm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v3 5/8] md: add error handling support for add_disk()
On 8/30/21 11:25 PM, Luis Chamberlain wrote: We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. We just do the unwinding of what was not done before, and are sure to unlock prior to bailing. Reviewed-by: Christoph Hellwig Signed-off-by: Luis Chamberlain --- drivers/md/md.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v3 3/8] nvme: add error handling support for add_disk()
On 8/30/21 11:25 PM, Luis Chamberlain wrote: We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. Reviewed-by: Christoph Hellwig Signed-off-by: Luis Chamberlain --- drivers/nvme/host/core.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 8679a108f571..687d3be563a3 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -3763,7 +3763,9 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid, nvme_get_ctrl(ctrl); - device_add_disk(ctrl->device, ns->disk, nvme_ns_id_attr_groups); + if (device_add_disk(ctrl->device, ns->disk, nvme_ns_id_attr_groups)) + goto out_cleanup_ns_from_list; + if (!nvme_ns_head_multipath(ns->head)) nvme_add_ns_cdev(ns); @@ -3773,6 +3775,11 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid, return; + out_cleanup_ns_from_list: + nvme_put_ctrl(ctrl); + down_write(>namespaces_rwsem); + list_del_init(>list); + up_write(>namespaces_rwsem); out_unlink_ns: mutex_lock(>subsys->lock); list_del_rcu(>siblings); I would rather turn this around, and call 'nvme_put_ctrl()' after removing the namespace from the list. But it's probably more a style issue, come to think of it. Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel