Re: [dm-devel] [PATCH] dm-zoned: free dmz->ddev array in dmz_put_zoned_device

2023-09-20 Thread Hannes Reinecke

On 9/20/23 12:51, Fedor Pchelkin wrote:

Commit 4dba12881f88 ("dm zoned: support arbitrary number of devices")
made the pointers to additional zoned devices to be stored in a
dynamically allocated dmz->ddev array. However, this array is not freed.

Free it when cleaning up zoned device information inside
dmz_put_zoned_device(). Assigning NULL to dmz->ddev elements doesn't make
sense there as they are not supposed to be reused later and the whole dmz
target structure is being cleaned anyway.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 4dba12881f88 ("dm zoned: support arbitrary number of devices")
Cc: sta...@vger.kernel.org
Signed-off-by: Fedor Pchelkin 
---
  drivers/md/dm-zoned-target.c | 8 +++-
  1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index ad8e670a2f9b..e25cd9db6275 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -753,12 +753,10 @@ static void dmz_put_zoned_device(struct dm_target *ti)
struct dmz_target *dmz = ti->private;
int i;
  
-	for (i = 0; i < dmz->nr_ddevs; i++) {

-   if (dmz->ddev[i]) {
+   for (i = 0; i < dmz->nr_ddevs; i++)
+   if (dmz->ddev[i])
dm_put_device(ti, dmz->ddev[i]);
-   dmz->ddev[i] = NULL;
-   }
-   }
+   kfree(dmz->ddev);
  }
  
  static int dmz_fixup_devices(struct dm_target *ti)


Hmm. I'm not that happy with it; dmz_put_zoned_device() is using 
dm_target as an argument, whereas all of the functions surrounding the 
call sites is using the dmz_target directly.


Mind to modify the function to use 'struct dmz_target' as an argument?

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v15 09/12] dm: Add support for copy offload

2023-09-11 Thread Hannes Reinecke

On 9/11/23 09:07, Nitesh Shetty wrote:

On Fri, Sep 08, 2023 at 08:13:37AM +0200, Hannes Reinecke wrote:

On 9/6/23 18:38, Nitesh Shetty wrote:

Before enabling copy for dm target, check if underlying devices and
dm target support copy. Avoid split happening inside dm target.
Fail early if the request needs split, currently splitting copy
request is not supported.


And here is where I would have expected the emulation to take place;
didn't you have it in one of the earlier iterations?


No, but it was the other way round.
In dm-kcopyd we used device offload, if that was possible, before using default
dm-mapper copy. It was dropped in the current series,
to streamline the patches and make the series easier to review.


After all, device-mapper already has the infrastructure for copying
data between devices, so adding a copy-offload emulation for device-mapper
should be trivial.

I did not understand this, can you please elaborate ?


Please see my comments to patch 04.
We should only implement copy-offload if there is a dedicated 
infrastructure in place. But we should not have a 'generic' copy-offload 
emulation.
Problem is that 'real' copy-offload functionalities (ie for NVMe or 
SCSI) are riddled with corner-cases where copy-offload does _not_ work,

and where commands might fail if particular conditions are not met.
Falling back to a generic implementation will cause applications to 
assume that copy-offload worked, and that it gained performance as

the application just had to issue a single command.
Whereas in fact the opposite is true; it wasn't a single command, and 
the application might have performed better by issuing the commands

itself.
Returning -EOPNOTSUPP in these cases will inform the application that 
the attempt didn't work, and that it will have to fall back to the

'normal' copy.

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v15 04/12] block: add emulation for copy

2023-09-11 Thread Hannes Reinecke

On 9/11/23 09:09, Nitesh Shetty wrote:

On Fri, Sep 08, 2023 at 08:06:38AM +0200, Hannes Reinecke wrote:

On 9/6/23 18:38, Nitesh Shetty wrote:

For the devices which does not support copy, copy emulation is added.
It is required for in-kernel users like fabrics, where file descriptor is
not available and hence they can't use copy_file_range.
Copy-emulation is implemented by reading from source into memory and
writing to the corresponding destination.
Also emulation can be used, if copy offload fails or partially completes.
At present in kernel user of emulation is NVMe fabrics.


Leave out the last sentence; I really would like to see it enabled for SCSI,
too (we do have copy offload commands for SCSI ...).


Sure, will do that


And it raises all the questions which have bogged us down right from the
start: where is the point in calling copy offload if copy offload is not
implemented or slower than copying it by hand?
And how can the caller differentiate whether copy offload bring a benefit to
him?

IOW: wouldn't it be better to return -EOPNOTSUPP if copy offload is not
available?


Present approach treats copy as a background operation and the idea is to
maximize the chances of achieving copy by falling back to emulation.
Having said that, it should be possible to return -EOPNOTSUPP,
in case of offload IO failure or device not supporting offload.
We will update this in next version.

That is also what I meant with my comments to patch 09/12: I don't see 
it as a benefit to _always_ fall back to a generic copy-offload 
emulation. After all, that hardly brings any benefit.
Where I do see a benefit is to tie in the generic copy-offload 
_infrastructure_ to existing mechanisms (like dm-kcopyd).
But if there is no copy-offload infrastructure available then we really 
should return -EOPNOTSUPP as it really is not supported.


In the end, copy offload is not a command which 'always works'.
It's a command which _might_ deliver benefits (ie better performance) if 
dedicated implementations are available and certain parameters are met. 
If not then copy offload is not the best choice, and applications will 
need to be made aware of that.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v15 12/12] null_blk: add support for copy offload

2023-09-08 Thread Hannes Reinecke

On 9/6/23 18:38, Nitesh Shetty wrote:

Implementation is based on existing read and write infrastructure.
copy_max_bytes: A new configfs and module parameter is introduced, which
can be used to set hardware/driver supported maximum copy limit.
Only request based queue mode will support for copy offload.
Added tracefs support to copy IO tracing.

Suggested-by: Damien Le Moal 
Signed-off-by: Anuj Gupta 
Signed-off-by: Nitesh Shetty 
Signed-off-by: Vincent Fu 
---
  Documentation/block/null_blk.rst  |  5 ++
  drivers/block/null_blk/main.c | 97 ++-
  drivers/block/null_blk/null_blk.h |  1 +
  drivers/block/null_blk/trace.h| 23 
  4 files changed, 123 insertions(+), 3 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v15 11/12] null: Enable trace capability for null block

2023-09-08 Thread Hannes Reinecke

On 9/6/23 18:38, Nitesh Shetty wrote:

This is a prep patch to enable copy trace capability.
At present only zoned null_block is using trace, so we decoupled trace
and zoned dependency to make it usable in null_blk driver also.

Signed-off-by: Nitesh Shetty 
Signed-off-by: Anuj Gupta 
---
  drivers/block/null_blk/Makefile | 2 --
  drivers/block/null_blk/main.c   | 3 +++
  drivers/block/null_blk/trace.h  | 2 ++
  drivers/block/null_blk/zoned.c  | 1 -
  4 files changed, 5 insertions(+), 3 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v15 10/12] dm: Enable copy offload for dm-linear target

2023-09-08 Thread Hannes Reinecke

On 9/6/23 18:38, Nitesh Shetty wrote:

Setting copy_offload_supported flag to enable offload.

Signed-off-by: Nitesh Shetty 
---
  drivers/md/dm-linear.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index f4448d520ee9..1d1ee30bbefb 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -62,6 +62,7 @@ static int linear_ctr(struct dm_target *ti, unsigned int 
argc, char **argv)
ti->num_discard_bios = 1;
ti->num_secure_erase_bios = 1;
ti->num_write_zeroes_bios = 1;
+   ti->copy_offload_supported = 1;
ti->private = lc;
return 0;
  

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v15 09/12] dm: Add support for copy offload

2023-09-08 Thread Hannes Reinecke

On 9/6/23 18:38, Nitesh Shetty wrote:

Before enabling copy for dm target, check if underlying devices and
dm target support copy. Avoid split happening inside dm target.
Fail early if the request needs split, currently splitting copy
request is not supported.


And here is where I would have expected the emulation to take place;
didn't you have it in one of the earlier iterations?
After all, device-mapper already has the infrastructure for copying
data between devices, so adding a copy-offload emulation for 
device-mapper should be trivial.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v15 08/12] nvmet: add copy command support for bdev and file ns

2023-09-08 Thread Hannes Reinecke

On 9/6/23 18:38, Nitesh Shetty wrote:

Add support for handling nvme_cmd_copy command on target.

For bdev-ns if backing device supports copy offload we call device copy
offload (blkdev_copy_offload).
In case of partial completion from above or absence of device copy offload
capability, we fallback to copy emulation (blkdev_copy_emulation)

For file-ns we call vfs_copy_file_range to service our request.

Currently target always shows copy capability by setting
NVME_CTRL_ONCS_COPY in controller ONCS.

loop target has copy support, which can be used to test copy offload.
trace event support for nvme_cmd_copy.

Signed-off-by: Nitesh Shetty 
Signed-off-by: Anuj Gupta 
---
  drivers/nvme/target/admin-cmd.c   |  9 ++-
  drivers/nvme/target/io-cmd-bdev.c | 97 +++
  drivers/nvme/target/io-cmd-file.c | 50 
  drivers/nvme/target/nvmet.h   |  4 ++
  drivers/nvme/target/trace.c   | 19 ++
  5 files changed, 177 insertions(+), 2 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v15 07/12] nvme: add copy offload support

2023-09-08 Thread Hannes Reinecke

On 9/6/23 18:38, Nitesh Shetty wrote:

Current design only supports single source range.
We receive a request with REQ_OP_COPY_SRC.
Parse this request which consists of src(1st) and dst(2nd) bios.
Form a copy command (TP 4065)

trace event support for nvme_copy_cmd.
Set the device copy limits to queue limits.

Signed-off-by: Kanchan Joshi 
Signed-off-by: Nitesh Shetty 
Signed-off-by: Javier González 
Signed-off-by: Anuj Gupta 
---
  drivers/nvme/host/constants.c |  1 +
  drivers/nvme/host/core.c  | 79 +++
  drivers/nvme/host/trace.c | 19 +
  include/linux/blkdev.h|  1 +
  include/linux/nvme.h  | 43 +--
  5 files changed, 140 insertions(+), 3 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v15 06/12] fs, block: copy_file_range for def_blk_ops for direct block device

2023-09-08 Thread Hannes Reinecke

On 9/6/23 18:38, Nitesh Shetty wrote:

For direct block device opened with O_DIRECT, use copy_file_range to
issue device copy offload, and fallback to generic_copy_file_range incase
device copy offload capability is absent or the device files are not open
with O_DIRECT.

Signed-off-by: Anuj Gupta 
Signed-off-by: Nitesh Shetty 
---
  block/fops.c | 25 +
  1 file changed, 25 insertions(+)



Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v15 05/12] fs/read_write: Enable copy_file_range for block device.

2023-09-08 Thread Hannes Reinecke

On 9/6/23 18:38, Nitesh Shetty wrote:

From: Anuj Gupta 

This is a prep patch. Allow copy_file_range to work for block devices.
Relaxing generic_copy_file_checks allows us to reuse the existing infra,
instead of adding a new user interface for block copy offload.
Change generic_copy_file_checks to use ->f_mapping->host for both inode_in
and inode_out. Allow block device in generic_file_rw_checks.

Signed-off-by: Anuj Gupta 
Signed-off-by: Nitesh Shetty 
---
  fs/read_write.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)



Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v15 04/12] block: add emulation for copy

2023-09-08 Thread Hannes Reinecke

On 9/6/23 18:38, Nitesh Shetty wrote:

For the devices which does not support copy, copy emulation is added.
It is required for in-kernel users like fabrics, where file descriptor is
not available and hence they can't use copy_file_range.
Copy-emulation is implemented by reading from source into memory and
writing to the corresponding destination.
Also emulation can be used, if copy offload fails or partially completes.
At present in kernel user of emulation is NVMe fabrics.

Leave out the last sentence; I really would like to see it enabled for 
SCSI, too (we do have copy offload commands for SCSI ...).


And it raises all the questions which have bogged us down right from the 
start: where is the point in calling copy offload if copy offload is not 
implemented or slower than copying it by hand?
And how can the caller differentiate whether copy offload bring a 
benefit to him?


IOW: wouldn't it be better to return -EOPNOTSUPP if copy offload is not 
available?


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v15 03/12] block: add copy offload support

2023-09-07 Thread Hannes Reinecke

On 9/7/23 09:16, Nitesh Shetty wrote:

On 07/09/23 07:49AM, Hannes Reinecke wrote:

On 9/6/23 18:38, Nitesh Shetty wrote:

Hmm. That looks a bit odd. Why do you have to use wait_for_completion?


wait_for_completion is waiting for all the copy IOs to complete,
when caller does not pass endio handler.
Copy IO submissions are still async, as in previous revisions.

Can't you submit the 'src' bio, and then submit the 'dst' bio from the 
endio handler of the 'src' bio?

We can't do this with the current bio merging approach.
'src' bio waits for the 'dst' bio to arrive in request layer.
Note that both bio's should be present in request reaching the driver,
to form the copy-cmd.


Hmm. I guess it would be possible, but in the end we can always change
it later once the infrastructure is in.

So:

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v15 03/12] block: add copy offload support

2023-09-06 Thread Hannes Reinecke
nce this bio reaches request layer and find a request with previously
+ * sent source info we merge the destination bio and return.
+ * 3. Release the plug and request is sent to driver
+ * This design works only for drivers with request queue.
+ */
+ssize_t blkdev_copy_offload(struct block_device *bdev, loff_t pos_in,
+   loff_t pos_out, size_t len,
+   void (*endio)(void *, int, ssize_t),
+   void *private, gfp_t gfp)
+{
+   struct blkdev_copy_io *cio;
+   struct blkdev_copy_offload_io *offload_io;
+   struct bio *src_bio, *dst_bio;
+   ssize_t rem, chunk, ret;
+   ssize_t max_copy_bytes = bdev_max_copy_sectors(bdev) << SECTOR_SHIFT;
+   struct blk_plug plug;
+
+   if (!max_copy_bytes)
+   return -EINVAL;
+
+   ret = blkdev_copy_sanity_check(bdev, pos_in, bdev, pos_out, len);
+   if (ret)
+   return ret;
+
+   cio = kzalloc(sizeof(*cio), GFP_KERNEL);
+   if (!cio)
+   return -ENOMEM;
+   atomic_set(>refcount, 1);
+   cio->waiter = current;
+   cio->endio = endio;
+   cio->private = private;
+
+   /*
+* If there is a error, copied will be set to least successfully
+* completed copied length
+*/
+   cio->copied = len;
+   for (rem = len; rem > 0; rem -= chunk) {
+   chunk = min(rem, max_copy_bytes);
+
+   offload_io = kzalloc(sizeof(*offload_io), GFP_KERNEL);
+   if (!offload_io)
+   goto err_free_cio;
+   offload_io->cio = cio;
+   /*
+* For partial completion, we use offload_io->offset to truncate
+* successful copy length
+*/
+   offload_io->offset = len - rem;
+
+   src_bio = bio_alloc(bdev, 0, REQ_OP_COPY_SRC, gfp);
+   if (!src_bio)
+   goto err_free_offload_io;
+   src_bio->bi_iter.bi_size = chunk;
+   src_bio->bi_iter.bi_sector = pos_in >> SECTOR_SHIFT;
+
+   blk_start_plug();
+   dst_bio = blk_next_bio(src_bio, bdev, 0, REQ_OP_COPY_DST, gfp);
+   if (!dst_bio)
+   goto err_free_src_bio;
+   dst_bio->bi_iter.bi_size = chunk;
+   dst_bio->bi_iter.bi_sector = pos_out >> SECTOR_SHIFT;
+   dst_bio->bi_end_io = blkdev_copy_offload_dst_endio;
+   dst_bio->bi_private = offload_io;
+
+   atomic_inc(>refcount);
+   submit_bio(dst_bio);
+   blk_finish_plug();
+   pos_in += chunk;
+   pos_out += chunk;
+   }
+
+   if (atomic_dec_and_test(>refcount))
+   blkdev_copy_endio(cio);
+   if (cio->endio)
+   return -EIOCBQUEUED;
+
+   return blkdev_copy_wait_io_completion(cio);
+
+err_free_src_bio:
+   bio_put(src_bio);
+err_free_offload_io:
+   kfree(offload_io);
+err_free_cio:
+   cio->copied = min_t(ssize_t, cio->copied, (len - rem));
+   cio->status = -ENOMEM;
+   if (rem == len) {
+   kfree(cio);
+   return cio->status;
+   }
+   if (cio->endio)
+   return cio->status;
+
+   return blkdev_copy_wait_io_completion(cio);
+}
+EXPORT_SYMBOL_GPL(blkdev_copy_offload);


Hmm. That looks a bit odd. Why do you have to use wait_for_completion?
Can't you submit the 'src' bio, and then submit the 'dst' bio from the 
endio handler of the 'src' bio?


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v15 02/12] Add infrastructure for copy offload in block and request layer.

2023-09-06 Thread Hannes Reinecke

On 9/6/23 18:38, Nitesh Shetty wrote:

We add two new opcode REQ_OP_COPY_SRC, REQ_OP_COPY_DST.
Since copy is a composite operation involving src and dst sectors/lba,
each needs to be represented by a separate bio to make it compatible
with device mapper.
We expect caller to take a plug and send bio with source information,
followed by bio with destination information.
Once the src bio arrives we form a request and wait for destination
bio. Upon arrival of destination we merge these two bio's and send
corresponding request down to device driver.

Signed-off-by: Nitesh Shetty 
Signed-off-by: Anuj Gupta 
---
  block/blk-core.c  |  7 +++
  block/blk-merge.c | 41 +++
  block/blk.h   | 16 +++
  block/elevator.h  |  1 +
  include/linux/bio.h   |  6 +-
  include/linux/blk_types.h | 10 ++
  6 files changed, 76 insertions(+), 5 deletions(-)


Having two separate bios is okay, and what one would expect.
What is slightly strange is the merging functionality;
That could do with some more explanation why this approach was taken.
And also some checks in the merging code to avoid merging non-copy 
offload  bios.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 29/30] block: store the holder in file->private_data

2023-06-09 Thread Hannes Reinecke

On 6/8/23 13:02, Christoph Hellwig wrote:

Store the file struct used as the holder in file->private_data as an
indicator that this file descriptor was opened exclusively to  remove
the last use of FMODE_EXCL.

Signed-off-by: Christoph Hellwig 
---
  block/fops.c | 14 --
  1 file changed, 8 insertions(+), 6 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 27/30] block: replace fmode_t with a block-specific type for block open flags

2023-06-09 Thread Hannes Reinecke

On 6/8/23 13:02, Christoph Hellwig wrote:

The only overlap between the block open flags mapped into the fmode_t and
other uses of fmode_t are FMODE_READ and FMODE_WRITE.  Define a new
blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and
->ioctl and stop abusing fmode_t.

Signed-off-by: Christoph Hellwig 
Acked-by: Jack Wang [rnbd]
---
  arch/um/drivers/ubd_kern.c  |  8 +++---
  arch/xtensa/platforms/iss/simdisk.c |  2 +-
  block/bdev.c| 32 +++---
  block/blk-zoned.c   |  8 +++---
  block/blk.h | 11 
  block/fops.c| 32 +-
  block/genhd.c   |  8 +++---
  block/ioctl.c   | 42 +
  drivers/block/amiflop.c | 12 -
  drivers/block/aoe/aoeblk.c  |  4 +--
  drivers/block/ataflop.c | 25 +
  drivers/block/drbd/drbd_main.c  |  7 ++---
  drivers/block/drbd/drbd_nl.c|  2 +-
  drivers/block/floppy.c  | 28 +--
  drivers/block/loop.c| 22 +++
  drivers/block/mtip32xx/mtip32xx.c   |  4 +--
  drivers/block/nbd.c |  4 +--
  drivers/block/pktcdvd.c | 17 ++--
  drivers/block/rbd.c |  2 +-
  drivers/block/rnbd/rnbd-clt.c   |  4 +--
  drivers/block/rnbd/rnbd-srv.c   |  4 +--
  drivers/block/sunvdc.c  |  2 +-
  drivers/block/swim.c| 16 +--
  drivers/block/swim3.c   | 24 -
  drivers/block/ublk_drv.c|  2 +-
  drivers/block/xen-blkback/xenbus.c  |  2 +-
  drivers/block/xen-blkfront.c|  2 +-
  drivers/block/z2ram.c   |  2 +-
  drivers/block/zram/zram_drv.c   |  6 ++---
  drivers/cdrom/cdrom.c   |  6 ++---
  drivers/cdrom/gdrom.c   |  4 +--
  drivers/md/bcache/bcache.h  |  2 +-
  drivers/md/bcache/request.c |  4 +--
  drivers/md/bcache/super.c   |  6 ++---
  drivers/md/dm-cache-target.c| 12 -
  drivers/md/dm-clone-target.c| 10 +++
  drivers/md/dm-core.h|  7 +++--
  drivers/md/dm-era-target.c  |  6 +++--
  drivers/md/dm-ioctl.c   | 10 +++
  drivers/md/dm-snap.c|  4 +--
  drivers/md/dm-table.c   | 11 
  drivers/md/dm-thin.c|  9 ---
  drivers/md/dm-verity-fec.c  |  2 +-
  drivers/md/dm-verity-target.c   |  6 ++---
  drivers/md/dm.c | 10 +++
  drivers/md/dm.h |  2 +-
  drivers/md/md.c |  8 +++---
  drivers/mmc/core/block.c|  8 +++---
  drivers/mtd/devices/block2mtd.c |  4 +--
  drivers/mtd/mtd_blkdevs.c   |  4 +--
  drivers/mtd/ubi/block.c |  5 ++--
  drivers/nvme/host/core.c|  2 +-
  drivers/nvme/host/ioctl.c   |  8 +++---
  drivers/nvme/host/multipath.c   |  2 +-
  drivers/nvme/host/nvme.h|  4 +--
  drivers/nvme/target/io-cmd-bdev.c   |  2 +-
  drivers/s390/block/dasd.c   |  6 ++---
  drivers/s390/block/dasd_genhd.c |  3 ++-
  drivers/s390/block/dasd_int.h   |  3 ++-
  drivers/s390/block/dasd_ioctl.c |  2 +-
  drivers/s390/block/dcssblk.c|  4 +--
  drivers/scsi/sd.c   | 19 ++---
  drivers/scsi/sr.c   | 10 +++
  drivers/target/target_core_iblock.c |  5 ++--
  drivers/target/target_core_pscsi.c  |  4 +--
  fs/btrfs/dev-replace.c  |  2 +-
  fs/btrfs/super.c|  8 +++---
  fs/btrfs/volumes.c  | 16 +--
  fs/btrfs/volumes.h  |  4 +--
  fs/erofs/super.c|  2 +-
  fs/ext4/super.c |  2 +-
  fs/f2fs/super.c |  2 +-
  fs/jfs/jfs_logmgr.c |  2 +-
  fs/nfs/blocklayout/dev.c|  5 ++--
  fs/ocfs2/cluster/heartbeat.c|  3 ++-
  fs/reiserfs/journal.c   |  4 +--
  fs/xfs/xfs_super.c  |  2 +-
  include/linux/blkdev.h  | 30 -
  include/linux/cdrom.h   |  3 ++-
  include/linux/device-mapper.h   |  8 +++---
  kernel/power/swap.c |  6 ++---
  mm/swapfile.c   |  2 +-
  82 files changed, 334 insertions(+), 315 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 27/30] block: replace fmode_t with a block-specific type for block open flags

2023-06-09 Thread Hannes Reinecke

On 6/8/23 13:02, Christoph Hellwig wrote:

The only overlap between the block open flags mapped into the fmode_t and
other uses of fmode_t are FMODE_READ and FMODE_WRITE.  Define a new
blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and
->ioctl and stop abusing fmode_t.

Signed-off-by: Christoph Hellwig 
Acked-by: Jack Wang [rnbd]
---
  arch/um/drivers/ubd_kern.c  |  8 +++---
  arch/xtensa/platforms/iss/simdisk.c |  2 +-
  block/bdev.c| 32 +++---
  block/blk-zoned.c   |  8 +++---
  block/blk.h | 11 
  block/fops.c| 32 +-
  block/genhd.c   |  8 +++---
  block/ioctl.c   | 42 +
  drivers/block/amiflop.c | 12 -
  drivers/block/aoe/aoeblk.c  |  4 +--
  drivers/block/ataflop.c | 25 +
  drivers/block/drbd/drbd_main.c  |  7 ++---
  drivers/block/drbd/drbd_nl.c|  2 +-
  drivers/block/floppy.c  | 28 +--
  drivers/block/loop.c| 22 +++
  drivers/block/mtip32xx/mtip32xx.c   |  4 +--
  drivers/block/nbd.c |  4 +--
  drivers/block/pktcdvd.c | 17 ++--
  drivers/block/rbd.c |  2 +-
  drivers/block/rnbd/rnbd-clt.c   |  4 +--
  drivers/block/rnbd/rnbd-srv.c   |  4 +--
  drivers/block/sunvdc.c  |  2 +-
  drivers/block/swim.c| 16 +--
  drivers/block/swim3.c   | 24 -
  drivers/block/ublk_drv.c|  2 +-
  drivers/block/xen-blkback/xenbus.c  |  2 +-
  drivers/block/xen-blkfront.c|  2 +-
  drivers/block/z2ram.c   |  2 +-
  drivers/block/zram/zram_drv.c   |  6 ++---
  drivers/cdrom/cdrom.c   |  6 ++---
  drivers/cdrom/gdrom.c   |  4 +--
  drivers/md/bcache/bcache.h  |  2 +-
  drivers/md/bcache/request.c |  4 +--
  drivers/md/bcache/super.c   |  6 ++---
  drivers/md/dm-cache-target.c| 12 -
  drivers/md/dm-clone-target.c| 10 +++
  drivers/md/dm-core.h|  7 +++--
  drivers/md/dm-era-target.c  |  6 +++--
  drivers/md/dm-ioctl.c   | 10 +++
  drivers/md/dm-snap.c|  4 +--
  drivers/md/dm-table.c   | 11 
  drivers/md/dm-thin.c|  9 ---
  drivers/md/dm-verity-fec.c  |  2 +-
  drivers/md/dm-verity-target.c   |  6 ++---
  drivers/md/dm.c | 10 +++
  drivers/md/dm.h |  2 +-
  drivers/md/md.c |  8 +++---
  drivers/mmc/core/block.c|  8 +++---
  drivers/mtd/devices/block2mtd.c |  4 +--
  drivers/mtd/mtd_blkdevs.c   |  4 +--
  drivers/mtd/ubi/block.c |  5 ++--
  drivers/nvme/host/core.c|  2 +-
  drivers/nvme/host/ioctl.c   |  8 +++---
  drivers/nvme/host/multipath.c   |  2 +-
  drivers/nvme/host/nvme.h|  4 +--
  drivers/nvme/target/io-cmd-bdev.c   |  2 +-
  drivers/s390/block/dasd.c   |  6 ++---
  drivers/s390/block/dasd_genhd.c |  3 ++-
  drivers/s390/block/dasd_int.h   |  3 ++-
  drivers/s390/block/dasd_ioctl.c |  2 +-
  drivers/s390/block/dcssblk.c|  4 +--
  drivers/scsi/sd.c   | 19 ++---
  drivers/scsi/sr.c   | 10 +++
  drivers/target/target_core_iblock.c |  5 ++--
  drivers/target/target_core_pscsi.c  |  4 +--
  fs/btrfs/dev-replace.c  |  2 +-
  fs/btrfs/super.c|  8 +++---
  fs/btrfs/volumes.c  | 16 +--
  fs/btrfs/volumes.h  |  4 +--
  fs/erofs/super.c|  2 +-
  fs/ext4/super.c |  2 +-
  fs/f2fs/super.c |  2 +-
  fs/jfs/jfs_logmgr.c |  2 +-
  fs/nfs/blocklayout/dev.c|  5 ++--
  fs/ocfs2/cluster/heartbeat.c|  3 ++-
  fs/reiserfs/journal.c   |  4 +--
  fs/xfs/xfs_super.c  |  2 +-
  include/linux/blkdev.h  | 30 -
  include/linux/cdrom.h   |  3 ++-
  include/linux/device-mapper.h   |  8 +++---
  kernel/power/swap.c |  6 ++---
  mm/swapfile.c   |  2 +-
  82 files changed, 334 insertions(+), 315 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 05/30] cdrom: track if a cdrom_device_info was opened for data

2023-06-09 Thread Hannes Reinecke

On 6/8/23 13:02, Christoph Hellwig wrote:

Set a flag when a cdrom_device_info is opened for writing, instead of
trying to figure out this at release time.  This will allow to eventually
remove the mode argument to the ->release block_device_operation as
nothing but the CDROM drivers uses that argument.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Phillip Potter 
Acked-by: Christian Brauner 
---
  drivers/cdrom/cdrom.c | 12 +---
  include/linux/cdrom.h |  1 +
  2 files changed, 6 insertions(+), 7 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 30/31] block: store the holder in file->private_data

2023-06-08 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Store the file struct used as the holder in file->private_data as an
indicator that this file descriptor was opened exclusively to  remove
the last use of FMODE_EXCL.

Signed-off-by: Christoph Hellwig 
---
  block/fops.c | 14 --
  1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index c40b9f978e3bc7..915e0ef7560993 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -478,7 +478,7 @@ blk_mode_t file_to_blk_mode(struct file *file)
mode |= BLK_OPEN_READ;
if (file->f_mode & FMODE_WRITE)
mode |= BLK_OPEN_WRITE;
-   if (file->f_mode & FMODE_EXCL)
+   if (file->private_data)
mode |= BLK_OPEN_EXCL;
if ((file->f_flags & O_ACCMODE) == 3)
mode |= BLK_OPEN_WRITE_IOCTL;
@@ -501,12 +501,15 @@ static int blkdev_open(struct inode *inode, struct file 
*filp)
filp->f_flags |= O_LARGEFILE;
filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
  
+	/*

+* Use the file private data to store the holder, file_to_blk_mode
+* relies on this.
+*/


Can you update the comment to reflect that we're only use the
->private_data field for exclusive open?
I know it's indicated by the condition, but we really should
be clarify this usage.


if (filp->f_flags & O_EXCL)
-   filp->f_mode |= FMODE_EXCL;
+   filp->private_data = filp;
  
  	bdev = blkdev_get_by_dev(inode->i_rdev, file_to_blk_mode(filp),

-(filp->f_mode & FMODE_EXCL) ? filp : NULL,
-NULL);
+filp->private_data, NULL);
if (IS_ERR(bdev))
return PTR_ERR(bdev);
  
@@ -517,8 +520,7 @@ static int blkdev_open(struct inode *inode, struct file *filp)
  
  static int blkdev_release(struct inode *inode, struct file *filp)

  {
-   blkdev_put(I_BDEV(filp->f_mapping->host),
-  (filp->f_mode & FMODE_EXCL) ? filp : NULL);
+   blkdev_put(I_BDEV(filp->f_mapping->host), filp->private_data);
return 0;
  }
  

Other than that:

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 31/31] fs: remove the now unused FMODE_* flags

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

FMODE_NDELAY, FMODE_EXCL and FMODE_WRITE_IOCTL were only used for
block internal purposed and are now entirely unused, so remove them.

Signed-off-by: Christoph Hellwig 
---
  include/linux/fs.h | 7 ---
  1 file changed, 7 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 25/31] ubd: remove commented out code in ubd_open

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

This code has been dead forever, make sure it doesn't show up in code
searches.

Signed-off-by: Christoph Hellwig 
---
  arch/um/drivers/ubd_kern.c | 7 ---
  1 file changed, 7 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 08/31] block: share code between disk_check_media_change and disk_force_media_change

2023-06-07 Thread Hannes Reinecke

On 6/7/23 14:21, Christoph Hellwig wrote:

On Wed, Jun 07, 2023 at 02:19:00PM +0200, Hannes Reinecke wrote:

-   return true;
+   return __disk_check_media_change(disk,
+   disk_clear_events(disk, DISK_EVENT_MEDIA_CHANGE |
+   DISK_EVENT_EJECT_REQUEST));


Can you move the call to disk_clear_events() out of the call to
__disk_check_media_change()?
I find this pattern hard to read.


I suspect you've not done enough functional programming in your youth :)


That's why I said 'I find'; purely personal preference.
If you're happy with:

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
(In my youth? One is tempted to quote Falco: "If you remember the '90s 
you haven't experienced them...")


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 29/31] block: always use I_BDEV on file->f_mapping->host to find the bdev

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Always use I_BDEV(file->f_mapping->host) to find the bdev for a file to
free up file->private_data for other uses.

Signed-off-by: Christoph Hellwig 
---
  block/fops.c | 18 --
  1 file changed, 8 insertions(+), 10 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 27/31] block: remove unused fmode_t arguments from ioctl handlers

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

A few ioctl handlers have fmode_t arguments that are entirely unused,
remove them.

Signed-off-by: Christoph Hellwig 
---
  block/blk-zoned.c |  4 ++--
  block/blk.h   |  6 +++---
  block/ioctl.c | 14 +++---
  3 files changed, 12 insertions(+), 12 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 26/31] block: move a few internal definitions out of blkdev.h

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

All these helpers are only used in core block code, so move them out of
the public header.

Signed-off-by: Christoph Hellwig 
---
  block/blk.h| 23 +--
  include/linux/blkdev.h | 27 ---
  2 files changed, 21 insertions(+), 29 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 24/31] rnbd-srv: replace sess->open_flags with a "bool readonly"

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Stop passing the fmode_t around and just use a simple bool to track if
an export is read-only.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/rnbd/rnbd-srv-sysfs.c |  3 +--
  drivers/block/rnbd/rnbd-srv.c   | 15 +++
  drivers/block/rnbd/rnbd-srv.h   |  2 +-
  3 files changed, 9 insertions(+), 11 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 23/31] mtd: block: use a simple bool to track open for write

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Instead of propagating the fmode_t, just use a bool to track if a mtd
block device was opened for writing.

Signed-off-by: Christoph Hellwig 
---
  drivers/mtd/mtd_blkdevs.c| 2 +-
  drivers/mtd/mtdblock.c   | 2 +-
  include/linux/mtd/blktrans.h | 2 +-
  3 files changed, 3 insertions(+), 3 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 22/31] nvme: replace the fmode_t argument to the nvme ioctl handlers with a simple bool

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Instead of passing a fmode_t and only checking it fo0r FMODE_WRITE, pass
a bool open_for_write to prepare for callers that won't have the fmode_t.

Signed-off-by: Christoph Hellwig 
---
  drivers/nvme/host/ioctl.c | 62 +--
  1 file changed, 34 insertions(+), 28 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 21/31] scsi: replace the fmode_t argument to ->sg_io_fn with a simple bool

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Instead of passing a fmode_t and only checking it for FMODE_WRITE, pass
a bool open_for_write to prepare for callers that won't have the fmode_t.

Signed-off-by: Christoph Hellwig 
---
  block/bsg-lib.c | 2 +-
  block/bsg.c | 8 +---
  drivers/scsi/scsi_bsg.c | 4 ++--
  include/linux/bsg.h | 2 +-
  4 files changed, 9 insertions(+), 7 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 20/31] scsi: replace the fmode_t argument to scsi_ioctl with a simple bool

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Instead of passing a fmode_t and only checking it for FMODE_WRITE, pass
a bool open_for_write to prepare for callers that won't have the fmode_t.

Signed-off-by: Christoph Hellwig 
---
  drivers/scsi/ch.c |  3 ++-
  drivers/scsi/scsi_ioctl.c | 34 +-
  drivers/scsi/sd.c |  2 +-
  drivers/scsi/sg.c |  5 +++--
  drivers/scsi/sr.c |  2 +-
  drivers/scsi/st.c |  2 +-
  include/scsi/scsi_ioctl.h |  2 +-
  7 files changed, 26 insertions(+), 24 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 19/31] scsi: replace the fmode_t argument to scsi_cmd_allowed with a simple bool

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Instead of passing a fmode_t and only checking it for FMODE_WRITE, pass
a bool open_for_write to prepare for callers that won't have the fmode_t.

Signed-off-by: Christoph Hellwig 
---
  drivers/scsi/scsi_bsg.c   | 2 +-
  drivers/scsi/scsi_ioctl.c | 8 
  drivers/scsi/sg.c | 2 +-
  include/scsi/scsi_ioctl.h | 2 +-
  4 files changed, 7 insertions(+), 7 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 18/31] fs: remove sb->s_mode

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

There is no real need to store the open mode in the super_block now.
It is only used by f2fs, which can easily recalculate it.

Signed-off-by: Christoph Hellwig 
---
  fs/f2fs/super.c| 10 ++
  fs/nilfs2/super.c  |  1 -
  fs/super.c |  2 --
  include/linux/fs.h |  1 -
  4 files changed, 6 insertions(+), 8 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 17/31] block: add a sb_open_mode helper

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Add a helper to return the open flags for blkdev_get_by* for passed in
super block flags instead of open coding the logic in many places.

Signed-off-by: Christoph Hellwig 
---
  fs/btrfs/super.c   |  5 +
  fs/nilfs2/super.c  |  7 ++-
  fs/super.c | 15 ---
  include/linux/blkdev.h |  7 +++
  4 files changed, 14 insertions(+), 20 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 16/31] block: use the holder as indication for exclusive opens

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

The current interface for exclusive opens is rather confusing as it
requires both the FMODE_EXCL flag and a holder.  Remove the need to pass
FMODE_EXCL and just key off the exclusive open off a non-NULL holder.

For blkdev_put this requires adding the holder argument, which provides
better debug checking that only the holder actually releases the hold,
but at the same time allows removing the now superfluous mode argument.

Signed-off-by: Christoph Hellwig 
---
  block/bdev.c| 37 
  block/fops.c|  6 +++--
  block/genhd.c   |  5 ++--
  block/ioctl.c   |  5 ++--
  drivers/block/drbd/drbd_nl.c| 23 ++---
  drivers/block/pktcdvd.c | 13 +-
  drivers/block/rnbd/rnbd-srv.c   |  4 +--
  drivers/block/xen-blkback/xenbus.c  |  2 +-
  drivers/block/zram/zram_drv.c   |  8 +++---
  drivers/md/bcache/super.c   | 15 ++--
  drivers/md/dm.c |  6 ++---
  drivers/md/md.c | 38 +++--
  drivers/mtd/devices/block2mtd.c |  4 +--
  drivers/nvme/target/io-cmd-bdev.c   |  2 +-
  drivers/s390/block/dasd_genhd.c |  2 +-
  drivers/target/target_core_iblock.c |  6 ++---
  drivers/target/target_core_pscsi.c  |  8 +++---
  fs/btrfs/dev-replace.c  |  6 ++---
  fs/btrfs/ioctl.c| 12 -
  fs/btrfs/volumes.c  | 28 ++---
  fs/btrfs/volumes.h  |  6 ++---
  fs/erofs/super.c|  7 +++---
  fs/ext4/super.c | 11 +++--
  fs/f2fs/super.c |  2 +-
  fs/jfs/jfs_logmgr.c |  6 ++---
  fs/nfs/blocklayout/dev.c|  4 +--
  fs/nilfs2/super.c   |  6 ++---
  fs/ocfs2/cluster/heartbeat.c|  4 +--
  fs/reiserfs/journal.c   | 19 +++
  fs/reiserfs/reiserfs.h  |  1 -
  fs/super.c  | 20 +++
  fs/xfs/xfs_super.c  | 15 ++--
  include/linux/blkdev.h  |  2 +-
  kernel/power/hibernate.c| 12 +++--
  kernel/power/power.h|  2 +-
  kernel/power/swap.c | 21 +++-
  mm/swapfile.c   |  7 +++---
  37 files changed, 183 insertions(+), 192 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 12/31] swsusp: don't pass a stack address to blkdev_get_by_path

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

holder is just an on-stack pointer that can easily be reused by other calls,
replace it with a static variable that doesn't change.

Signed-off-by: Christoph Hellwig 
---
  kernel/power/swap.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 11/31] block: rename blkdev_close to blkdev_release

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Make the function name match the method name.

Signed-off-by: Christoph Hellwig 
---
  block/fops.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 15/31] btrfs: don't pass a holder for non-exclusive blkdev_get_by_path

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Passing a holder to blkdev_get_by_path when FMODE_EXCL isn't set doesn't
make sense, so pass NULL instead and remove the holder argument from the
call chains the only end up in non-FMODE_EXCL blkdev_get_by_path calls.

Signed-off-by: Christoph Hellwig 
---
  fs/btrfs/super.c   | 16 ++--
  fs/btrfs/volumes.c | 17 -
  fs/btrfs/volumes.h |  3 +--
  3 files changed, 15 insertions(+), 21 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 13/31] bcache: don't pass a stack address to blkdev_get_by_path

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

sb is just an on-stack pointer that can easily be reused by other calls.
Switch to use the bcache-wide bcache_kobj instead as there is no need to
claim per-bcache device anyway.

Signed-off-by: Christoph Hellwig 
---
  drivers/md/bcache/super.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 14/31] rnbd-srv: don't pass a holder for non-exclusive blkdev_get_by_path

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Passing a holder to blkdev_get_by_path when FMODE_EXCL isn't set doesn't
make sense, so pass NULL instead.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/rnbd/rnbd-srv.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 09/31] block: pass a gendisk to ->open

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

->open is only called on the whole device.  Make that explicit by
passing a gendisk instead of the block_device.

Signed-off-by: Christoph Hellwig 
---
  arch/um/drivers/ubd_kern.c  |  5 ++---
  arch/xtensa/platforms/iss/simdisk.c |  4 ++--
  block/bdev.c|  2 +-
  drivers/block/amiflop.c |  8 
  drivers/block/aoe/aoeblk.c  |  4 ++--
  drivers/block/ataflop.c | 16 +++
  drivers/block/drbd/drbd_main.c  |  6 +++---
  drivers/block/floppy.c  | 30 +++--
  drivers/block/nbd.c |  8 
  drivers/block/pktcdvd.c |  6 +++---
  drivers/block/rbd.c |  4 ++--
  drivers/block/rnbd/rnbd-clt.c   |  4 ++--
  drivers/block/swim.c| 10 +-
  drivers/block/swim3.c   | 10 +-
  drivers/block/ublk_drv.c|  4 ++--
  drivers/block/z2ram.c   |  6 ++
  drivers/block/zram/zram_drv.c   | 13 +
  drivers/cdrom/gdrom.c   |  4 ++--
  drivers/md/bcache/super.c   |  4 ++--
  drivers/md/dm.c |  4 ++--
  drivers/md/md.c |  6 +++---
  drivers/mmc/core/block.c|  4 ++--
  drivers/mtd/mtd_blkdevs.c   |  4 ++--
  drivers/mtd/ubi/block.c |  4 ++--
  drivers/nvme/host/core.c|  4 ++--
  drivers/nvme/host/multipath.c   |  4 ++--
  drivers/s390/block/dasd.c   |  4 ++--
  drivers/s390/block/dcssblk.c|  7 +++
  drivers/scsi/sd.c   | 12 ++--
  drivers/scsi/sr.c   |  6 +++---
  include/linux/blkdev.h  |  2 +-
  31 files changed, 102 insertions(+), 107 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 10/31] block: remove the unused mode argument to ->release

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

The mode argument to the ->release block_device_operation is never used,
so remove it.

Signed-off-by: Christoph Hellwig 
---
  arch/um/drivers/ubd_kern.c  |  4 ++--
  arch/xtensa/platforms/iss/simdisk.c |  2 +-
  block/bdev.c| 14 +++---
  drivers/block/amiflop.c |  2 +-
  drivers/block/aoe/aoeblk.c  |  2 +-
  drivers/block/ataflop.c |  4 ++--
  drivers/block/drbd/drbd_main.c  |  4 ++--
  drivers/block/floppy.c  |  2 +-
  drivers/block/loop.c|  2 +-
  drivers/block/nbd.c |  2 +-
  drivers/block/pktcdvd.c |  4 ++--
  drivers/block/rbd.c |  2 +-
  drivers/block/rnbd/rnbd-clt.c   |  2 +-
  drivers/block/swim.c|  2 +-
  drivers/block/swim3.c   |  3 +--
  drivers/block/z2ram.c   |  2 +-
  drivers/cdrom/gdrom.c   |  2 +-
  drivers/md/bcache/super.c   |  2 +-
  drivers/md/dm.c |  2 +-
  drivers/md/md.c |  2 +-
  drivers/mmc/core/block.c|  2 +-
  drivers/mtd/mtd_blkdevs.c   |  2 +-
  drivers/mtd/ubi/block.c |  2 +-
  drivers/nvme/host/core.c|  2 +-
  drivers/nvme/host/multipath.c   |  2 +-
  drivers/s390/block/dasd.c   |  2 +-
  drivers/s390/block/dcssblk.c|  4 ++--
  drivers/scsi/sd.c   |  3 +--
  drivers/scsi/sr.c   |  2 +-
  include/linux/blkdev.h  |  2 +-
  30 files changed, 41 insertions(+), 43 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 07/31] block: pass a gendisk on bdev_check_media_change

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

bdev_check_media_change should only ever be called for the whole device.
Pass a gendisk to make that explicit and rename the function to
disk_check_media_change.

Signed-off-by: Christoph Hellwig 
---
  block/disk-events.c | 18 +-
  drivers/block/amiflop.c |  2 +-
  drivers/block/ataflop.c |  6 +++---
  drivers/block/floppy.c  | 16 
  drivers/block/swim.c|  2 +-
  drivers/block/swim3.c   |  2 +-
  drivers/cdrom/gdrom.c   |  2 +-
  drivers/md/md.c |  2 +-
  drivers/scsi/sd.c   |  9 -
  drivers/scsi/sr.c   |  2 +-
  include/linux/blkdev.h  |  2 +-
  11 files changed, 31 insertions(+), 32 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 05/31] cdrom: track if a cdrom_device_info was opened for data

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Set a flag when a cdrom_device_info is opened for writing, instead of
trying to figure out this at release time.  This will allow to eventually
remove the mode argument to the ->release block_device_operation as
nothing but the CDROM drivers uses that argument.

Signed-off-by: Christoph Hellwig 
---
  drivers/cdrom/cdrom.c | 12 +---
  include/linux/cdrom.h |  1 +
  2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index 08abf1ffede002..adebac1bd210d9 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -1172,6 +1172,7 @@ int cdrom_open(struct cdrom_device_info *cdi, fmode_t 
mode)
ret = 0;
cdi->media_written = 0;
}
+   cdi->opened_for_data = true;
}
  
  	if (ret)

@@ -1252,7 +1253,6 @@ static int check_for_audio_disc(struct cdrom_device_info 
*cdi,
  void cdrom_release(struct cdrom_device_info *cdi, fmode_t mode)
  {
const struct cdrom_device_ops *cdo = cdi->ops;
-   int opened_for_data;
  
  	cd_dbg(CD_CLOSE, "entering cdrom_release\n");
  
@@ -1270,14 +1270,12 @@ void cdrom_release(struct cdrom_device_info *cdi, fmode_t mode)

}
}
  
-	opened_for_data = !(cdi->options & CDO_USE_FFLAGS) ||

-   !(mode & FMODE_NDELAY);
-
cdo->release(cdi);
-   if (cdi->use_count == 0) {  /* last process that closes dev*/
-   if (opened_for_data &&
-   cdi->options & CDO_AUTO_EJECT && CDROM_CAN(CDC_OPEN_TRAY))
+
+   if (cdi->use_count == 0 && cdi->opened_for_data) {
+   if (cdi->options & CDO_AUTO_EJECT && CDROM_CAN(CDC_OPEN_TRAY))
cdo->tray_move(cdi, 1);
+   cdi->opened_for_data = false;
}
  }
  EXPORT_SYMBOL(cdrom_release);
diff --git a/include/linux/cdrom.h b/include/linux/cdrom.h
index 0a5db0b0c958a1..385e94732b2cf1 100644
--- a/include/linux/cdrom.h
+++ b/include/linux/cdrom.h
@@ -64,6 +64,7 @@ struct cdrom_device_info {
int (*exit)(struct cdrom_device_info *);
int mrw_mode_page;
__s64 last_media_change_ms;
+   bool opened_for_data;
  };
  
  struct cdrom_device_ops {


Do we care about alignment here?
integer followed by a 64 bit value followed by a bool seems
like an automatic padding to me ...

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 03/31] cdrom: remove the unused mode argument to cdrom_ioctl

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Signed-off-by: Christoph Hellwig 
---
  drivers/cdrom/cdrom.c | 2 +-
  drivers/cdrom/gdrom.c | 2 +-
  drivers/scsi/sr.c | 2 +-
  include/linux/cdrom.h | 4 ++--
  4 files changed, 5 insertions(+), 5 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 08/31] block: share code between disk_check_media_change and disk_force_media_change

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Factor the common logic between disk_check_media_change and
disk_force_media_change into a helper.

Signed-off-by: Christoph Hellwig 
---
  block/disk-events.c | 37 -
  1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/block/disk-events.c b/block/disk-events.c
index 8b1b63225738f8..06f325662b3494 100644
--- a/block/disk-events.c
+++ b/block/disk-events.c
@@ -262,6 +262,18 @@ static unsigned int disk_clear_events(struct gendisk 
*disk, unsigned int mask)
return pending;
  }
  
+static bool __disk_check_media_change(struct gendisk *disk, unsigned int events)

+{
+   if (!(events & DISK_EVENT_MEDIA_CHANGE))
+   return false;
+
+   if (__invalidate_device(disk->part0, true))
+   pr_warn("VFS: busy inodes on changed media %s\n",
+   disk->disk_name);
+   set_bit(GD_NEED_PART_SCAN, >state);
+   return true;
+}
+
  /**
   * disk_check_media_change - check if a removable media has been changed
   * @disk: gendisk to check
@@ -274,18 +286,9 @@ static unsigned int disk_clear_events(struct gendisk 
*disk, unsigned int mask)
   */
  bool disk_check_media_change(struct gendisk *disk)
  {
-   unsigned int events;
-
-   events = disk_clear_events(disk, DISK_EVENT_MEDIA_CHANGE |
-  DISK_EVENT_EJECT_REQUEST);
-   if (!(events & DISK_EVENT_MEDIA_CHANGE))
-   return false;
-
-   if (__invalidate_device(disk->part0, true))
-   pr_warn("VFS: busy inodes on changed media %s\n",
-   disk->disk_name);
-   set_bit(GD_NEED_PART_SCAN, >state);
-   return true;
+   return __disk_check_media_change(disk,
+   disk_clear_events(disk, DISK_EVENT_MEDIA_CHANGE |
+   DISK_EVENT_EJECT_REQUEST));


Can you move the call to disk_clear_events() out of the call to 
__disk_check_media_change()?

I find this pattern hard to read.

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 02/31] cdrom: remove the unused bdev argument to cdrom_open

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Signed-off-by: Christoph Hellwig 
---
  drivers/cdrom/cdrom.c | 3 +--
  drivers/cdrom/gdrom.c | 2 +-
  drivers/scsi/sr.c | 2 +-
  include/linux/cdrom.h | 3 +--
  4 files changed, 4 insertions(+), 6 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 04/31] cdrom: remove the unused cdrom_close_write release code

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

cdrom_close_write is empty, and the for_data flag it is keyed off is
never set.  Remove all this clutter.

Signed-off-by: Christoph Hellwig 
---
  drivers/cdrom/cdrom.c | 15 ---
  include/linux/cdrom.h |  1 -
  2 files changed, 16 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 06/31] cdrom: remove the unused mode argument to cdrom_release

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

Signed-off-by: Christoph Hellwig 
---
  drivers/cdrom/cdrom.c | 2 +-
  drivers/cdrom/gdrom.c | 2 +-
  drivers/scsi/sr.c | 2 +-
  include/linux/cdrom.h | 2 +-
  4 files changed, 4 insertions(+), 4 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 01/31] block: also call ->open for incremental partition opens

2023-06-07 Thread Hannes Reinecke

On 6/6/23 09:39, Christoph Hellwig wrote:

For whole devices ->open is called for each open, but for partitions it
is only called on the first open of a partition.  This is problematic
as various block drivers look at open flags and might not do all setup
for ioctl only or NDELAY opens.

Signed-off-by: Christoph Hellwig 
---
  block/bdev.c | 18 --
  1 file changed, 8 insertions(+), 10 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] Does dm-zoned support buffered write?

2023-05-15 Thread Hannes Reinecke

On 5/12/23 19:41, Ming Lin wrote:

On Thu, May 11, 2023 at 11:56 AM Hannes Reinecke  wrote:


On 5/11/23 20:41, Ming Lin wrote:

Hi list,

I have an application that needs to use buffered_io to access SMR disk
for good performance.

  From "ZBD Support Restrictions" at https://zonedstorage.io/docs/linux/overview
"
Direct IO Writes The kernel page cache does not guarantee that cached
dirty pages will be flushed to a block device in sequential sector
order. This can lead to unaligned write errors if an application uses
buffered writes to write to the sequential write required zones of a
device. To avoid this pitfall, applications that directly use a zoned
block device without a file system should always use direct I/O
operations to write to the sequential write required zones of a
host-managed disk (that is, they should issue write() system calls
with a block device "file open" that uses the O_DIRECT flag).
"

Raw zbd disk only supports direct_io.

Does dm-zoned support buffered io (without O_DIRECT)?


Yes. But I _think_ the above paragraph is ever so slightly outdated, as
we've spent quite a lot of time fixing sequential writes (cf blk-zoned
etc). So while dm-zoned is using bufferet writes there won't be any
sequential write issues.

At least, I have not uncovered any of those during testing.


Hi Hannes,

I use 5.10.90 kernel and smr disk capacity is 24T.
I followed the below guide to create dm_zone device on top of smr disk.
https://zonedstorage.io/docs/linux/dm

Then mkfs.ext4 /dev/dm-0, but it seems hung.
Any ideas?

[37552.217472] device-mapper: uevent: version 1.0.3
[37552.217549] device-mapper: ioctl: 4.43.0-ioctl (2020-10-01)
initialised: dm-devel@redhat.com
[37575.608500] device-mapper: zoned metadata: (dmz-5000cca2bfc0db21):
DM-Zoned metadata version 2
[37575.608502] device-mapper: zoned metadata: (sdx): Host-managed
zoned block device
[37575.608503] device-mapper: zoned metadata: (sdx):   50782535680
512-byte logical sectors (offset 0)
[37575.608503] device-mapper: zoned metadata: (sdx):   96860 zones of
524288 512-byte logical sectors (offset 0)
[37575.608504] device-mapper: zoned metadata: (dmz-5000cca2bfc0db21):
  96860 zones of 524288 512-byte logical sectors
[37575.609204] device-mapper: zoned: (dmz-5000cca2bfc0db21): Target
device: 50771001344 512-byte logical sectors (6346375168 blocks)
[38101.543353] INFO: task mkfs.ext4:1411791 blocked for more than 122 seconds.
[38101.543380]   Tainted: G   OE 5.10.90.bm.1-amd64+ #2
[38101.543395] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[38101.543411] task:mkfs.ext4   state:D stack:0 pid:1411791
ppid:1388660 flags:0x4000
[38101.543415] Call Trace:
[38101.543422]  __schedule+0x3fd/0x760
[38101.543425]  schedule+0x46/0xb0
[38101.543426]  io_schedule+0x12/0x40
[38101.543429]  wait_on_page_bit+0x133/0x270
[38101.543431]  ? __page_cache_alloc+0xa0/0xa0
[38101.543432]  wait_on_page_writeback+0x25/0x70
[38101.543434]  __filemap_fdatawait_range+0x86/0xf0
[38101.543435]  file_write_and_wait_range+0x74/0xb0
[38101.543438]  blkdev_fsync+0x16/0x40
[38101.543441]  do_fsync+0x38/0x60
[38101.543442]  __x64_sys_fsync+0x10/0x20
[38101.543445]  do_syscall_64+0x2d/0x70
[38101.543446]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

===
Below are the steps I did:

root@smr_dev:~# blkzone reset /dev/sdx

root@smr_dev:~# dmzadm --format /dev/sdx
/dev/sdx: 50782535680 512-byte sectors (24215 GiB)
   Host-managed device
   96860 zones, offset 0
   96860 zones of 524288 512-byte sectors (256 MiB)
   65536 4KB data blocks per zone
Resetting sequential zones
Writing primary metadata set
   Writing mapping table
   Writing bitmap blocks
   Writing super block to sdx block 0
Writing secondary metadata set
   Writing mapping table
   Writing bitmap blocks
   Writing super block to sdx block 196608
Syncing disk
Done.


Hmm. I don't actually see how many CMR zones the drive has.


root@smr_dev:~# dmzadm --start /dev/sdx
/dev/sdx: 50782535680 512-byte sectors (24215 GiB)
   Host-managed device
   96860 zones, offset 0
   96860 zones of 524288 512-byte sectors (256 MiB)
   65536 4KB data blocks per zone
sdx: starting dmz-5000cca2bfc0db21, metadata ver. 2, uuid
7495e21a-23d9-49f4-832a-76b32136078b

root@smr_dev:~# mkfs.ext4 /dev/dm-0
mke2fs 1.44.5 (15-Dec-2018)
Discarding device blocks: done
Creating filesystem with 6346375168 4k blocks and 396648448 inodes
Filesystem UUID: c47de06d-6cf6-4a85-9502-7830ca2f4526
Superblock backups stored on blocks:
 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
 4096000, 7962624, 11239424, 2048, 23887872, 71663616, 78675968,
 10240, 214990848, 51200, 550731776, 644972544, 1934917632,
 256000, 3855122432, 5804752896

Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information:

===
At anoth

Re: [dm-devel] Does dm-zoned support buffered write?

2023-05-11 Thread Hannes Reinecke

On 5/11/23 20:41, Ming Lin wrote:

Hi list,

I have an application that needs to use buffered_io to access SMR disk
for good performance.

 From "ZBD Support Restrictions" at https://zonedstorage.io/docs/linux/overview
"
Direct IO Writes The kernel page cache does not guarantee that cached
dirty pages will be flushed to a block device in sequential sector
order. This can lead to unaligned write errors if an application uses
buffered writes to write to the sequential write required zones of a
device. To avoid this pitfall, applications that directly use a zoned
block device without a file system should always use direct I/O
operations to write to the sequential write required zones of a
host-managed disk (that is, they should issue write() system calls
with a block device "file open" that uses the O_DIRECT flag).
"

Raw zbd disk only supports direct_io.

Does dm-zoned support buffered io (without O_DIRECT)?

Yes. But I _think_ the above paragraph is ever so slightly outdated, as 
we've spent quite a lot of time fixing sequential writes (cf blk-zoned 
etc). So while dm-zoned is using bufferet writes there won't be any 
sequential write issues.


At least, I have not uncovered any of those during testing.

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 18/18] scsi: target: Add block PR support to iblock

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

This adds support for the block PR callouts to target_core_iblock. This
patch doesn't attempt to implement the entire spec because there's no way
support it all like SPEC_I_PT and ALL_TG_PT. This only supports
exporting the iblock device from one path on the local target.

Signed-off-by: Mike Christie 
---
  drivers/target/target_core_iblock.c | 271 +++-
  1 file changed, 266 insertions(+), 5 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 17/18] scsi: target: Report and detect unsupported PR commands

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

The backend modules don't know about ports and I_T nexuses and the pr_ops
callouts the modules will use don't support the old RESERVE/RELEASE
commands. This patch has us report we don't support those types of
commands and fail them.

Signed-off-by: Mike Christie 
---
  drivers/target/target_core_pr.c  | 17 
  drivers/target/target_core_spc.c | 75 +++-
  2 files changed, 72 insertions(+), 20 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 16/18] scsi: target: Pass struct target_opcode_descriptor to enabled

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

The iblock pr_ops support does not support commands that require port or
I_T Nexus info. This adds a struct target_opcode_descriptor as an argument
to the enabled callout so we can still have the common tcm_is_pr_enabled
and tcm_is_scsi2_reservations_enabled functions and also determine if the
command is supported based on the command and service action and device
settings.

Signed-off-by: Mike Christie 
---
  drivers/target/target_core_spc.c  | 40 +++
  include/target/target_core_base.h |  3 ++-
  2 files changed, 27 insertions(+), 16 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 15/18] scsi: target: Allow backends to hook into PR handling

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

For the cases where you want to export a device to a VM via a single
I_T nexus and want to passthrough the PR handling to the physical/real
device you have to use pscsi or tcmu. Both are good for specific uses
however for the case where you want good performance, and are not using
SCSI devices directly (using DM/MD RAID or multipath devices) then we are
out of luck.

The following patches allow iblock to mimimally hook into the LIO PR code
and then pass the PR handling to the physical device. Note that like with
the tcmu an pscsi cases it's only supported when you export the device via
one I_T nexus.

This patch adds the initial LIO callouts. The next patch will modify
iblock.

Signed-off-by: Mike Christie 
Reviewed-by: Christoph Hellwig 
---
  drivers/target/target_core_pr.c  | 62 +++-
  include/target/target_core_backend.h |  4 ++
  2 files changed, 65 insertions(+), 1 deletion(-)
Reviewed-by: Hannes Reinecke 


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 14/18] scsi: target: Rename sbc_ops to exec_cmd_ops

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

The next patches allow us to call the block layer's pr_ops from the
backends. This will require allowing the backends to hook into the cmd
processing for SPC commands, so this renames sbc_ops to a more generic
exec_cmd_ops.

Signed-off-by: Mike Christie 
Reviewed-by: Christoph Hellwig 
---
  drivers/target/target_core_file.c|  4 ++--
  drivers/target/target_core_iblock.c  |  4 ++--
  drivers/target/target_core_rd.c  |  4 ++--
  drivers/target/target_core_sbc.c | 13 +++--
  drivers/target/target_core_spc.c |  4 ++--
  include/target/target_core_backend.h |  4 ++--
  6 files changed, 17 insertions(+), 16 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 13/18] nvme: Add pr_ops read_reservation support

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

This patch adds support for the pr_ops read_reservation callout by
calling the NVMe Reservation Report helper. It then parses that info to
detect if there is a reservation and if there is then convert the
returned info to a pr_ops pr_held_reservation struct.

Signed-off-by: Mike Christie 
Reviewed-by: Chaitanya Kulkarni 
---
  drivers/nvme/host/pr.c | 83 ++
  1 file changed, 83 insertions(+)
Reviewed-by: Hannes Reinecke 


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 12/18] nvme: Add a nvme_pr_type enum

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

The next patch adds support to report the reservation type, so we need to
be able to convert from the NVMe PR value we get from the device to the
linux block layer PR value that will be returned to callers. To prepare
for that, this patch adds a nvme_pr_type enum and renames the nvme_pr_type
function.

Signed-off-by: Mike Christie 
Reviewed-by: Chaitanya Kulkarni 
Reviewed-by: Christoph Hellwig 
---
  drivers/nvme/host/pr.c | 24 
  include/linux/nvme.h   |  9 +
  2 files changed, 21 insertions(+), 12 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 11/18] nvme: Add pr_ops read_keys support

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

This patch adds support for the pr_ops read_keys callout by calling the
NVMe Reservation Report helper, then parsing that info to get the
controller's registered keys. Because the callout is only used in the
kernel where the callers, like LIO, do not know about controller/host IDs,
the callout just returns the registered keys which is required by the SCSI
PR in READ KEYS command.

Signed-off-by: Mike Christie 
---
  drivers/nvme/host/pr.c | 69 ++
  include/linux/nvme.h   |  4 +++
  2 files changed, 73 insertions(+)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 10/18] nvme: Add helper to send pr command

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

Move the code that checks for multipath support and sends the pr command
to a new helper so it can be used by the reservation report support added
in the next patches.

Signed-off-by: Mike Christie 
Reviewed-by: Christoph Hellwig 
---
  drivers/nvme/host/pr.c | 23 ++-
  1 file changed, 14 insertions(+), 9 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 01/18] block: Add PR callouts for read keys and reservation

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

Add callouts for reading keys and reservations. This allows LIO to support
the READ_KEYS and READ_RESERVATION commands so it can export devices to
VMs for software like windows clustering.

Signed-off-by: Mike Christie 
Reviewed-by: Chaitanya Kulkarni 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
---
  include/linux/pr.h | 25 +
  1 file changed, 25 insertions(+)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 09/18] nvme: Move pr code to it's own file

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

This patch moves the pr code to it's own file because I'm going to be
adding more functions and core.c is getting bigger.

Signed-off-by: Mike Christie 
Reviewed-by: Chaitanya Kulkarni 
Reviewed-by: Keith Busch 
---
  drivers/nvme/host/Makefile |   2 +-
  drivers/nvme/host/core.c   | 148 --
  drivers/nvme/host/nvme.h   |   2 +
  drivers/nvme/host/pr.c | 158 +
  4 files changed, 161 insertions(+), 149 deletions(-)
  create mode 100644 drivers/nvme/host/pr.c
Reviewed-by: Hannes Reinecke 


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 08/18] nvme: Don't hardcode the data len for pr commands

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

Reservation Report support needs to pass in a variable sized buffer, so
this patch has the pr command helpers take a data length argument.

Signed-off-by: Mike Christie 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Keith Busch 
Reviewed-by: Chaitanya Kulkarni 
---
  drivers/nvme/host/core.c | 13 +++--
  1 file changed, 7 insertions(+), 6 deletions(-)
Reviewed-by: Hannes Reinecke 


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 02/18] block: Rename BLK_STS_NEXUS to BLK_STS_RESV_CONFLICT

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

BLK_STS_NEXUS is used for NVMe/SCSI reservation conflicts and DASD's
locking feature which works similar to NVMe/SCSI reservations where a
host can get a lock on a device and when the lock is taken it will get
failures.

This patch renames BLK_STS_NEXUS so it better reflects this type of
use.

Signed-off-by: Mike Christie 
Acked-by: Stefan Haberland 
Reviewed-by: Bart Van Assche 
---
  block/blk-core.c  | 2 +-
  drivers/nvme/host/core.c  | 2 +-
  drivers/s390/block/dasd.c | 7 ++-
  drivers/scsi/scsi_lib.c   | 2 +-
  include/linux/blk_types.h | 4 ++--
  5 files changed, 11 insertions(+), 6 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 07/18] nvme: Fix reservation status related structs

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

This fixes the following issues with the reservation status structs:

1. resv10 is bytes 23:10 so it should be 14 bytes.
2. regctl_ds only supports 64 bit host IDs.

These are not currently used, but will be in this patchset which adds
support for the reservation report command.

Signed-off-by: Mike Christie 
---
  include/linux/nvme.h | 38 ++
  1 file changed, 30 insertions(+), 8 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 06/18] dm: Add support for block PR read keys/reservation

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

This adds support in dm for the block PR read keys and read reservation
callouts.

Signed-off-by: Mike Christie 
---
  drivers/md/dm.c | 69 +
  1 file changed, 69 insertions(+)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 05/18] scsi: Add support for block PR read keys/reservation

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

This adds support in sd.c for the block PR read keys and read reservation
callouts, so upper layers like LIO can get the PR info that's been setup
using the existing pr callouts and return it to initiators.

Signed-off-by: Mike Christie 
Reviewed-by: Chaitanya Kulkarni 
Reviewed-by: Christoph Hellwig 
---
  drivers/scsi/scsi_common.c | 21 +
  drivers/scsi/sd.c  | 91 ++
  include/scsi/scsi_common.h |  1 +
  include/scsi/scsi_proto.h  |  5 +++
  4 files changed, 118 insertions(+)


Unfortunately this ties our PR abstraction to the SCSI PR model.
Let's hope that other protocols like NVMe will map onto that, too.
But in the meantime:

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 04/18] scsi: Move sd_pr_type to scsi_common

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

LIO is going to want to do the same block to/from SCSI pr types as sd.c
so this moves the sd_pr_type helper to scsi_common and renames it. The
next patch will then also add a helper to go from the SCSI value to the
block one for use with PERSISTENT_RESERVE_IN commands.

Signed-off-by: Mike Christie 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Chaitanya Kulkarni 
---
  drivers/scsi/scsi_common.c | 22 ++
  drivers/scsi/sd.c  | 33 -
  include/scsi/scsi_common.h | 12 
  3 files changed, 42 insertions(+), 25 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 03/18] scsi: Rename sd_pr_command

2023-04-11 Thread Hannes Reinecke

On 4/7/23 22:05, Mike Christie wrote:

Rename sd_pr_command to sd_pr_out_command to match a
sd_pr_in_command helper added in the next patches.

Signed-off-by: Mike Christie 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Chaitanya Kulkarni 
Reviewed-by: Bart Van Assche 
---
  drivers/scsi/sd.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [LSF/MM/BPF TOPIC] Linux Security Summit cross-over?

2023-02-27 Thread Hannes Reinecke

On 2/24/23 22:37, Steve French wrote:
I did one on network fs security at a security summit before that would 
still be relevant to both


On Tue, Feb 21, 2023, 16:16 James Bottomley 
<mailto:james.bottom...@hansenpartnership.com>> wrote:


On Tue, 2023-02-21 at 16:55 +, David Howells wrote:
 >
 > Since the first day of the LSS is the same as the final day of LSF
 > and in the same venue, are there any filesystem + security subjects
 > that would merit a common session?


I've got one:  Cryptographic material handling.

Subtitle could be: making keyrings more usable.

The broad problem is that the use of encryption within the kernel is
growing (from the old dm-crypt to the newer fscrypt and beyond) yet
pretty much all of our cryptographic key material handling violates the
principle of least privilege.  The latest one (which I happened to
notice being interested in TPMs) is the systemd tpm2 cryptenroll.  The
specific violation is that key unwrapping should occur as close as
possible to use: when the kernel uses a key, it should be the kernel
unwrapping it not unwrapping in user space and handing the unwrapped
key down to the kernel because that gives a way.  We got here because
in most of the old uses, the key is derived from a passphrase and the
kernel can't prompt the user, so pieces of user space have to gather
the precursor cryptographic material anyway.  However, we're moving
towards using cryptographic devices (like the TPM, key fobs and the
like) to store keys we really should be passing the wrapped key into
the kernel and letting it do the unwrap to preserve least privilege.
dm-crypt has some support for using kernel based TPM keys (the trusted
key subsystem), but this isn't propagated into systemd-cryptenroll and
pretty much none of the other encryption systems make any attempt to
use keyrings for unwrap handling, even if they use keyrings to store
cryptographic material.

Part of the reason seems to be that the keyrings subsystem itself is
hard to use as a generic "unwrapper" since the consumer of the keys has
to know exactly the key type to consume the protected payload.  We
could possibly fix this by adding a payload accessor function so the
keyring consumer could access a payload from any key type and thus
benefit from in-kernel unwrapping, but there are likely a host of other
issues that need to be solved.  So what I'd really like to discuss is:

Given the security need for a generic in-kernel unwrapper, should we
make keyrings do this and if so, how?
That's one where I'd be interested in, too; for NVMe-over-TLS we'll be 
using keyrings, too, and have the same issue as to how and where keys 
should be decoded (eg for X.509 handling).


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 2/9] block: Add copy offload support infrastructure

2023-01-12 Thread Hannes Reinecke

On 1/12/23 12:58, Nitesh Shetty wrote:

Introduce blkdev_issue_copy which supports source and destination bdevs,
and an array of (source, destination and copy length) tuples.
Introduce REQ_COPY copy offload operation flag. Create a read-write
bio pair with a token as payload and submitted to the device in order.
Read request populates token with source specific information which
is then passed with write request.
This design is courtesy Mikulas Patocka's token based copy

Larger copy will be divided, based on max_copy_sectors limit.

Signed-off-by: Nitesh Shetty 
Signed-off-by: Anuj Gupta 
---
  block/blk-lib.c   | 358 ++
  block/blk.h   |   2 +
  include/linux/blk_types.h |  44 +
  include/linux/blkdev.h|   3 +
  include/uapi/linux/fs.h   |  15 ++
  5 files changed, 422 insertions(+)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index e59c3069e835..2ce3c872ca49 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -115,6 +115,364 @@ int blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
  }
  EXPORT_SYMBOL(blkdev_issue_discard);
  
+/*

+ * For synchronous copy offload/emulation, wait and process all in-flight BIOs.
+ * This must only be called once all bios have been issued so that the refcount
+ * can only decrease. This just waits for all bios to make it through
+ * bio_copy_*_write_end_io. IO errors are propagated through cio->io_error.
+ */
+static int cio_await_completion(struct cio *cio)
+{
+   int ret = 0;
+
+   atomic_dec(>refcount);
+
+   if (cio->endio)
+   return 0;
+
+   if (atomic_read(>refcount)) {
+   __set_current_state(TASK_UNINTERRUPTIBLE);
+   blk_io_schedule();
+   }
+
Wouldn't it be better to use 'atomic_dec_return()' to avoid a potential 
race condition between atomic_dec() and atomic_read()?



+   ret = cio->io_err;
+   kfree(cio);
+
+   return ret;
+}
+
+static void blk_copy_offload_write_end_io(struct bio *bio)
+{
+   struct copy_ctx *ctx = bio->bi_private;
+   struct cio *cio = ctx->cio;
+   sector_t clen;
+   int ri = ctx->range_idx;
+
+   if (bio->bi_status) {
+   cio->io_err = blk_status_to_errno(bio->bi_status);
+   clen = (bio->bi_iter.bi_sector << SECTOR_SHIFT) -
+   cio->ranges[ri].dst;
+   cio->ranges[ri].comp_len = min_t(sector_t, clen,
+   cio->ranges[ri].comp_len);
+   }
+   __free_page(bio->bi_io_vec[0].bv_page);
+   bio_put(bio);
+
+   if (atomic_dec_and_test(>refcount))
+   kfree(ctx);
+   if (atomic_dec_and_test(>refcount)) {


_Two_ atomic_dec() in a row?
Why?

And if that really is required please add a comment.


+   if (cio->endio) {
+   cio->endio(cio->private, cio->io_err);
+   kfree(cio);
+   } else
+   blk_wake_io_task(cio->waiter);
+   }
+}
+
+static void blk_copy_offload_read_end_io(struct bio *read_bio)
+{
+   struct copy_ctx *ctx = read_bio->bi_private;
+   struct cio *cio = ctx->cio;
+   sector_t clen;
+   int ri = ctx->range_idx;
+   unsigned long flags;
+
+   if (read_bio->bi_status) {
+   cio->io_err = blk_status_to_errno(read_bio->bi_status);
+   goto err_rw_bio;
+   }
+
+   /* For zoned device, we check if completed bio is first entry in linked
+* list,
+* if yes, we start the worker to submit write bios.
+* if not, then we just update status of bio in ctx,
+* once the worker gets scheduled, it will submit writes for all
+* the consecutive REQ_COPY_READ_COMPLETE bios.
+*/
+   if (bdev_is_zoned(ctx->write_bio->bi_bdev)) {
+   spin_lock_irqsave(>list_lock, flags);
+   ctx->status = REQ_COPY_READ_COMPLETE;
+   if (ctx == list_first_entry(>list,
+   struct copy_ctx, list)) {
+   spin_unlock_irqrestore(>list_lock, flags);
+   schedule_work(>dispatch_work);
+   goto free_read_bio;
+   }
+   spin_unlock_irqrestore(>list_lock, flags);
+   } else
+   schedule_work(>dispatch_work);
+
+free_read_bio:
+   bio_put(read_bio);
+
+   return;
+
+err_rw_bio:
+   clen = (read_bio->bi_iter.bi_sector << SECTOR_SHIFT) -
+   cio->ranges[ri].src;
+   cio->ranges[ri].comp_len = min_t(sector_t, clen,
+   cio->ranges[ri].comp_len);
+   __free_page(read_bio->bi_io_vec[0].bv_page);
+   bio_put(ctx->write_bio);
+   bio_put(read_bio);
+   if (atomic_dec_and_test(>refcount))
+   kfree(ctx);
+   if (atomic_dec_and_test(>refcount)) {


Same here.


+   if (cio->endio) {
+   

Re: [dm-devel] [PATCH v6 3/9] block: add emulation for copy

2023-01-12 Thread Hannes Reinecke

On 1/12/23 15:46, Hannes Reinecke wrote:

On 1/12/23 12:58, Nitesh Shetty wrote:

For the devices which does not support copy, copy emulation is
added. Copy-emulation is implemented by reading from source ranges
into memory and writing to the corresponding destination asynchronously.
For zoned device we maintain a linked list of read submission and try to
submit corresponding write in same order.
Also emulation is used, if copy offload fails or partially completes.

Signed-off-by: Nitesh Shetty 
Signed-off-by: Vincent Fu 
Signed-off-by: Anuj Gupta 
---
  block/blk-lib.c    | 241 -
  block/blk-map.c    |   4 +-
  include/linux/blkdev.h |   3 +
  3 files changed, 245 insertions(+), 3 deletions(-)


I'm not sure if I agree with this one.

You just submitted a patch for device-mapper to implement copy offload, 
which (to all intents and purposes) _is_ an emulation.


So why do we need to implement it in the block layer as an emulation?
Or, if we have to, why do we need the device-mapper emulation?
This emulation will be doing the same thing, no?


Sheesh. One should read the entire patchset.

Disregard the above comment.

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v6 3/9] block: add emulation for copy

2023-01-12 Thread Hannes Reinecke

On 1/12/23 12:58, Nitesh Shetty wrote:

For the devices which does not support copy, copy emulation is
added. Copy-emulation is implemented by reading from source ranges
into memory and writing to the corresponding destination asynchronously.
For zoned device we maintain a linked list of read submission and try to
submit corresponding write in same order.
Also emulation is used, if copy offload fails or partially completes.

Signed-off-by: Nitesh Shetty 
Signed-off-by: Vincent Fu 
Signed-off-by: Anuj Gupta 
---
  block/blk-lib.c| 241 -
  block/blk-map.c|   4 +-
  include/linux/blkdev.h |   3 +
  3 files changed, 245 insertions(+), 3 deletions(-)


I'm not sure if I agree with this one.

You just submitted a patch for device-mapper to implement copy offload, 
which (to all intents and purposes) _is_ an emulation.


So why do we need to implement it in the block layer as an emulation?
Or, if we have to, why do we need the device-mapper emulation?
This emulation will be doing the same thing, no?

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 09/11] block, nvme: Add error for reservation conflicts.

2022-06-05 Thread Hannes Reinecke

On 6/4/22 19:13, michael.chris...@oracle.com wrote:

On 6/4/22 2:38 AM, Hannes Reinecke wrote:

On 6/3/22 21:45, Keith Busch wrote:

On Fri, Jun 03, 2022 at 01:55:34AM -0500, Mike Christie wrote:

@@ -171,6 +171,7 @@ static const struct {
   /* zone device specific errors */
   [BLK_STS_ZONE_OPEN_RESOURCE]    = { -ETOOMANYREFS, "open zones exceeded" 
},
   [BLK_STS_ZONE_ACTIVE_RESOURCE]    = { -EOVERFLOW, "active zones 
exceeded" },
+    [BLK_STS_RSV_CONFLICT]    = { -EBADE,    "resevation conflict" },


You misspelled "reservation". :)

And since you want a different error, why reuse EBADE for the errno? That is
already used for BLK_STS_NEXUS that you're trying to differentiate from, right?
At least for nvme, this error code is returned when the host lacks sufficient
rights, so something like EACCESS might make sense.

Looks good otherwise.


Welll ... BLK_STS_NEXUS _is_ the reservation error.


I was not sure of xen/virtio scsi uses of BLK_STS_NEXUS/DID_NEXUS_FAILURE.
The virtio spec's description for VIRTIO_SCSI_S_NEXUS_FAILURE:

 if the nexus is suffering a failure but retrying on other paths might
 yield a different result.

looks like the description for DID_NEXUS_FAILURE in scsi_status.h.
To me the the description sounded generic where it could used for
other errors like the endpoint/port for the I_T is removed.

However, the qemu code only uses VIRTIO_SCSI_S_NEXUS_FAILURE for
reservation conflicts. If we are saying that is always the case in
other virt implementations, I don't even need this patch :) and we
can do what you requested and do more of a rename.


Well ... we tried to find a generic error for reservation failure, as we 
thought that reservation failure was too SCSI specific.
And we wanted the error to describe what the resulting handling should 
be, not what the cause was. Hence we ended up with BLK_STS_NEXUS.


But turns out that our initial assumption wasn't valid, and that 
reservations are a general concept. So by all means, rename 
BLK_STS_NEXUS to BLK_STS_RSV_CONFLICT to make it clear what this error 
is about.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 09/11] block, nvme: Add error for reservation conflicts.

2022-06-04 Thread Hannes Reinecke

On 6/3/22 21:45, Keith Busch wrote:

On Fri, Jun 03, 2022 at 01:55:34AM -0500, Mike Christie wrote:

@@ -171,6 +171,7 @@ static const struct {
/* zone device specific errors */
[BLK_STS_ZONE_OPEN_RESOURCE]= { -ETOOMANYREFS, "open zones 
exceeded" },
[BLK_STS_ZONE_ACTIVE_RESOURCE]  = { -EOVERFLOW, "active zones exceeded" 
},
+   [BLK_STS_RSV_CONFLICT]  = { -EBADE, "resevation conflict" },


You misspelled "reservation". :)

And since you want a different error, why reuse EBADE for the errno? That is
already used for BLK_STS_NEXUS that you're trying to differentiate from, right?
At least for nvme, this error code is returned when the host lacks sufficient
rights, so something like EACCESS might make sense.

Looks good otherwise.


Welll ... BLK_STS_NEXUS _is_ the reservation error.

I'd rather modify the existing code to return BLK_STS_RSV_CONFLICT 
instead of BLK_STS_NEXUS, but keep the 'EBADE' mapping.

Userspace ABI and all that.

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v5 2/7] block: allow blk-zoned devices to have non-power-of-2 zone size

2022-05-24 Thread Hannes Reinecke

On 5/24/22 09:10, Pankaj Raghav wrote:

On 5/24/22 07:19, Hannes Reinecke wrote:


diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c4e4c7071b7b..f5c7a41032ba 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -676,6 +676,21 @@ static inline unsigned int
blk_queue_zone_no(struct request_queue *q,
   return div64_u64(sector, zone_sectors);
   }
   +static inline bool blk_queue_is_zone_start(struct request_queue *q,
sector_t sec)
+{
+    sector_t zone_sectors = blk_queue_zone_sectors(q);
+    u64 remainder = 0;
+
+    if (!blk_queue_is_zoned(q))
+    return false;
+


Not sure if we need this here; surely blk_queue_zone_sectors() will
already barf, and none of the callers did this check.


I totally agree with you but all the other blk_queue_* functions had
this defensive check and I didn't want to break that pattern:


Ah, fair enough.

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v5 6/7] null_blk: use zone_size_sects_shift for power of 2 zoned devices

2022-05-23 Thread Hannes Reinecke

On 5/23/22 18:16, Pankaj Raghav wrote:

Instead of doing is_power_of_2 and ilog2 operation for every IO, cache
the zone_size_sects_shift variable and use it for power of 2 zoned
devices.

This variable will be set to zero for non power of 2 zoned devices.

Suggested-by: Damien Le Moal 
Signed-off-by: Pankaj Raghav 
---
  drivers/block/null_blk/null_blk.h |  6 ++
  drivers/block/null_blk/zoned.c| 10 --
  2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/block/null_blk/null_blk.h 
b/drivers/block/null_blk/null_blk.h
index 78eb56b0ca55..3d6e41a9491f 100644
--- a/drivers/block/null_blk/null_blk.h
+++ b/drivers/block/null_blk/null_blk.h
@@ -74,6 +74,12 @@ struct nullb_device {
unsigned int imp_close_zone_no;
struct nullb_zone *zones;
sector_t zone_size_sects;
+   /*
+* zone_size_sects_shift is only useful when the zone size is
+* power of 2. This variable is set to zero when zone size is non
+* power of 2.
+*/
+   unsigned int zone_size_sects_shift;
bool need_zone_res_mgmt;
spinlock_t zone_res_lock;
  
diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c

index 00c34e65ef0a..806bef98ac83 100644
--- a/drivers/block/null_blk/zoned.c
+++ b/drivers/block/null_blk/zoned.c
@@ -13,8 +13,8 @@ static inline sector_t mb_to_sects(unsigned long mb)
  
  static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)

  {
-   if (is_power_of_2(dev->zone_size_sects))
-   return sect >> ilog2(dev->zone_size_sects);
+   if (dev->zone_size_sects_shift)
+   return sect >> dev->zone_size_sects_shift;
  
  	return div64_u64(sect, dev->zone_size_sects);

  }
@@ -82,6 +82,12 @@ int null_init_zoned_dev(struct nullb_device *dev, struct 
request_queue *q)
zone_capacity_sects = mb_to_sects(dev->zone_capacity);
dev_capacity_sects = mb_to_sects(dev->size);
dev->zone_size_sects = mb_to_sects(dev->zone_size);
+
+   if (is_power_of_2(dev->zone_size_sects))
+   dev->zone_size_sects_shift = ilog2(dev->zone_size_sects);
+   else
+   dev->zone_size_sects_shift = 0;
+
dev->nr_zones =
div64_u64(roundup(dev_capacity_sects, dev->zone_size_sects),
      dev->zone_size_sects);

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v5 3/7] nvme: zns: Allow ZNS drives that have non-power_of_2 zone size

2022-05-23 Thread Hannes Reinecke

On 5/23/22 18:15, Pankaj Raghav wrote:

Remove the condition which disallows non-power_of_2 zone size ZNS drive
to be updated and use generic method to calculate number of zones
instead of relying on log and shift based calculation on zone size.

The power_of_2 calculation has been replaced directly with generic
calculation without special handling. Both modified functions are not
used in hot paths, they are only used during initialization &
revalidation of the ZNS device.

As rounddown macro from math.h does not work for 32 bit architectures,
round down operation is open coded.

Reviewed-by: Luis Chamberlain 
Reviewed by: Adam Manzanares 
Signed-off-by: Pankaj Raghav 
---
  drivers/nvme/host/zns.c | 21 -
  1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
index 9f81beb4df4e..d92f937d5cb9 100644
--- a/drivers/nvme/host/zns.c
+++ b/drivers/nvme/host/zns.c
@@ -101,13 +101,6 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned 
lbaf)
}
  
  	ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze));

-   if (!is_power_of_2(ns->zsze)) {
-   dev_warn(ns->ctrl->device,
-   "invalid zone size:%llu for namespace:%u\n",
-   ns->zsze, ns->head->ns_id);
-   status = -ENODEV;
-   goto free_data;
-   }
  
  	blk_queue_set_zoned(ns->disk, BLK_ZONED_HM);

blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
@@ -128,8 +121,13 @@ static void *nvme_zns_alloc_report_buffer(struct nvme_ns 
*ns,
const size_t min_bufsize = sizeof(struct nvme_zone_report) +
   sizeof(struct nvme_zone_descriptor);
  
+	/*

+* Division is used to calculate nr_zones with no special handling
+* for power of 2 zone sizes as this function is not invoked in a
+* hot path
+*/
nr_zones = min_t(unsigned int, nr_zones,
-get_capacity(ns->disk) >> ilog2(ns->zsze));
+div64_u64(get_capacity(ns->disk), ns->zsze));
  
  	bufsize = sizeof(struct nvme_zone_report) +

nr_zones * sizeof(struct nvme_zone_descriptor);
@@ -182,6 +180,7 @@ int nvme_ns_report_zones(struct nvme_ns *ns, sector_t 
sector,
int ret, zone_idx = 0;
unsigned int nz, i;
size_t buflen;
+   u64 remainder = 0;
  
  	if (ns->head->ids.csi != NVME_CSI_ZNS)

return -EINVAL;
@@ -197,7 +196,11 @@ int nvme_ns_report_zones(struct nvme_ns *ns, sector_t 
sector,
c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL;
c.zmr.pr = NVME_REPORT_ZONE_PARTIAL;
  
-	sector &= ~(ns->zsze - 1);

+   /*
+* Round down the sector value to the nearest zone start
+*/
+   div64_u64_rem(sector, ns->zsze, );
+   sector -= remainder;
while (zone_idx < nr_zones && sector < get_capacity(ns->disk)) {
    memset(report, 0, buflen);
  

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v5 2/7] block: allow blk-zoned devices to have non-power-of-2 zone size

2022-05-23 Thread Hannes Reinecke
n IS_ALIGNED(sec, zone_sectors);
+
+   div64_u64_rem(sec, zone_sectors, );
+   return remainder == 0;
+}
+
  static inline bool blk_queue_zone_is_seq(struct request_queue *q,
 sector_t sector)
  {
@@ -722,6 +737,12 @@ static inline unsigned int blk_queue_zone_no(struct 
request_queue *q,
  {
return 0;
  }
+
+static inline bool blk_queue_is_zone_start(struct request_queue *q, sector_t 
sec)
+{
+   return false;
+}
+
  static inline unsigned int queue_max_open_zones(const struct request_queue *q)
  {
return 0;

Otherwise looks good.

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices

2022-05-20 Thread Hannes Reinecke

On 5/19/22 20:47, Damien Le Moal wrote:

On 5/19/22 16:34, Johannes Thumshirn wrote:

On 19/05/2022 05:19, Damien Le Moal wrote:

On 5/19/22 12:12, Luis Chamberlain wrote:

On Thu, May 19, 2022 at 12:08:26PM +0900, Damien Le Moal wrote:

On 5/18/22 00:34, Theodore Ts'o wrote:

On Tue, May 17, 2022 at 10:10:48AM +0200, Christoph Hellwig wrote:

I'm a little surprised about all this activity.

I though the conclusion at LSF/MM was that for Linux itself there
is very little benefit in supporting this scheme.  It will massively
fragment the supported based of devices and applications, while only
having the benefit of supporting some Samsung legacy devices.


FWIW,

That wasn't my impression from that LSF/MM session, but once the
videos become available, folks can decide for themselves.


There was no real discussion about zone size constraint on the zone
storage BoF. Many discussions happened in the hallway track though.


Right so no direct clear blockers mentioned at all during the BoF.


Nor any clear OK.


So what about creating a device-mapper target, that's taking npo2 drives and
makes them po2 drives for the FS layers? It will be very similar code to
dm-linear.


+1

This will simplify the support for FSes, at least for the initial drop (if
accepted).

And more importantly, this will also allow addressing any potential
problem with user space breaking because of the non power of 2 zone size.


Seconded (or maybe thirded).

The changes to support npo2 in the block layer are pretty simple, and 
really I don't have an issue with those.
Then adding a device-mapper target transforming npo2 drives in po2 block 
devices should be pretty trivial.


And once that is in you can start arguing with the the FS folks on 
whether to implement it natively.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v4 03/10] block: Introduce a new ioctl for copy

2022-04-27 Thread Hannes Reinecke

On 4/26/22 12:12, Nitesh Shetty wrote:

Add new BLKCOPY ioctl that offloads copying of one or more sources ranges
to one or more destination in a device. COPY ioctl accepts a 'copy_range'
structure that contains no of range, a reserved field , followed by an
array of ranges. Each source range is represented by 'range_entry' that
contains source start offset, destination start offset and length of
source ranges (in bytes)

MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and
MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle.

Example code, to issue BLKCOPY:
/* Sample example to copy three entries with [dest,src,len],
* [32768, 0, 4096] [36864, 4096, 4096] [40960,8192,4096] on same device */

int main(void)
{
int i, ret, fd;
unsigned long src = 0, dst = 32768, len = 4096;
struct copy_range *cr;
cr = (struct copy_range *)malloc(sizeof(*cr)+
(sizeof(struct range_entry)*3));
cr->nr_range = 3;
cr->reserved = 0;
for (i = 0; i< cr->nr_range; i++, src += len, dst += len) {
cr->range_list[i].dst = dst;
cr->range_list[i].src = src;
cr->range_list[i].len = len;
cr->range_list[i].comp_len = 0;
}
fd = open("/dev/nvme0n1", O_RDWR);
if (fd < 0) return 1;
ret = ioctl(fd, BLKCOPY, cr);
if (ret != 0)
   printf("copy failed, ret= %d\n", ret);
for (i=0; i< cr->nr_range; i++)
if (cr->range_list[i].len != cr->range_list[i].comp_len)
printf("Partial copy for entry %d: requested %llu, completed 
%llu\n",
i, 
cr->range_list[i].len,

cr->range_list[i].comp_len);
close(fd);
free(cr);
return ret;
}

Signed-off-by: Nitesh Shetty 
Signed-off-by: Javier González 
Signed-off-by: Arnav Dawn 
---
  block/ioctl.c   | 32 
  include/uapi/linux/fs.h |  9 +
  2 files changed, 41 insertions(+)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v4 01/10] block: Introduce queue limits for copy-offload support

2022-04-27 Thread Hannes Reinecke

On 4/26/22 12:12, Nitesh Shetty wrote:

Add device limits as sysfs entries,
 - copy_offload (RW)
 - copy_max_bytes (RW)
 - copy_max_hw_bytes (RO)
 - copy_max_range_bytes (RW)
 - copy_max_range_hw_bytes (RO)
 - copy_max_nr_ranges (RW)
 - copy_max_nr_ranges_hw (RO)

Above limits help to split the copy payload in block layer.
copy_offload, used for setting copy offload(1) or emulation(0).
copy_max_bytes: maximum total length of copy in single payload.
copy_max_range_bytes: maximum length in a single entry.
copy_max_nr_ranges: maximum number of entries in a payload.
copy_max_*_hw_*: Reflects the device supported maximum limits.

Signed-off-by: Nitesh Shetty 
Signed-off-by: Kanchan Joshi 
Signed-off-by: Arnav Dawn 
---
  Documentation/ABI/stable/sysfs-block |  83 
  block/blk-settings.c |  59 
  block/blk-sysfs.c| 138 +++
  include/linux/blkdev.h   |  13 +++
  4 files changed, 293 insertions(+)



Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v4 02/10] block: Add copy offload support infrastructure

2022-04-27 Thread Hannes Reinecke
   }
+   ctx->cio = cio;
+   ctx->range_idx = ri;
+   ctx->start_sec = dst_blk;
+
+   read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | 
REQ_COPY | REQ_NOMERGE,
+   gfp_mask);
+   if (!read_bio) {
+   ret = -ENOMEM;
+   goto err_read_bio;
+   }
+   read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
+   __bio_add_page(read_bio, token, PAGE_SIZE, 0);
+   /*__bio_add_page increases bi_size by len, so overwrite 
it with copy len*/
+   read_bio->bi_iter.bi_size = copy_len;
+   ret = submit_bio_wait(read_bio);
+   bio_put(read_bio);
+   if (ret)
+   goto err_read_bio;
+
+   write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | 
REQ_COPY | REQ_NOMERGE,
+   gfp_mask);
+   if (!write_bio) {
+   ret = -ENOMEM;
+   goto err_read_bio;
+   }
+   write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
+   __bio_add_page(write_bio, token, PAGE_SIZE, 0);
+   /*__bio_add_page increases bi_size by len, so overwrite 
it with copy len*/
+   write_bio->bi_iter.bi_size = copy_len;
+   write_bio->bi_end_io = bio_copy_end_io;
+   write_bio->bi_private = ctx;
+
+   spin_lock_irqsave(>lock, flags);
+   ++cio->refcount;
+   spin_unlock_irqrestore(>lock, flags);
+
+   submit_bio(write_bio);
+   src_blk += copy_len;
+   dst_blk += copy_len;
+   }
+   }
+


Hmm. I'm not sure if I like the copy loop.
What I definitely would do is to allocate the write bio before reading 
data; after all, if we can't allocate the write bio reading is pretty 
much pointless.


But the real issue I have with this is that it's doing synchronous 
reads, thereby limiting the performance.


Can't you submit the write bio from the end_io function of the read bio?
That would disentangle things, and we should be getting a better 
performance.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH] dm: fix race condition on Alpha

2022-03-28 Thread Hannes Reinecke

On 3/28/22 18:34, Mikulas Patocka wrote:

Early alpha processors cannot write a single byte or short; they read 8
bytes, modify the value in registers and write back 8 bytes.

This could cause race condition in the structure dm_io - if the fields
flags and io_count are modified simultaneously.

Fix this bug by using 32-bit flags if we are on Alpha and if we are
compiling for a processor that doesn't have the byte-word-extension.


Errm.

Seeing that those alphas could only modify 64 bit atomically, what 
exactly do you gain by using 32-bit flags?

Shouldn't you be using 64-bit flags?

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()

2022-03-22 Thread Hannes Reinecke

On 3/22/22 16:17, John Garry wrote:

On 22/03/2022 14:03, Hannes Reinecke wrote:


As mentioned in the cover letter response, it just seems best to keep 
the normal scsi_cmnd payload but have other means to add on the 
internal command data, like using host_scribble or scsi_cmnd priv data.


Well; I found that most drivers I had been looking at the scsi command 
payload isn't used at all; the drivers primarily cared about the 
(driver-provided) payload, and were completely ignoring the scsi 
command payload.


Similar for ATA/libsas: you basically never issue real scsi commands, 
but either 'raw' ATA requests or SCSI TMFs. None of which are scsi 
commands, so providing them is a bit of a waste.


(And causes irritations, too, as a scsi command requires associated 
pointers like ->device etc to be set up. Which makes it tricky to use 
for the initial device setup.)


A problem I see is that in scsi_mq_init_request() we allocate memories 
like sense_buffer and prot_sdb and store the pointers in the scsi_cmnd 
payload. If we then reuse a scsi_cmnd payload as an "internal" command 
payload then this data may be lost.


It might be possible to reuse the scsi cmnd payload for the "internal", 
but I would rather not get hung up on it now if possible.


Or, keep the payload as is, and use the 'internal' marker to indicate 
that the scsi payload is not valid.

That would save us quite some checks during endio processing.

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()

2022-03-22 Thread Hannes Reinecke

On 3/22/22 13:30, John Garry wrote:

On 22/03/2022 12:16, Hannes Reinecke wrote:

On 3/22/22 12:33, John Garry wrote:

On 22/03/2022 11:18, Christoph Hellwig wrote:

On Tue, Mar 22, 2022 at 06:39:35PM +0800, John Garry wrote:

Add an API to allocate a request queue which accepts a custom set of
blk_mq_ops for that request queue.

The reason which we may want custom ops is for queuing requests 
which we

don't want to go through the normal queuing path.


Eww.  I really do not think we should do separate ops per queue, as 
that

is going to get us into a deep mess eventually.



Yeah... so far (here) it works out quite nicely, as we don't need to 
change the SCSI blk mq ops nor allocate a scsi_device - everything is 
just separate.


The other method mentioned previously was to add the request 
"reserved" flag and add new paths in scsi_queue_rq() et al to handle 
this, but that gets messy.


Any other ideas ...?



As outlined in the other mail, I think might be useful is to have a 
_third_ type of requests (in addition to the normal and the reserved 
ones).
That one would be allocated from the normal I/O pool (and hence could 
fail if the pool is exhausted), but would be able to carry a different 
payload (type) than the normal requests.


As mentioned in the cover letter response, it just seems best to keep 
the normal scsi_cmnd payload but have other means to add on the internal 
command data, like using host_scribble or scsi_cmnd priv data.


Well; I found that most drivers I had been looking at the scsi command 
payload isn't used at all; the drivers primarily cared about the 
(driver-provided) payload, and were completely ignoring the scsi command 
payload.


Similar for ATA/libsas: you basically never issue real scsi commands, 
but either 'raw' ATA requests or SCSI TMFs. None of which are scsi 
commands, so providing them is a bit of a waste.


(And causes irritations, too, as a scsi command requires associated 
pointers like ->device etc to be set up. Which makes it tricky to use 
for the initial device setup.)


And we could have a separate queue_rq for these requests, as we can 
differentiate them in the block layer.


I don't know, let me think about it. Maybe we could add an "internal" 
blk flag, which uses a separate "internal" queue_rq callback.



Yeah, that's what I had in mind.

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()

2022-03-22 Thread Hannes Reinecke

On 3/22/22 12:33, John Garry wrote:

On 22/03/2022 11:18, Christoph Hellwig wrote:

On Tue, Mar 22, 2022 at 06:39:35PM +0800, John Garry wrote:

Add an API to allocate a request queue which accepts a custom set of
blk_mq_ops for that request queue.

The reason which we may want custom ops is for queuing requests which we
don't want to go through the normal queuing path.


Eww.  I really do not think we should do separate ops per queue, as that
is going to get us into a deep mess eventually.



Yeah... so far (here) it works out quite nicely, as we don't need to 
change the SCSI blk mq ops nor allocate a scsi_device - everything is 
just separate.


The other method mentioned previously was to add the request "reserved" 
flag and add new paths in scsi_queue_rq() et al to handle this, but that 
gets messy.


Any other ideas ...?



As outlined in the other mail, I think might be useful is to have a 
_third_ type of requests (in addition to the normal and the reserved ones).
That one would be allocated from the normal I/O pool (and hence could 
fail if the pool is exhausted), but would be able to carry a different 
payload (type) than the normal requests.
And we could have a separate queue_rq for these requests, as we can 
differentiate them in the block layer.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH RFC 00/11] blk-mq/libata/scsi: SCSI driver tagging improvements

2022-03-22 Thread Hannes Reinecke

On 3/22/22 11:39, John Garry wrote:

Currently SCSI low-level drivers are required to manage tags for commands
which do not come via the block layer - libata internal commands would be
an example of one of these.

There was some work to provide "reserved commands" support in such series
as https://lore.kernel.org/linux-scsi/20211125151048.103910-1-h...@suse.de/

This was based on allocating a request for the lifetime of the "internal"
command.

This series tries to solve that problem by not just allocating the request
but also sending it through the block layer, that being the normal flow
for a request. We need to do this as we may only poll completion of
requests through the block layer, so would need to do this for poll queue
support.

There is still scope to allocate commands just to get a tag as token as
that may suit some other scenarios, but it's not what we do here.

This series extends blk-mq to support a request queue having a custom set
of ops. In addition SCSI core code adds support for these type of requests.

This series does not include SCSI core handling for enabling reserved
tags per tagset, but that would be easy to add.

Based on mkp-scsi 5.18/scsi-staging @ 66daf3e6b993

Please consider as an RFC for now. I think that the libata change has the
largest scope for improvement...



Grand seeing that someone is taking it up. Thanks for doing this!

But:
Allocating a queue for every request (eg in patch 3) is overkill. If we 
want to go that route we should be allocating the queue upfront (eg when 
creating the device itself), and then just referencing it.


However, can't say I like this approach. I've been playing around with 
supporting internal commands, and really found two constraints really 
annoying:


- The tagset supports only _one_ set of payload via
  blk_mq_rq_(to,from)_pdu().
This requires each request to be of the same type, and with that making
it really hard for re-purposing the request for internal usage. In the
end I settled by just keeping it and skipping the SCSI command field.
If we could have a distinct PDU type for internal commands I guess 
things would be easier.


- The number of reserved commands is static.
With that it's getting really hard using reserved commands with 
low-queue depth devices like ATA; we only have 31 commands to work with, 
and setting one or two aside for TMF is really making a difference 
performance wise. It would be _awesome_ if we could allocate reserved 
commands dynamically (ie just marking a command as 'reserved' once 
allocated).
Sure, it won't have the same guarantees as 'real' reserved commands, but 
in most cases we don't actually need that.


Maybe these are some lines we could investigate?
Hmm?

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 1/2] dm-zoned: remove the ->name field in struct dmz_dev

2022-03-01 Thread Hannes Reinecke

On 3/1/22 09:38, Christoph Hellwig wrote:

Just use the %pg format specifier to print the block device name
directly.

Signed-off-by: Christoph Hellwig 
---
  drivers/md/dm-zoned-metadata.c | 4 ++--
  drivers/md/dm-zoned-target.c   | 1 -
  drivers/md/dm-zoned.h  | 9 -
  3 files changed, 6 insertions(+), 8 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [RFC PATCH 3/3] nvme: add the "debug" host driver

2022-02-04 Thread Hannes Reinecke

On 2/4/22 10:58, Chaitanya Kulkarni wrote:

On 2/4/22 12:24 AM, Javier González wrote:

[ .. ]

For a software-only solution, we have experimented with something
similar to the nvme-debug code tha Mikulas is proposing. Adam pointed to
the nvme-loop target as an alternative and this seems to work pretty
nicely. I do not believe there should be many changes to support copy
offload using this.



If QEMU is so incompetent then we need to add every big feature into
the NVMeOF test target so that we can test it better ? is that what
you are proposing ? since if we implement one feature, it will be
hard to nack any new features that ppl will come up with
same rationale "with QEMU being slow and hard to test race
conditions etc .."



How would you use qemu for bare-metal testing?


and if that is the case why we don't have ZNS NVMeOF target
memory backed emulation ? Isn't that a bigger and more
complicated feature than Simple Copy where controller states
are involved with AENs ?

ZNS kernel code testing is also done on QEMU, I've also fixed
bugs in the ZNS kernel code which are discovered on QEMU and I've not
seen any issues with that. Given that simple copy feature is way smaller
than ZNS it will less likely to suffer from slowness and etc (listed
above) in QEMU.

my point is if we allow one, we will be opening floodgates and we need
to be careful not to bloat the code unless it is _absolutely
necessary_ which I don't think it is based on the simple copy
specification.



I do have a slightly different view on the nvme target code; it should 
provide the necessary means to test the nvme host code.
And simple copy is on of these features, especially as it will operate 
as an exploiter of the new functionality.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload

2021-10-28 Thread Hannes Reinecke

On 10/29/21 2:21 AM, Chaitanya Kulkarni wrote:

On 10/7/21 11:49 PM, Javier González wrote:

External email: Use caution opening links or attachments


On 06.10.2021 10:33, Bart Van Assche wrote:

On 10/6/21 3:05 AM, Javier González wrote:

I agree that the topic is complex. However, we have not been able to
find a clear path forward in the mailing list.


Hmm ... really? At least Martin Petersen and I consider device mapper
support essential. How about starting from Mikulas' patch series that
supports the device mapper? See also
https://lore.kernel.org/all/alpine.lrh.2.02.2108171630120.30...@file01.intranet.prod.int.rdu2.redhat.com/



When we add a new REQ_OP_XXX we need to make sure it will work with
device mapper, so I agree with Bart and Martin.

Starting with Mikulas patches is a right direction as of now..



Thanks for the pointers. We are looking into Mikulas' patch - I agree
that it is a good start.


What do you think about joining the call to talk very specific next
steps to get a patchset that we can start reviewing in detail.


I can do that.


Thanks. I will wait until Chaitanya's reply on his questions. We will
start suggesting some dates then.



I think at this point we need to at least decide on having a first call
focused on how to proceed forward with Mikulas approach  ...

Javier, can you please organize a call with people you listed in this
thread earlier ?

Also Keith presented his work on a simple zone-based remapping block 
device, which included an in-kernel copy offload facility.
Idea is to lift that as a standalone patch such that we can use it a 
fallback (ie software) implementation if no other copy offload mechanism 
is available.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH RFC] multipath-tools: remove Hannes as IBM arrays maintainer

2021-10-02 Thread Hannes Reinecke

On 10/1/21 8:12 PM, Xose Vazquez Perez wrote:

Cc: Hannes Reinecke 
Cc: Martin Wilck 
Cc: Benjamin Marzinski 
Cc: Christophe Varoqui 
Cc: DM-DEVEL ML 
Signed-off-by: Xose Vazquez Perez 
---
  libmultipath/hwtable.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
index 763982cd..11282699 100644
--- a/libmultipath/hwtable.c
+++ b/libmultipath/hwtable.c
@@ -482,8 +482,6 @@ static struct hwentry default_hw[] = {
},
/*
 * IBM
-*
-* Maintainer: Hannes Reinecke 
 */
{
/* ProFibre 4000R */


Hmm. You could've asked me ...
Anyway.

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH v3 2/8] scsi/sr: add error handling support for add_disk()

2021-09-06 Thread Hannes Reinecke

On 8/30/21 11:25 PM, Luis Chamberlain wrote:

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Luis Chamberlain 
---
  drivers/scsi/sr.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH v3 1/8] scsi/sd: add error handling support for add_disk()

2021-09-06 Thread Hannes Reinecke

On 8/30/21 11:25 PM, Luis Chamberlain wrote:

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Luis Chamberlain 
---
  drivers/scsi/sd.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH v3 8/8] nbd: add error handling support for add_disk()

2021-09-06 Thread Hannes Reinecke

On 8/30/21 11:25 PM, Luis Chamberlain wrote:

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Luis Chamberlain 
---
  drivers/block/nbd.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH v3 7/8] loop: add error handling support for add_disk()

2021-09-06 Thread Hannes Reinecke

On 8/30/21 11:25 PM, Luis Chamberlain wrote:

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Luis Chamberlain 
---
  drivers/block/loop.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH v3 6/8] dm: add add_disk() error handling

2021-09-06 Thread Hannes Reinecke

On 8/30/21 11:25 PM, Luis Chamberlain wrote:

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

There are two calls to dm_setup_md_queue() which can fail then,
one on dm_early_create() and we can easily see that the error path
there calls dm_destroy in the error path. The other use case is on
the ioctl table_load case. If that fails userspace needs to call
the DM_DEV_REMOVE_CMD to cleanup the state - similar to any other
failure.

Signed-off-by: Luis Chamberlain 
---
  drivers/md/dm.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH v3 5/8] md: add error handling support for add_disk()

2021-09-06 Thread Hannes Reinecke

On 8/30/21 11:25 PM, Luis Chamberlain wrote:

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

We just do the unwinding of what was not done before, and are
sure to unlock prior to bailing.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Luis Chamberlain 
---
  drivers/md/md.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH v3 3/8] nvme: add error handling support for add_disk()

2021-09-06 Thread Hannes Reinecke

On 8/30/21 11:25 PM, Luis Chamberlain wrote:

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Luis Chamberlain 
---
  drivers/nvme/host/core.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8679a108f571..687d3be563a3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3763,7 +3763,9 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, 
unsigned nsid,
  
  	nvme_get_ctrl(ctrl);
  
-	device_add_disk(ctrl->device, ns->disk, nvme_ns_id_attr_groups);

+   if (device_add_disk(ctrl->device, ns->disk, nvme_ns_id_attr_groups))
+   goto out_cleanup_ns_from_list;
+
if (!nvme_ns_head_multipath(ns->head))
nvme_add_ns_cdev(ns);
  
@@ -3773,6 +3775,11 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid,
  
  	return;
  
+ out_cleanup_ns_from_list:

+   nvme_put_ctrl(ctrl);
+   down_write(>namespaces_rwsem);
+   list_del_init(>list);
+   up_write(>namespaces_rwsem);
   out_unlink_ns:
mutex_lock(>subsys->lock);
list_del_rcu(>siblings);

I would rather turn this around, and call 'nvme_put_ctrl()' after 
removing the namespace from the list. But it's probably more a style 
issue, come to think of it.


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

  1   2   3   4   5   6   7   8   9   >