Re: [PATCH 26/26] block: move the bounce flag into the features field

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

Move the bounce flag into the features field to reclaim a little bit of
space.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
  block/blk-settings.c| 1 -
  block/blk.h | 2 +-
  drivers/scsi/scsi_lib.c | 2 +-
  include/linux/blkdev.h  | 6 --
  4 files changed, 6 insertions(+), 5 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 25/26] block: move the skip_tagset_quiesce flag to queue_limits

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

Move the skip_tagset_quiesce flag into the queue_limits feature field so
that it can be set atomically with the queue frozen.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
  block/blk-mq-debugfs.c   | 1 -
  drivers/nvme/host/core.c | 8 +---
  include/linux/blkdev.h   | 6 --
  3 files changed, 9 insertions(+), 6 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 24/26] block: move the pci_p2pdma flag to queue_limits

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

Move the pci_p2pdma flag into the queue_limits feature field so that it
can be set atomically with the queue frozen.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
  block/blk-mq-debugfs.c   | 1 -
  drivers/nvme/host/core.c | 8 +++-
  include/linux/blkdev.h   | 7 ---
  3 files changed, 7 insertions(+), 9 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 23/26] block: move the zone_resetall flag to queue_limits

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

Move the zone_resetall flag into the queue_limits feature field so that
it can be set atomically with the queue frozen.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
  block/blk-mq-debugfs.c | 1 -
  drivers/block/null_blk/zoned.c | 3 +--
  drivers/block/ublk_drv.c   | 4 +---
  drivers/block/virtio_blk.c | 3 +--
  drivers/nvme/host/zns.c| 3 +--
  drivers/scsi/sd_zbc.c  | 5 +
  include/linux/blkdev.h | 6 --
  7 files changed, 9 insertions(+), 16 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 22/26] block: move the zoned flag into the features field

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

Move the zoned flags into the features field to reclaim a little
bit of space.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
  block/blk-settings.c   |  5 ++---
  drivers/block/null_blk/zoned.c |  2 +-
  drivers/block/ublk_drv.c   |  2 +-
  drivers/block/virtio_blk.c |  5 +++--
  drivers/md/dm-table.c  | 11 ++-
  drivers/md/dm-zone.c   |  2 +-
  drivers/md/dm-zoned-target.c   |  2 +-
  drivers/nvme/host/zns.c|  2 +-
  drivers/scsi/sd_zbc.c  |  2 +-
  include/linux/blkdev.h |  9 ++---
  10 files changed, 23 insertions(+), 19 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 21/26] block: move the poll flag to queue_limits

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

Move the poll flag into the queue_limits feature field so that it can
be set atomically with the queue frozen.

Stacking drivers are simplified in that they now can simply set the
flag, and blk_stack_limits will clear it when the features is not
supported by any of the underlying devices.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
  block/blk-core.c  |  5 ++--
  block/blk-mq-debugfs.c|  1 -
  block/blk-mq.c| 31 +++-
  block/blk-settings.c  | 10 ---
  block/blk-sysfs.c |  4 +--
  drivers/md/dm-table.c | 54 +--
  drivers/nvme/host/multipath.c | 12 +---
  include/linux/blkdev.h|  4 ++-
  8 files changed, 45 insertions(+), 76 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 20/26] block: move the dax flag to queue_limits

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

Move the dax flag into the queue_limits feature field so that it can be
set atomically with the queue frozen.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
  block/blk-mq-debugfs.c   | 1 -
  drivers/md/dm-table.c| 4 ++--
  drivers/nvdimm/pmem.c| 7 ++-
  drivers/s390/block/dcssblk.c | 2 +-
  include/linux/blkdev.h   | 6 --
  5 files changed, 9 insertions(+), 11 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 19/26] block: move the nowait flag to queue_limits

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

Move the nowait flag into the queue_limits feature field so that it can
be set atomically with the queue frozen.

Stacking drivers are simplified in that they now can simply set the
flag, and blk_stack_limits will clear it when the features is not
supported by any of the underlying devices.

Signed-off-by: Christoph Hellwig 
---
  block/blk-mq-debugfs.c|  1 -
  block/blk-mq.c|  2 +-
  block/blk-settings.c  |  9 +
  drivers/block/brd.c   |  4 ++--
  drivers/md/dm-table.c | 18 +++---
  drivers/md/md.c   | 18 +-
  drivers/nvme/host/multipath.c |  3 +--
  include/linux/blkdev.h|  9 +
  8 files changed, 22 insertions(+), 42 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 18/26] block: move the synchronous flag to queue_limits

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

Move the synchronous flag into the queue_limits feature field so that it
can be set atomically with the queue frozen.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
  block/blk-mq-debugfs.c| 1 -
  drivers/block/brd.c   | 2 +-
  drivers/block/zram/zram_drv.c | 4 ++--
  drivers/nvdimm/btt.c  | 3 +--
  drivers/nvdimm/pmem.c | 4 ++--
  include/linux/blkdev.h| 7 ---
  6 files changed, 10 insertions(+), 11 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 17/26] block: move the stable_writes flag to queue_limits

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

Move the stable_writes flag into the queue_limits feature field so that
it can be set atomically with the queue frozen.

The flag is now inherited by blk_stack_limits, which greatly simplifies
the code in dm, and fixed md which previously did not pass on the flag
set on lower devices.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
  block/blk-mq-debugfs.c |  1 -
  block/blk-sysfs.c  | 29 +
  drivers/block/drbd/drbd_main.c |  5 ++---
  drivers/block/rbd.c|  9 +++--
  drivers/block/zram/zram_drv.c  |  2 +-
  drivers/md/dm-table.c  | 19 ---
  drivers/md/raid5.c |  6 --
  drivers/mmc/core/queue.c   |  5 +++--
  drivers/nvme/host/core.c   |  9 +
  drivers/nvme/host/multipath.c  |  4 
  drivers/scsi/iscsi_tcp.c   |  8 
  include/linux/blkdev.h |  9 ++---
  12 files changed, 29 insertions(+), 77 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 16/26] block: move the io_stat flag setting to queue_limits

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

Move the io_stat flag into the queue_limits feature field so that it can
be set atomically with the queue frozen.

Simplify md and dm to set the flag unconditionally instead of avoiding
setting a simple flag for cases where it already is set by other means,
which is a bit pointless.

Signed-off-by: Christoph Hellwig 
---
  block/blk-mq-debugfs.c|  1 -
  block/blk-mq.c|  6 +-
  block/blk-sysfs.c |  2 +-
  drivers/md/dm-table.c | 12 +---
  drivers/md/dm.c   | 13 +++--
  drivers/md/md.c   |  5 ++---
  drivers/nvme/host/multipath.c |  2 +-
  include/linux/blkdev.h|  9 +
  8 files changed, 26 insertions(+), 24 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 15/26] block: move the add_random flag to queue_limits

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

Move the add_random flag into the queue_limits feature field so that it
can be set atomically with the queue frozen.

Note that this also removes code from dm to clear the flag based on
the underlying devices, which can't be reached as dm devices will
always start out without the flag set.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
  block/blk-mq-debugfs.c|  1 -
  block/blk-sysfs.c |  6 +++---
  drivers/block/mtip32xx/mtip32xx.c |  1 -
  drivers/md/dm-table.c | 18 --
  drivers/mmc/core/queue.c  |  2 --
  drivers/mtd/mtd_blkdevs.c |  3 ---
  drivers/s390/block/scm_blk.c  |  4 
  drivers/scsi/scsi_lib.c   |  3 +--
  drivers/scsi/sd.c | 11 +++
  include/linux/blkdev.h|  5 +++--
  10 files changed, 10 insertions(+), 44 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 14/26] block: move the nonrot flag to queue_limits

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

Move the nonrot flag into the queue_limits feature field so that it can
be set atomically with the queue frozen.

Use the chance to switch to defaulting to non-rotational and require
the driver to opt into rotational, which matches the polarity of the
sysfs interface.

For the z2ram, ps3vram, 2x memstick, ubiblock and dcssblk the new
rotational flag is not set as they clearly are not rotational despite
this being a behavior change.  There are some other drivers that
unconditionally set the rotational flag to keep the existing behavior
as they arguably can be used on rotational devices even if that is
probably not their main use today (e.g. virtio_blk and drbd).

The flag is automatically inherited in blk_stack_limits matching the
existing behavior in dm and md.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 13/26] block: move cache control settings out of queue->flags

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

Move the cache control settings into the queue_limits so that the flags
can be set atomically with the device queue frozen.

Add new features and flags field for the driver set flags, and internal
(usually sysfs-controlled) flags in the block layer.  Note that we'll
eventually remove enough field from queue_limits to bring it back to the
previous size.

The disable flag is inverted compared to the previous meaning, which
means it now survives a rescan, similar to the max_sectors and
max_discard_sectors user limits.

The FLUSH and FUA flags are now inherited by blk_stack_limits, which
simplified the code in dm a lot, but also causes a slight behavior
change in that dm-switch and dm-unstripe now advertise a write cache
despite setting num_flush_bios to 0.  The I/O path will handle this
gracefully, but as far as I can tell the lack of num_flush_bios
and thus flush support is a pre-existing data integrity bug in those
targets that really needs fixing, after which a non-zero num_flush_bios
should be required in dm for targets that map to underlying devices.

Signed-off-by: Christoph Hellwig 
Acked-by: Ulf Hansson  [mmc]
---


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 03/26] sd: move zone limits setup out of sd_read_block_characteristics

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

Move a bit of code that sets up the zone flag and the write granularity
into sd_zbc_read_zones to be with the rest of the zoned limits.

Signed-off-by: Christoph Hellwig 
---
  drivers/scsi/sd.c | 21 +
  drivers/scsi/sd_zbc.c |  9 +
  2 files changed, 10 insertions(+), 20 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 01/26] xen-blkfront: don't disable cache flushes when they fail

2024-06-17 Thread Hannes Reinecke

On 6/17/24 08:04, Christoph Hellwig wrote:

blkfront always had a robust negotiation protocol for detecting a write
cache.  Stop simply disabling cache flushes in the block layer as the
flags handling is moving to the atomic queue limits API that needs
user context to freeze the queue for that.  Instead handle the case
of the feature flags cleared inside of blkfront.  This removes old
debug code to check for such a mismatch which was previously impossible
to hit, including the check for passthrough requests that blkfront
never used to start with.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/xen-blkfront.c | 44 +++-
  1 file changed, 23 insertions(+), 21 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 13/26] block: move cache control settings out of queue->flags

2024-06-11 Thread Hannes Reinecke
 (rq->cmd_flags & REQ_FUA))
+   if ((rq->cmd_flags & REQ_FUA) && !supports_fua)
policy |= REQ_FSEQ_POSTFLUSH;
}
  
@@ -407,7 +406,7 @@ bool blk_insert_flush(struct request *rq)

 * REQ_PREFLUSH and FUA for the driver.
 */
rq->cmd_flags &= ~REQ_PREFLUSH;
-   if (!(fflags & (1UL << QUEUE_FLAG_FUA)))
+   if (!supports_fua)
rq->cmd_flags &= ~REQ_FUA;
  
  	/*

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 770c0c2b72faaa..e8b9db7c30c455 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -93,8 +93,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(INIT_DONE),
QUEUE_FLAG_NAME(STABLE_WRITES),
QUEUE_FLAG_NAME(POLL),
-   QUEUE_FLAG_NAME(WC),
-   QUEUE_FLAG_NAME(FUA),
QUEUE_FLAG_NAME(DAX),
QUEUE_FLAG_NAME(STATS),
QUEUE_FLAG_NAME(REGISTERED),
diff --git a/block/blk-settings.c b/block/blk-settings.c
index f11c8676eb4c67..536ee202fcdccb 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -261,6 +261,9 @@ static int blk_validate_limits(struct queue_limits *lim)
lim->misaligned = 0;
}
  
+	if (!(lim->features & BLK_FEAT_WRITE_CACHE))

+   lim->features &= ~BLK_FEAT_FUA;
+
err = blk_validate_integrity_limits(lim);
if (err)
return err;
@@ -454,6 +457,8 @@ int blk_stack_limits(struct queue_limits *t, struct 
queue_limits *b,
  {
unsigned int top, bottom, alignment, ret = 0;
  
+	t->features |= (b->features & BLK_FEAT_INHERIT_MASK);

+
t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
t->max_user_sectors = min_not_zero(t->max_user_sectors,
b->max_user_sectors);
@@ -711,30 +716,6 @@ void blk_set_queue_depth(struct request_queue *q, unsigned 
int depth)
  }
  EXPORT_SYMBOL(blk_set_queue_depth);
  
-/**

- * blk_queue_write_cache - configure queue's write cache
- * @q: the request queue for the device
- * @wc:write back cache on or off
- * @fua:   device supports FUA writes, if true
- *
- * Tell the block layer about the write cache of @q.
- */
-void blk_queue_write_cache(struct request_queue *q, bool wc, bool fua)
-{
-   if (wc) {
-   blk_queue_flag_set(QUEUE_FLAG_HW_WC, q);
-   blk_queue_flag_set(QUEUE_FLAG_WC, q);
-   } else {
-   blk_queue_flag_clear(QUEUE_FLAG_HW_WC, q);
-   blk_queue_flag_clear(QUEUE_FLAG_WC, q);
-   }
-   if (fua)
-   blk_queue_flag_set(QUEUE_FLAG_FUA, q);
-   else
-   blk_queue_flag_clear(QUEUE_FLAG_FUA, q);
-}
-EXPORT_SYMBOL_GPL(blk_queue_write_cache);
-
  int bdev_alignment_offset(struct block_device *bdev)
  {
struct request_queue *q = bdev_get_queue(bdev);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 5c787965b7d09e..4f524c1d5e08bd 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -423,32 +423,41 @@ static ssize_t queue_io_timeout_store(struct 
request_queue *q, const char *page,
  
  static ssize_t queue_wc_show(struct request_queue *q, char *page)

  {
-   if (test_bit(QUEUE_FLAG_WC, &q->queue_flags))
-   return sprintf(page, "write back\n");
-
-   return sprintf(page, "write through\n");
+   if (q->limits.features & BLK_FLAGS_WRITE_CACHE_DISABLED)


Where is the difference between 'flags' and 'features'?
Ie why is is named BLK_FEAT_FUA but BLK_FLAGS_WRITE_CACHE_DISABLED?
And if the feature is the existence of a capability, and the flag is
the setting of that capability, can you make it clear in the documentation?

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 12/26] block: remove blk_flush_policy

2024-06-11 Thread Hannes Reinecke

On 6/11/24 07:19, Christoph Hellwig wrote:

Fold blk_flush_policy into the only caller to prepare for pending changes
to it.

Signed-off-by: Christoph Hellwig 
---
  block/blk-flush.c | 33 +++--
  1 file changed, 15 insertions(+), 18 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 11/26] block: freeze the queue in queue_attr_store

2024-06-11 Thread Hannes Reinecke

On 6/11/24 07:19, Christoph Hellwig wrote:

queue_attr_store updates attributes used to control generating I/O, and
can cause malformed bios if changed with I/O in flight.  Freeze the queue
in common code instead of adding it to almost every attribute.

Signed-off-by: Christoph Hellwig 
---
  block/blk-mq.c| 5 +++--
  block/blk-sysfs.c | 9 ++---
  2 files changed, 5 insertions(+), 9 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail

2024-06-11 Thread Hannes Reinecke

On 6/11/24 07:19, Christoph Hellwig wrote:

blkfront always had a robust negotiation protocol for detecting a write
cache.  Stop simply disabling cache flushes when they fail as that is
a grave error.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/xen-blkfront.c | 29 +
  1 file changed, 9 insertions(+), 20 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 09/26] nbd: move setting the cache control flags to __nbd_set_size

2024-06-11 Thread Hannes Reinecke

On 6/11/24 07:19, Christoph Hellwig wrote:

Move setting the cache control flags in nbd in preparation for moving
these flags into the queue_limits structure.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/nbd.c | 17 +++--
  1 file changed, 7 insertions(+), 10 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode

2024-06-11 Thread Hannes Reinecke

On 6/11/24 07:19, Christoph Hellwig wrote:

virtblk_update_cache_mode boils down to a single call to
blk_queue_write_cache.  Remove it in preparation for moving the cache
control flags into the queue_limits.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/virtio_blk.c | 13 +++--
  1 file changed, 3 insertions(+), 10 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 07/26] loop: fold loop_update_rotational into loop_reconfigure_limits

2024-06-11 Thread Hannes Reinecke

On 6/11/24 07:19, Christoph Hellwig wrote:

This prepares for moving the rotational flag into the queue_limits and
also fixes it for the case where the loop device is backed by a block
device.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/loop.c | 23 ---
  1 file changed, 4 insertions(+), 19 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 01/26] sd: fix sd_is_zoned

2024-06-11 Thread Hannes Reinecke

On 6/11/24 07:19, Christoph Hellwig wrote:

Since commit 7437bb73f087 ("block: remove support for the host aware zone
model"), only ZBC devices expose a zoned access model.  sd_is_zoned is
used to check for that and thus return false for host aware devices.

Fixes: 7437bb73f087 ("block: remove support for the host aware zone model")
Signed-off-by: Christoph Hellwig 
---
  drivers/scsi/sd.h | 7 ++-
  drivers/scsi/sd_zbc.c | 7 +--
  2 files changed, 7 insertions(+), 7 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics

2024-06-11 Thread Hannes Reinecke

On 6/11/24 07:19, Christoph Hellwig wrote:

Move a bit of code that sets up the zone flag and the write granularity
into sd_zbc_read_zones to be with the rest of the zoned limits.

Signed-off-by: Christoph Hellwig 
---
  drivers/scsi/sd.c | 21 +
  drivers/scsi/sd_zbc.c | 13 -
  2 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 85b45345a27739..5bfed61c70db8f 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3308,29 +3308,10 @@ static void sd_read_block_characteristics(struct 
scsi_disk *sdkp,
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
}
  
-

-#ifdef CONFIG_BLK_DEV_ZONED /* sd_probe rejects ZBD devices early otherwise */
-   if (sdkp->device->type == TYPE_ZBC) {
-   lim->zoned = true;
-
-   /*
-* Per ZBC and ZAC specifications, writes in sequential write
-* required zones of host-managed devices must be aligned to
-* the device physical block size.
-*/
-   lim->zone_write_granularity = sdkp->physical_block_size;
-   } else {
-   /*
-* Host-aware devices are treated as conventional.
-*/
-   lim->zoned = false;
-   }
-#endif /* CONFIG_BLK_DEV_ZONED */
-
if (!sdkp->first_scan)
return;
  
-	if (lim->zoned)

+   if (sdkp->device->type == TYPE_ZBC)


Why not sd_is_zoned()?

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 06/26] loop: also use the default block size from an underlying block device

2024-06-11 Thread Hannes Reinecke

On 6/11/24 07:19, Christoph Hellwig wrote:

Fix the code in loop_reconfigure_limits to pick a default block size for
O_DIRECT file descriptors to also work when the loop device sits on top
of a block device and not just on a regular file on a block device based
file system.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/loop.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 05/26] loop: regularize upgrading the lock size for direct I/O

2024-06-11 Thread Hannes Reinecke

On 6/11/24 07:19, Christoph Hellwig wrote:

The LOOP_CONFIGURE path automatically upgrades the block size to that
of the underlying file for O_DIRECT file descriptors, but the
LOOP_SET_BLOCK_SIZE path does not.  Fix this by lifting the code to
pick the block size into common code.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/loop.c | 25 +++--
  1 file changed, 15 insertions(+), 10 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 04/26] loop: always update discard settings in loop_reconfigure_limits

2024-06-11 Thread Hannes Reinecke

On 6/11/24 07:19, Christoph Hellwig wrote:

Simplify loop_reconfigure_limits by always updating the discard limits.
This adds a little more work to loop_set_block_size, but doesn't change
the outcome as the discard flag won't change.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/loop.c | 10 --
  1 file changed, 4 insertions(+), 6 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH 03/26] loop: stop using loop_reconfigure_limits in __loop_clr_fd

2024-06-11 Thread Hannes Reinecke

On 6/11/24 07:19, Christoph Hellwig wrote:

__loop_clr_fd wants to clear all settings on the device.  Prepare for
moving more settings into the block limits by open coding
loop_reconfigure_limits.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/loop.c | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



Re: [PATCH v7 1/3 RESEND] block:sed-opal: SED Opal keystore

2023-08-16 Thread Hannes Reinecke

On 7/21/23 23:19, gjo...@linux.vnet.ibm.com wrote:

From: Greg Joyce 

Add read and write functions that allow SED Opal keys to stored
in a permanent keystore.


Probably state that these are dummy functions only.


Signed-off-by: Greg Joyce 
Reviewed-by: Jonathan Derrick 
---
  block/Makefile   |  2 +-
  block/sed-opal-key.c | 24 
  include/linux/sed-opal-key.h | 15 +++
  3 files changed, 40 insertions(+), 1 deletion(-)
  create mode 100644 block/sed-opal-key.c
  create mode 100644 include/linux/sed-opal-key.h

diff --git a/block/Makefile b/block/Makefile
index 46ada9dc8bbf..ea07d80402a6 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -34,7 +34,7 @@ obj-$(CONFIG_BLK_DEV_ZONED)   += blk-zoned.o
  obj-$(CONFIG_BLK_WBT) += blk-wbt.o
  obj-$(CONFIG_BLK_DEBUG_FS)+= blk-mq-debugfs.o
  obj-$(CONFIG_BLK_DEBUG_FS_ZONED)+= blk-mq-debugfs-zoned.o
-obj-$(CONFIG_BLK_SED_OPAL) += sed-opal.o
+obj-$(CONFIG_BLK_SED_OPAL) += sed-opal.o sed-opal-key.o
  obj-$(CONFIG_BLK_PM)  += blk-pm.o
  obj-$(CONFIG_BLK_INLINE_ENCRYPTION)   += blk-crypto.o blk-crypto-profile.o \
   blk-crypto-sysfs.o
diff --git a/block/sed-opal-key.c b/block/sed-opal-key.c
new file mode 100644
index ..16f380164c44
--- /dev/null
+++ b/block/sed-opal-key.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * SED key operations.
+ *
+ * Copyright (C) 2022 IBM Corporation
+ *
+ * These are the accessor functions (read/write) for SED Opal
+ * keys. Specific keystores can provide overrides.
+ *
+ */
+
+#include 
+#include 
+#include 
+
+int __weak sed_read_key(char *keyname, char *key, u_int *keylen)
+{
+   return -EOPNOTSUPP;
+}
+
+int __weak sed_write_key(char *keyname, char *key, u_int keylen)
+{
+   return -EOPNOTSUPP;
+}


Hmm. We do have security/keys, which is using a 'struct key' for
their operations.
Why don't you leverage that structure?

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



Re: [PATCH v7 3/3 RESEND] powerpc/pseries: PLPKS SED Opal keystore support

2023-08-16 Thread Hannes Reinecke

On 7/21/23 23:19, gjo...@linux.vnet.ibm.com wrote:

From: Greg Joyce 

Define operations for SED Opal to read/write keys
from POWER LPAR Platform KeyStore(PLPKS). This allows
non-volatile storage of SED Opal keys.

Signed-off-by: Greg Joyce 
Reviewed-by: Jonathan Derrick 
---
  arch/powerpc/platforms/pseries/Kconfig|   6 +
  arch/powerpc/platforms/pseries/Makefile   |   1 +
  .../powerpc/platforms/pseries/plpks_sed_ops.c | 114 ++
  block/Kconfig |   1 +
  4 files changed, 122 insertions(+)
  create mode 100644 arch/powerpc/platforms/pseries/plpks_sed_ops.c


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



Re: [PATCH v7 2/3 RESEND] block: sed-opal: keystore access for SED Opal keys

2023-08-16 Thread Hannes Reinecke

On 7/21/23 23:19, gjo...@linux.vnet.ibm.com wrote:

From: Greg Joyce 

Allow for permanent SED authentication keys by
reading/writing to the SED Opal non-volatile keystore.

Signed-off-by: Greg Joyce 
Reviewed-by: Jonathan Derrick 
---
  block/sed-opal.c | 18 --
  1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/block/sed-opal.c b/block/sed-opal.c
index 6d7f25d1711b..fa23a6a60485 100644
--- a/block/sed-opal.c
+++ b/block/sed-opal.c
@@ -18,6 +18,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -3019,7 +3020,13 @@ static int opal_set_new_pw(struct opal_dev *dev, struct 
opal_new_pw *opal_pw)
if (ret)
return ret;
  
-	/* update keyring with new password */

+   /* update keyring and key store with new password */
+   ret = sed_write_key(OPAL_AUTH_KEY,
+   opal_pw->new_user_pw.opal_key.key,
+   opal_pw->new_user_pw.opal_key.key_len);
+   if (ret != -EOPNOTSUPP)
+   pr_warn("error updating SED key: %d\n", ret);
+
ret = update_sed_opal_key(OPAL_AUTH_KEY,
  opal_pw->new_user_pw.opal_key.key,
  opal_pw->new_user_pw.opal_key.key_len);
@@ -3292,6 +3299,8 @@ EXPORT_SYMBOL_GPL(sed_ioctl);
  static int __init sed_opal_init(void)
  {
struct key *kr;
+   char init_sed_key[OPAL_KEY_MAX];
+   int keylen = OPAL_KEY_MAX - 1;
  
  	kr = keyring_alloc(".sed_opal",

   GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, current_cred(),
@@ -3304,6 +3313,11 @@ static int __init sed_opal_init(void)
  
  	sed_opal_keyring = kr;
  
-	return 0;

+   if (sed_read_key(OPAL_AUTH_KEY, init_sed_key, &keylen) < 0) {
+   memset(init_sed_key, '\0', sizeof(init_sed_key));
+   keylen = OPAL_KEY_MAX - 1;
+   }
+
+   return update_sed_opal_key(OPAL_AUTH_KEY, init_sed_key, keylen);
  }
  late_initcall(sed_opal_init);


See the previous patch for comments about 'struct key'.

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



Re: [PATCH v3 3/3] block: sed-opal: keyring support for SED keys

2022-12-01 Thread Hannes Reinecke

On 12/1/22 19:03, Greg Joyce wrote:

On Wed, 2022-11-30 at 08:00 +0100, Hannes Reinecke wrote:

On 11/30/22 00:25, gjo...@linux.vnet.ibm.com wrote:

From: Greg Joyce 

Extend the SED block driver so it can alternatively
obtain a key from a sed-opal kernel keyring. The SED
ioctls will indicate the source of the key, either
directly in the ioctl data or from the keyring.

This allows the use of SED commands in scripts such as
udev scripts so that drives may be automatically unlocked
as they become available.

Signed-off-by: Greg Joyce 
Reviewed-by: Jonathan Derrick 
---
   block/Kconfig |   1 +
   block/sed-opal.c  | 174
+-
   include/linux/sed-opal.h  |   3 +
   include/uapi/linux/sed-opal.h |   8 +-
   4 files changed, 183 insertions(+), 3 deletions(-)
  
+	ret = opal_get_key(dev, &opal_lrs->session.opal_key);

+   if (ret)
+   return ret;
mutex_lock(&dev->dev_lock);
setup_opal_dev(dev);
ret = execute_steps(dev, lr_steps, ARRAY_SIZE(lr_steps));
@@ -2622,6 +2759,14 @@ static int opal_set_new_pw(struct opal_dev
*dev, struct opal_new_pw *opal_pw)
ret = execute_steps(dev, pw_steps, ARRAY_SIZE(pw_steps));
mutex_unlock(&dev->dev_lock);
   
+	if (ret)

+   return ret;
+
+   /* update keyring with new password */
+   ret = update_sed_opal_key(OPAL_AUTH_KEY,
+ opal_pw->new_user_pw.opal_key.key,
+ opal_pw-

new_user_pw.opal_key.key_len);

+
return ret;
   }
   

What about key revocation?
You only allow to set a new key, but what happens with the old ones?


My understanding was that key_create_or_update() would not allow
duplicates so there shouldn't be old ones. Is that incorrect?


Ah, right, you only have one key.
But still, you might want to revoke that one, too, no?
(Think of decommissioning old drives ...)

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



Re: [PATCH v3 3/3] block: sed-opal: keyring support for SED keys

2022-11-29 Thread Hannes Reinecke
nly allow to set a new key, but what happens with the old ones?


@@ -2642,6 +2787,9 @@ static int opal_activate_user(struct opal_dev *dev,
return -EINVAL;
}
  
+	ret = opal_get_key(dev, &opal_session->opal_key);

+   if (ret)
+   return ret;
mutex_lock(&dev->dev_lock);
setup_opal_dev(dev);
ret = execute_steps(dev, act_steps, ARRAY_SIZE(act_steps));
@@ -2728,6 +2876,9 @@ static int opal_generic_read_write_table(struct opal_dev 
*dev,
  {
int ret, bit_set;
  
+	ret = opal_get_key(dev, &rw_tbl->key);

+   if (ret)
+   return ret;
mutex_lock(&dev->dev_lock);
setup_opal_dev(dev);
  
@@ -2776,9 +2927,9 @@ int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *arg)

if (!capable(CAP_SYS_ADMIN))
return -EACCES;
if (!dev)
-   return -ENOTSUPP;
+   return -EOPNOTSUPP;
if (!(dev->flags & OPAL_FL_SUPPORTED))
-   return -ENOTSUPP;
+   return -EOPNOTSUPP;
  
  	if (cmd & IOC_IN) {

p = memdup_user(arg, _IOC_SIZE(cmd));
@@ -2854,3 +3005,22 @@ int sed_ioctl(struct opal_dev *dev, unsigned int cmd, 
void __user *arg)
return ret;
  }
  EXPORT_SYMBOL_GPL(sed_ioctl);
+
+static int __init sed_opal_init(void)
+{
+   struct key *kr;
+
+   kr = keyring_alloc(".sed_opal",
+  GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, current_cred(),
+  (KEY_POS_ALL & ~KEY_POS_SETATTR) | KEY_USR_VIEW |
+  KEY_USR_READ | KEY_USR_SEARCH | KEY_USR_WRITE,
+  KEY_ALLOC_NOT_IN_QUOTA,
+  NULL, NULL);
+   if (IS_ERR(kr))
+   return PTR_ERR(kr);
+
+   sed_opal_keyring = kr;
+
+   return 0;
+}
+late_initcall(sed_opal_init);


Shouldn't you free the keyring on exit?


diff --git a/include/linux/sed-opal.h b/include/linux/sed-opal.h
index 7131d7f0eec2..57d483506b4a 100644
--- a/include/linux/sed-opal.h
+++ b/include/linux/sed-opal.h
@@ -24,6 +24,9 @@ bool opal_unlock_from_suspend(struct opal_dev *dev);
  struct opal_dev *init_opal_dev(void *data, sec_send_recv *send_recv);
  int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *ioctl_ptr);
  
+#define	OPAL_AUTH_KEY   "opal-boot-pin"

+#defineOPAL_AUTH_KEY_PREV  "opal-boot-pin-prev"
+
  static inline bool is_sed_ioctl(unsigned int cmd)
  {
switch (cmd) {
diff --git a/include/uapi/linux/sed-opal.h b/include/uapi/linux/sed-opal.h
index fccde168e90c..6b79cdcf9518 100644
--- a/include/uapi/linux/sed-opal.h
+++ b/include/uapi/linux/sed-opal.h
@@ -44,10 +44,16 @@ enum opal_lock_state {
    OPAL_LK = 0x04, /* 0100 */
  };
  
+enum opal_key_type {

+   OPAL_INCLUDED = 0,  /* key[] is the key */
+   OPAL_KEYRING,   /* key is in keyring */
+};
+
  struct opal_key {
__u8 lr;
__u8 key_len;
-   __u8 __align[6];
+   __u8 key_type;
+   __u8 __align[5];
__u8 key[OPAL_KEY_MAX];
  };
  


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



Re: [PATCH v3 2/3] block: sed-opal: Implement IOC_OPAL_REVERT_LSP

2022-11-29 Thread Hannes Reinecke

On 11/30/22 00:25, gjo...@linux.vnet.ibm.com wrote:

From: Greg Joyce 

This is used in conjunction with IOC_OPAL_REVERT_TPR to return a drive to
Original Factory State without erasing the data. If IOC_OPAL_REVERT_LSP
is called with opal_revert_lsp.options bit OPAL_PRESERVE set prior
to calling IOC_OPAL_REVERT_TPR, the drive global locking range will not
be erased.

Signed-off-by: Greg Joyce 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jonathan Derrick 
---
  block/opal_proto.h|  4 
  block/sed-opal.c  | 40 +++
  include/linux/sed-opal.h  |  1 +
  include/uapi/linux/sed-opal.h | 11 ++
  4 files changed, 56 insertions(+)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



Re: [PATCH v3 1/3] block: sed-opal: Implement IOC_OPAL_DISCOVERY

2022-11-29 Thread Hannes Reinecke

On 11/30/22 00:25, gjo...@linux.vnet.ibm.com wrote:

From: Greg Joyce 

Add IOC_OPAL_DISCOVERY ioctl to return raw discovery data to a SED Opal
application. This allows the application to display drive capabilities
and state.

Signed-off-by: Greg Joyce 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jonathan Derrick 
---
  block/sed-opal.c  | 38 ---
  include/linux/sed-opal.h  |  1 +
  include/uapi/linux/sed-opal.h |  6 ++
  3 files changed, 42 insertions(+), 3 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



Re: [PATCH v2 2/3] scsi: BusLogic remove bus_to_virt

2022-06-21 Thread Hannes Reinecke

On 6/17/22 14:57, Arnd Bergmann wrote:

From: Arnd Bergmann 

The BusLogic driver is the last remaining driver that relies on the
deprecated bus_to_virt() function, which in turn only works on a few
architectures, and is incompatible with both swiotlb and iommu support.

Before commit 391e2f25601e ("[SCSI] BusLogic: Port driver to 64-bit."),
the driver had a dependency on x86-32, presumably because of this
problem. However, the change introduced another bug that made it still
impossible to use the driver on any 64-bit machine.

This was in turn fixed in commit 56f396146af2 ("scsi: BusLogic: Fix
64-bit system enumeration error for Buslogic"), 8 years later, which
shows that there are not a lot of users.

Maciej is still using the driver on 32-bit hardware, and Khalid mentioned
that the driver works with the device emulation used in VirtualBox
and VMware. Both of those only emulate it for Windows 2000 and older
operating systems that did not ship with the better LSI logic driver.

Do a minimum fix that searches through the list of descriptors to find
one that matches the bus address. This is clearly as inefficient as
was indicated in the code comment about the lack of a bus_to_virt()
replacement. A better fix would likely involve changing out the entire
descriptor allocation for a simpler one, but that would be much
more invasive.

Cc: Maciej W. Rozycki 
Cc: Matt Wang 
Cc: Khalid Aziz 
Signed-off-by: Arnd Bergmann 
---
  drivers/scsi/BusLogic.c | 27 ---
  drivers/scsi/Kconfig|  2 +-
  2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/BusLogic.c b/drivers/scsi/BusLogic.c
index a897c8f914cf..d057abfcdd5c 100644
--- a/drivers/scsi/BusLogic.c
+++ b/drivers/scsi/BusLogic.c
@@ -2515,12 +2515,26 @@ static int blogic_resultcode(struct blogic_adapter 
*adapter,
return (hoststatus << 16) | tgt_status;
  }
  
+/*

+ * turn the dma address from an inbox into a ccb pointer
+ * This is rather inefficient.
+ */
+static struct blogic_ccb *
+blogic_inbox_to_ccb(struct blogic_adapter *adapter, struct blogic_inbox *inbox)
+{
+   struct blogic_ccb *ccb;
+
+   for (ccb = adapter->all_ccbs; ccb; ccb = ccb->next_all)
+   if (inbox->ccb == ccb->dma_handle)
+   break;
+
+   return ccb;
+}
  
  /*

blogic_scan_inbox scans the Incoming Mailboxes saving any
Incoming Mailbox entries for completion processing.
  */
-
  static void blogic_scan_inbox(struct blogic_adapter *adapter)
  {
/*
@@ -2540,16 +2554,7 @@ static void blogic_scan_inbox(struct blogic_adapter 
*adapter)
enum blogic_cmplt_code comp_code;
  
  	while ((comp_code = next_inbox->comp_code) != BLOGIC_INBOX_FREE) {

-   /*
-  We are only allowed to do this because we limit our
-  architectures we run on to machines where bus_to_virt(
-  actually works.  There *needs* to be a dma_addr_to_virt()
-  in the new PCI DMA mapping interface to replace
-  bus_to_virt() or else this code is going to become very
-  innefficient.
-*/
-   struct blogic_ccb *ccb =
-   (struct blogic_ccb *) bus_to_virt(next_inbox->ccb);
+   struct blogic_ccb *ccb = blogic_inbox_to_ccb(adapter, 
adapter->next_inbox);
if (comp_code != BLOGIC_CMD_NOTFOUND) {
if (ccb->status == BLOGIC_CCB_ACTIVE ||
ccb->status == BLOGIC_CCB_RESET) {
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index cf75588a2587..56bdc08d0b77 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -513,7 +513,7 @@ config SCSI_HPTIOP
  
  config SCSI_BUSLOGIC

tristate "BusLogic SCSI support"
-   depends on PCI && SCSI && VIRT_TO_BUS
+   depends on PCI && SCSI
help
  This is support for BusLogic MultiMaster and FlashPoint SCSI Host
  Adapters. Consult the SCSI-HOWTO, available from


CCB handling in the driver is ugly anyway, so that'll be good enough.

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH v2 1/3] scsi: dpt_i2o: drop stale VIRT_TO_BUS dependency

2022-06-21 Thread Hannes Reinecke

On 6/17/22 14:57, Arnd Bergmann wrote:

From: Arnd Bergmann 

The dpt_i2o driver was fixed to stop using virt_to_bus() in 2008, but
it still has a stale reference in an error handling code path that could
never work.

Fix it up to build without VIRT_TO_BUS and remove the Kconfig dependency.

The alternative to this would be to just remove the driver, as it is
clearly obsolete. The i2o driver layer was removed in 2015 with commit
4a72a7af462d ("staging: remove i2o subsystem"), but the even older
dpt_i2o scsi driver stayed around.

The last non-cleanup patches I could find were from Miquel van Smoorenburg
and Mark Salyzyn back in 2008, they might know if there is any chance
of the hardware still being used anywhere.

Fixes: 67af2b060e02 ("[SCSI] dpt_i2o: move from virt_to_bus/bus_to_virt to 
dma_alloc_coherent")
Cc: Miquel van Smoorenburg 
Cc: Mark Salyzyn 
Signed-off-by: Arnd Bergmann 
---
  drivers/scsi/Kconfig   | 2 +-
  drivers/scsi/dpt_i2o.c | 4 ++--
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index a9fe5152addd..cf75588a2587 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -460,7 +460,7 @@ config SCSI_MVUMI
  
  config SCSI_DPT_I2O

tristate "Adaptec I2O RAID support "
-   depends on SCSI && PCI && VIRT_TO_BUS
+   depends on SCSI && PCI
help
  This driver supports all of Adaptec's I2O based RAID controllers as
  well as the DPT SmartRaid V cards.  This is an Adaptec maintained
diff --git a/drivers/scsi/dpt_i2o.c b/drivers/scsi/dpt_i2o.c
index 2e9155ba7408..55dfe7011912 100644
--- a/drivers/scsi/dpt_i2o.c
+++ b/drivers/scsi/dpt_i2o.c
@@ -52,11 +52,11 @@ MODULE_DESCRIPTION("Adaptec I2O RAID Driver");
  
  #include 

  #include 
+#include 
  #include 
  #include 
  
  #include 	/* for boot_cpu_data */

-#include   /* for virt_to_bus, etc. */
  
  #include 

  #include 
@@ -2112,7 +2112,7 @@ static irqreturn_t adpt_isr(int irq, void *dev_id)
} else {
/* Ick, we should *never* be here */
printk(KERN_ERR "dpti: reply frame not from pool\n");
-   reply = (u8 *)bus_to_virt(m);
+   goto out;
    }
  
  		if (readl(reply) & MSG_FAIL) {


Reviewed-by: Hannes Reinecke 

Personally I wouldn't mind to see this driver gone, as it's being built 
upon the (long-defunct) I2O specification. We already deleted the i2o 
subsystem years ago, so maybe it's time to consign this driver to 
history, too.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 14/26] md: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-24 Thread Hannes Reinecke

On 5/24/21 9:26 AM, Christoph Hellwig wrote:

On Sun, May 23, 2021 at 10:12:49AM +0200, Hannes Reinecke wrote:

+   blk_set_stacking_limits(&mddev->queue->limits);
blk_queue_write_cache(mddev->queue, true, true);
/* Allow extended partitions.  This makes the
 * 'mdp' device redundant, but we can't really


Wouldn't it make sense to introduce a helper 'blk_queue_from_disk()' or
somesuch to avoid having to keep an explicit 'queue' pointer?


My rought plan is that a few series from now bio based drivers will
never directly deal with the request_queue at all.


Go for it.

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 26/26] block: unexport blk_alloc_queue

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

blk_alloc_queue is just an internal helper now, unexport it and remove
it from the public header.

Signed-off-by: Christoph Hellwig 
---
  block/blk-core.c   | 1 -
  block/blk.h| 2 ++
  include/linux/blkdev.h | 1 -
  3 files changed, 2 insertions(+), 2 deletions(-)
Reviewed-by: Hannes Reinecke 


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 25/26] null_blk: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the null_blk driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.  Note that the
blk-mq mode is left with its own allocations scheme, to be handled later.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/null_blk/main.c | 38 +--
  1 file changed, 19 insertions(+), 19 deletions(-)
Reviewed-by: Hannes Reinecke 


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 24/26] xpram: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the xpram driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/s390/block/xpram.c | 26 +-
  1 file changed, 9 insertions(+), 17 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 23/26] dcssblk: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the dcssblk driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/s390/block/dcssblk.c | 26 --
  1 file changed, 8 insertions(+), 18 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 22/26] ps3vram: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the ps3vram driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/ps3vram.c | 31 ---
  1 file changed, 8 insertions(+), 23 deletions(-)
Reviewed-by: Hannes Reinecke 


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 21/26] n64cart: convert to blk_alloc_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the n64cart driver to use the blk_alloc_disk helper to simplify
gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/n64cart.c | 6 +-
  1 file changed, 1 insertion(+), 5 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 20/26] simdisk: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the simdisk driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig 
---
  arch/xtensa/platforms/iss/simdisk.c | 29 +++--
  1 file changed, 7 insertions(+), 22 deletions(-)
Reviewed-by: Hannes Reinecke 


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 19/26] nfblock: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the nfblock driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig 
---
  arch/m68k/emu/nfblock.c | 20 +---
  1 file changed, 5 insertions(+), 15 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 18/26] nvme-multipath: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the nvme-multipath driver to use the blk_alloc_disk and
blk_cleanup_disk helpers to simplify gendisk and request_queue
allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/nvdimm/pmem.c |  1 -
  drivers/nvme/host/multipath.c | 45 ++-
  2 files changed, 13 insertions(+), 33 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 9fcd05084564..31f3c4bd6f72 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -472,7 +472,6 @@ static int pmem_attach_disk(struct device *dev,
blk_queue_flag_set(QUEUE_FLAG_DAX, q);
  
  	disk->fops		= &pmem_fops;

-   disk->queue  = q;
disk->private_data   = pmem;
nvdimm_namespace_disk_name(ndns, disk->disk_name);
set_capacity(disk, (pmem->size - pmem->pfn_pad - pmem->data_offset)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index a5d02f236cca..b5fbdb416022 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -427,7 +427,6 @@ static void nvme_requeue_work(struct work_struct *work)
  
  int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)

  {
-   struct request_queue *q;
bool vwc = false;
  
  	mutex_init(&head->lock);

@@ -443,33 +442,24 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct 
nvme_ns_head *head)
if (!(ctrl->subsys->cmic & NVME_CTRL_CMIC_MULTI_CTRL) || !multipath)
return 0;
  
-	q = blk_alloc_queue(ctrl->numa_node);

-   if (!q)
-   goto out;
-   blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
-   /* set to a default value for 512 until disk is validated */
-   blk_queue_logical_block_size(q, 512);
-   blk_set_stacking_limits(&q->limits);
-
-   /* we need to propagate up the VMC settings */
-   if (ctrl->vwc & NVME_CTRL_VWC_PRESENT)
-   vwc = true;
-   blk_queue_write_cache(q, vwc, vwc);
-
-   head->disk = alloc_disk(0);
+   head->disk = blk_alloc_disk(ctrl->numa_node);
if (!head->disk)
-   goto out_cleanup_queue;
+   return -ENOMEM;
head->disk->fops = &nvme_ns_head_ops;
head->disk->private_data = head;
-   head->disk->queue = q;
sprintf(head->disk->disk_name, "nvme%dn%d",
ctrl->subsys->instance, head->instance);
-   return 0;
  
-out_cleanup_queue:

-   blk_cleanup_queue(q);
-out:
-   return -ENOMEM;
+   blk_queue_flag_set(QUEUE_FLAG_NONROT, head->disk->queue);
+   /* set to a default value of 512 until the disk is validated */
+   blk_queue_logical_block_size(head->disk->queue, 512);
+   blk_set_stacking_limits(&head->disk->queue->limits);
+
+   /* we need to propagate up the VMC settings */
+   if (ctrl->vwc & NVME_CTRL_VWC_PRESENT)
+   vwc = true;
+   blk_queue_write_cache(head->disk->queue, vwc, vwc);
+   return 0;
  }
  
  static void nvme_mpath_set_live(struct nvme_ns *ns)

@@ -768,16 +758,7 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
/* make sure all pending bios are cleaned up */
kblockd_schedule_work(&head->requeue_work);
flush_work(&head->requeue_work);
-   blk_cleanup_queue(head->disk->queue);
-   if (!test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
-   /*
-* if device_add_disk wasn't called, prevent
-* disk release to put a bogus reference on the
-* request queue
-*/
-   head->disk->queue = NULL;
-   }
-   put_disk(head->disk);
+   blk_cleanup_disk(head->disk);
  }
  
  void nvme_mpath_init_ctrl(struct nvme_ctrl *ctrl)



What about the check for GENHD_FL_UP a bit further up in line 766?
Can this still happen with the new allocation scheme, ie is there still 
a difference in lifetime between ->disk and ->disk->queue?


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 17/26] nvdimm-pmem: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the nvdimm-pmem driver to use the blk_alloc_disk and
blk_cleanup_disk helpers to simplify gendisk and request_queue
allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/nvdimm/pmem.c | 15 +--
  1 file changed, 5 insertions(+), 10 deletions(-)
Reviewed-by: Hannes Reinecke 


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 16/26] nvdimm-btt: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the nvdimm-btt driver to use the blk_alloc_disk and
blk_cleanup_disk helpers to simplify gendisk and request_queue
allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/nvdimm/btt.c | 24 +++-
  drivers/nvdimm/btt.h |  2 --
  2 files changed, 7 insertions(+), 19 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 15/26] nvdimm-blk: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the nvdimm-blk driver to use the blk_alloc_disk and
blk_cleanup_disk helpers to simplify gendisk and request_queue
allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/nvdimm/blk.c | 26 ++
  1 file changed, 6 insertions(+), 20 deletions(-)
Reviewed-by: Hannes Reinecke 


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 14/26] md: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the md driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/md/md.c | 25 +
  1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 49f897fbb89b..d806be8cc210 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5598,12 +5598,10 @@ static void md_free(struct kobject *ko)
if (mddev->sysfs_level)
sysfs_put(mddev->sysfs_level);
  
-	if (mddev->gendisk)

+   if (mddev->gendisk) {
del_gendisk(mddev->gendisk);
-   if (mddev->queue)
-   blk_cleanup_queue(mddev->queue);
-   if (mddev->gendisk)
-   put_disk(mddev->gendisk);
+   blk_cleanup_disk(mddev->gendisk);
+   }
percpu_ref_exit(&mddev->writes_pending);
  
  	bioset_exit(&mddev->bio_set);

@@ -5711,20 +5709,13 @@ static int md_alloc(dev_t dev, char *name)
goto abort;
  
  	error = -ENOMEM;

-   mddev->queue = blk_alloc_queue(NUMA_NO_NODE);
-   if (!mddev->queue)
+   disk = blk_alloc_disk(NUMA_NO_NODE);
+   if (!disk)
goto abort;
  
-	blk_set_stacking_limits(&mddev->queue->limits);

-
-   disk = alloc_disk(1 << shift);
-   if (!disk) {
-   blk_cleanup_queue(mddev->queue);
-   mddev->queue = NULL;
-   goto abort;
-   }
disk->major = MAJOR(mddev->unit);
disk->first_minor = unit << shift;
+   disk->minors = 1 << shift;
if (name)
strcpy(disk->disk_name, name);
else if (partitioned)
@@ -5733,7 +5724,9 @@ static int md_alloc(dev_t dev, char *name)
sprintf(disk->disk_name, "md%d", unit);
disk->fops = &md_fops;
disk->private_data = mddev;
-   disk->queue = mddev->queue;
+
+   mddev->queue = disk->queue;
+   blk_set_stacking_limits(&mddev->queue->limits);
blk_queue_write_cache(mddev->queue, true, true);
/* Allow extended partitions.  This makes the
 * 'mdp' device redundant, but we can't really

Wouldn't it make sense to introduce a helper 'blk_queue_from_disk()' or 
somesuch to avoid having to keep an explicit 'queue' pointer?



Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 13/26] dm: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the dm driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/md/dm.c | 16 +++-
  1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ca2aedd8ee7d..3c7c2d257018 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1801,13 +1801,13 @@ static void cleanup_mapped_device(struct mapped_device 
*md)
md->disk->private_data = NULL;
spin_unlock(&_minor_lock);
del_gendisk(md->disk);
-   put_disk(md->disk);
}
  
-	if (md->queue) {

+   if (md->queue)
dm_queue_destroy_keyslot_manager(md->queue);
-   blk_cleanup_queue(md->queue);
-   }
+
+   if (md->disk)
+   blk_cleanup_disk(md->disk);
  
  	cleanup_srcu_struct(&md->io_barrier);
  


Can't these conditionals be merged into a single 'if (md->disk)'?
Eg like:

if (md->disk) {
spin_lock(&_minor_lock);
md->disk->private_data = NULL;
spin_unlock(&_minor_lock);
del_gendisk(md->disk);
dm_queue_destroy_keyslot_manager(md->queue);
blk_cleanup_disk(md->queue);
}

We're now always allocating 'md->disk' and 'md->queue' together,
so how can we end up in a situation where one is set without the other?


@@ -1869,13 +1869,10 @@ static struct mapped_device *alloc_dev(int minor)
 * established. If request-based table is loaded: blk-mq will
 * override accordingly.
 */
-   md->queue = blk_alloc_queue(numa_node_id);
-   if (!md->queue)
-   goto bad;
-
-   md->disk = alloc_disk_node(1, md->numa_node_id);
+   md->disk = blk_alloc_disk(md->numa_node_id);
if (!md->disk)
goto bad;
+   md->queue = md->disk->queue;
  
  	init_waitqueue_head(&md->wait);

INIT_WORK(&md->work, dm_wq_work);
@@ -1888,6 +1885,7 @@ static struct mapped_device *alloc_dev(int minor)
  
  	md->disk->major = _major;

md->disk->first_minor = minor;
+   md->disk->minors = 1;
md->disk->fops = &dm_blk_dops;
md->disk->queue = md->queue;
md->disk->private_data = md;


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 12/26] bcache: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the bcache driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/md/bcache/super.c | 15 ---
  1 file changed, 4 insertions(+), 11 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 11/26] lightnvm: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the lightnvm driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/lightnvm/core.c | 23 +--
  1 file changed, 5 insertions(+), 18 deletions(-)
Reviewed-by: Hannes Reinecke 


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 10/26] zram: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:51 AM, Christoph Hellwig wrote:

Convert the zram driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/zram/zram_drv.c | 19 ---
  1 file changed, 4 insertions(+), 15 deletions(-)
Reviewed-by: Hannes Reinecke 


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 09/26] rsxx: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:50 AM, Christoph Hellwig wrote:

Convert the rsxx driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/rsxx/dev.c   | 39 +-
  drivers/block/rsxx/rsxx_priv.h |  1 -
  2 files changed, 15 insertions(+), 25 deletions(-)
Reviewed-by: Hannes Reinecke 


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 08/26] pktcdvd: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:50 AM, Christoph Hellwig wrote:

Convert the pktcdvd driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/pktcdvd.c | 11 ---
  1 file changed, 4 insertions(+), 7 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 07/26] drbd: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:50 AM, Christoph Hellwig wrote:

Convert the drbd driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/drbd/drbd_main.c | 23 ---
  1 file changed, 8 insertions(+), 15 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 06/26] brd: convert to blk_alloc_disk/blk_cleanup_disk

2021-05-23 Thread Hannes Reinecke
_list);
del_gendisk(brd->brd_disk);
-   brd_free(brd);
+   blk_cleanup_disk(brd->brd_disk);
+   brd_free_pages(brd);
+   kfree(brd);
  }
  
  static inline void brd_check_and_reset_par(void)

@@ -485,7 +470,7 @@ static inline void brd_check_and_reset_par(void)
  static int __init brd_init(void)
  {
struct brd_device *brd, *next;
-   int i;
+   int err, i;
  
  	/*

 * brd module now has a feature to instantiate underlying device
@@ -511,22 +496,11 @@ static int __init brd_init(void)
  
  	mutex_lock(&brd_devices_mutex);

for (i = 0; i < rd_nr; i++) {
-   brd = brd_alloc(i);
-   if (!brd)
+   err = brd_alloc(i);
+   if (err)
goto out_free;
-   list_add_tail(&brd->brd_list, &brd_devices);
}
  
-	/* point of no return */

-
-   list_for_each_entry(brd, &brd_devices, brd_list) {
-   /*
-* associate with queue just before adding disk for
-* avoiding to mess up failure path
-*/
-   brd->brd_disk->queue = brd->brd_queue;
-   add_disk(brd->brd_disk);
-   }
mutex_unlock(&brd_devices_mutex);
  
  	pr_info("brd: module loaded\n");

@@ -535,15 +509,13 @@ static int __init brd_init(void)
  out_free:
debugfs_remove_recursive(brd_debugfs_dir);
  
-	list_for_each_entry_safe(brd, next, &brd_devices, brd_list) {

-   list_del(&brd->brd_list);
-   brd_free(brd);
-   }
+   list_for_each_entry_safe(brd, next, &brd_devices, brd_list)
+   brd_del_one(brd);
mutex_unlock(&brd_devices_mutex);
unregister_blkdev(RAMDISK_MAJOR, "ramdisk");
  
  	pr_info("brd: module NOT loaded !!!\n");

-   return -ENOMEM;
+   return err;
  }
  
  static void __exit brd_exit(void)



Other than that:

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 05/26] block: add blk_alloc_disk and blk_cleanup_disk APIs

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:50 AM, Christoph Hellwig wrote:

Add two new APIs to allocate and free a gendisk including the
request_queue for use with BIO based drivers.  This is to avoid
boilerplate code in drivers.

Signed-off-by: Christoph Hellwig 
---
  block/genhd.c | 35 +++
  include/linux/genhd.h | 22 ++
  2 files changed, 57 insertions(+)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 04/26] block: add a flag to make put_disk on partially initalized disks safer

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:50 AM, Christoph Hellwig wrote:

Add a flag to indicate that __device_add_disk did grab a queue reference
so that disk_release only drops it if we actually had it.  This sort
out one of the major pitfals with partially initialized gendisk that

   pitfalls


a lot of drivers did get wrong or still do.

Signed-off-by: Christoph Hellwig 
---
  block/genhd.c | 7 +--
  include/linux/genhd.h | 1 +
  2 files changed, 6 insertions(+), 2 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 03/26] block: automatically enable GENHD_FL_EXT_DEVT

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:50 AM, Christoph Hellwig wrote:

Automatically set the GENHD_FL_EXT_DEVT flag for all disks allocated
without an explicit number of minors.  This is what all new block
drivers should do, so make sure it is the default without boilerplate
code.

Signed-off-by: Christoph Hellwig 
---
  block/genhd.c| 2 +-
  block/partitions/core.c  | 4 
  drivers/block/n64cart.c  | 2 +-
  drivers/lightnvm/core.c  | 1 -
  drivers/memstick/core/ms_block.c | 1 -
  drivers/nvdimm/blk.c | 1 -
  drivers/nvdimm/btt.c | 1 -
  drivers/nvdimm/pmem.c| 1 -
  drivers/nvme/host/core.c | 1 -
  drivers/nvme/host/multipath.c| 1 -
  10 files changed, 2 insertions(+), 13 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 02/26] block: move the DISK_MAX_PARTS sanity check into __device_add_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:50 AM, Christoph Hellwig wrote:

Keep this together with the first place that actually looks at
->minors and prepare for not passing a minors argument to
alloc_disk.

Signed-off-by: Christoph Hellwig 
---
  block/genhd.c | 13 ++---
  1 file changed, 6 insertions(+), 7 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 01/26] block: refactor device number setup in __device_add_disk

2021-05-23 Thread Hannes Reinecke

On 5/21/21 7:50 AM, Christoph Hellwig wrote:

Untangle the mess around blk_alloc_devt by moving the check for
the used allocation scheme into the callers.

Signed-off-by: Christoph Hellwig 
---
  block/blk.h |  4 +-
  block/genhd.c   | 96 -
  block/partitions/core.c | 15 +--
  3 files changed, 49 insertions(+), 66 deletions(-)

... and also fixes an issue with GENHD_FL_UP remained set in an error 
path in __device_add_disk().


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH v2 01/17] ibmvfc: add vhost fields and defaults for MQ enablement

2020-12-07 Thread Hannes Reinecke

On 12/4/20 3:26 PM, Brian King wrote:

On 12/2/20 11:27 AM, Tyrel Datwyler wrote:

On 12/2/20 7:14 AM, Brian King wrote:

On 12/1/20 6:53 PM, Tyrel Datwyler wrote:

Introduce several new vhost fields for managing MQ state of the adapter
as well as initial defaults for MQ enablement.

Signed-off-by: Tyrel Datwyler 
---
  drivers/scsi/ibmvscsi/ibmvfc.c |  9 -
  drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++--
  2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 42e4d35e0d35..f1d677a7423d 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const 
struct vio_device_id *id)
}
  
  	shost->transportt = ibmvfc_transport_template;

-   shost->can_queue = max_requests;
+   shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);


This doesn't look right. can_queue is the SCSI host queue depth, not the MQ 
queue depth.


Our max_requests is the total number commands allowed across all queues. From
what I understand is can_queue is the total number of commands in flight allowed
for each hw queue.

 /*
  * In scsi-mq mode, the number of hardware queues supported by the LLD.
  *
  * Note: it is assumed that each hardware queue has a queue depth of
  * can_queue. In other words, the total queue depth per host
  * is nr_hw_queues * can_queue. However, for when host_tagset is set,
  * the total queue depth is can_queue.
  */

We currently don't use the host wide shared tagset.


Ok. I missed that bit... In that case, since we allocate by default only 100
event structs. If we slice that across IBMVFC_SCSI_HW_QUEUES (16) queues, then
we end up with only about 6 commands that can be outstanding per queue,
which is going to really hurt performance... I'd suggest bumping up
IBMVFC_MAX_REQUESTS_DEFAULT from 100 to 1000 as a starting point.


Before doing that I'd rather use the host-wide shared tagset.
Increasing the number of requests will increase the memory footprint of 
the driver (as each request will be statically allocated).


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 00/13] ibmvfc: initial MQ development

2020-12-02 Thread Hannes Reinecke

On 11/26/20 2:48 AM, Tyrel Datwyler wrote:

Recent updates in pHyp Firmware and VIOS releases provide new infrastructure
towards enabling Subordinate Command Response Queues (Sub-CRQs) such that each
Sub-CRQ is a channel backed by an actual hardware queue in the FC stack on the
partner VIOS. Sub-CRQs are registered with the firmware via hypercalls and then
negotiated with the VIOS via new Management Datagrams (MADs) for channel setup.

This initial implementation adds the necessary Sub-CRQ framework and implements
the new MADs for negotiating and assigning a set of Sub-CRQs to associated VIOS
HW backed channels. The event pool and locking still leverages the legacy single
queue implementation, and as such lock contention is problematic when increasing
the number of queues. However, this initial work demonstrates a 1.2x factor
increase in IOPs when configured with two HW queues despite lock contention.


Why do you still hold the hold lock during submission?
An initial check on the submission code path didn't reveal anything 
obvious, so it _should_ be possible to drop the host lock there.
Or at least move it into the submission function itself to avoid lock 
contention. Hmm?


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [mainline] rcu stalls on CPU when unbinding mpt3sas driver

2017-12-15 Thread Hannes Reinecke
792d47820] [dfc72604] sas_rphy_remove+0x94/0xa0 
> [scsi_transport_sas]
> [c07792d47850] [dfc745bc] sas_port_delete+0x4c/0x238 
> [scsi_transport_sas]
> [c07792d478b0] [d00010e82990] 
> mpt3sas_transport_port_remove+0x2d0/0x310 [mpt3sas]
> [c07792d47950] [d00010e71ba0] _scsih_remove_device+0x100/0x2a0 
> [mpt3sas]
> [c07792d47a10] [d00010e774d4] 
> mpt3sas_device_remove_by_sas_address.part.44+0xb4/0x160 [mpt3sas]
> [c07792d47a70] [d00010e77614] _scsih_expander_node_remove+0x94/0x170 
> [mpt3sas]
> [c07792d47af0] [d00010e77a88] 
> mpt3sas_expander_remove.part.46+0x398/0xe70 [mpt3sas]
> [c07792d47b90] [c056a9c4] pci_device_remove+0x64/0x110
> [c07792d47bd0] [c060fa74] 
> device_release_driver_internal+0x1e4/0x2c0
> [c07792d47c20] [c060d260] unbind_store+0x110/0x140
> [c07792d47c70] [c060c2fc] drv_attr_store+0x3c/0x60
> [c07792d47c90] [c03a03c4] sysfs_kf_write+0x64/0xa0
> [c07792d47cb0] [c039f1b0] kernfs_fop_write+0x170/0x250
> [c07792d47d00] [c02fd370] __vfs_write+0x40/0x200
> [c07792d47d90] [c02fd748] vfs_write+0xc8/0x240
> [c07792d47de0] [c02fda80] SyS_write+0x60/0x110
> [c07792d47e30] [c000b8e0] system_call+0x58/0x6c
> 
This is probably the same issue as discussed in threads "[PATCH] scsi:
check for device state in __scsi_remove_target()" and "[PATCH] scsi: fix
race condition when removing target".

Please test with a patch from there.

Cheers,

Hannes

-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 3/3] cxlflash: Update debug prints in reset handlers

2017-06-28 Thread Hannes Reinecke
On 06/28/2017 07:14 PM, Matthew R. Ochs wrote:
> The device and host reset handler contain debug prints to help
> identify the entities being reset. Today these reset handlers
> are based on a SCSI EH design that uses a SCSI command reference
> as a means of identifying the target entity. As such, the debug
> trace includes the SCSI command pointer and associated CDB. This
> is not necessary as the SCSI command is simply the messenger in
> these scenarios.
> 
> Refactor the debug prints in the host and reset handlers to only
> present information that is applicable given the function scope.
> 
> Signed-off-by: Matthew R. Ochs 
> ---
>  drivers/scsi/cxlflash/main.c | 18 +++---
>  1 file changed, 3 insertions(+), 15 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.com  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 2/3] cxlflash: Update send_tmf() parameters

2017-06-28 Thread Hannes Reinecke
On 06/28/2017 07:14 PM, Matthew R. Ochs wrote:
> The current send_tmf() implementation is based on the caller providing
> a SCSI command reference. In reality all that is needed is a SCSI device
> reference as the routine uses a private command.
> 
> Refactor send_tmf() to pass the private adapter configuration reference
> and a SCSI device reference. As a nice side effect, this will ease the
> burden of converting caller routines to be based solely off of a SCSI
> device reference.
> 
> Signed-off-by: Matthew R. Ochs 
> ---
>  drivers/scsi/cxlflash/main.c | 27 +--
>  1 file changed, 13 insertions(+), 14 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.com  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 1/3] cxlflash: Avoid double free of character device

2017-06-28 Thread Hannes Reinecke
On 06/28/2017 07:14 PM, Matthew R. Ochs wrote:
> The device_unregister() service used when cleaning up the character
> device is already responsible for the internal state associated with
> the device upon successful creation. As the cxlflash driver does not
> obtain a second reference to the character device, the explicit call
> to put_device() is not required and can lead to an inconsistent sysfs
> among other issues as the reference is no longer valid after the first
> put_device() is performed.
> 
> Remove the unnecessary put_device() to remedy this issue.
> 
> Fixes: a834a36b57d9 ("scsi: cxlflash: Create character device to provide host 
> management interface")
> Signed-off-by: Matthew R. Ochs > ---
>  drivers/scsi/cxlflash/main.c | 1 -
>  1 file changed, 1 deletion(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.com  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


[PATCH] powerpc: conditionally compile platform-specific serial drivers

2017-06-27 Thread Hannes Reinecke
mpsc.c and mpc52xx-psc.c are platform-specific serial drivers, and
should be compiled for the respective platforms only.

Signed-off-by: Hannes Reinecke 
---
 arch/powerpc/boot/Makefile | 7 ---
 arch/powerpc/boot/serial.c | 4 
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index e82f333..1a4609c 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -99,14 +99,15 @@ src-wlib-y := string.S crt0.S crtsavres.S stdio.c 
decompress.c main.c \
$(libfdt) libfdt-wrapper.c \
ns16550.c serial.c simple_alloc.c div64.S util.S \
elf_util.c $(zlib-y) devtree.c stdlib.c \
-   oflib.c ofconsole.c cuboot.c mpsc.c cpm-serial.c \
-   uartlite.c mpc52xx-psc.c opal.c
+   oflib.c ofconsole.c cuboot.c cpm-serial.c \
+   uartlite.c opal.c
+src-wlib-$(CONFIG_PPC_MPC52XX) += mpc52xx-psc.c
 src-wlib-$(CONFIG_PPC64_BOOT_WRAPPER) +=  opal-calls.S
 src-wlib-$(CONFIG_40x) += 4xx.c planetcore.c
 src-wlib-$(CONFIG_44x) += 4xx.c ebony.c bamboo.c
 src-wlib-$(CONFIG_8xx) += mpc8xx.c planetcore.c fsl-soc.c
 src-wlib-$(CONFIG_PPC_82xx) += pq2.c fsl-soc.c planetcore.c
-src-wlib-$(CONFIG_EMBEDDED6xx) += mv64x60.c mv64x60_i2c.c ugecon.c fsl-soc.c
+src-wlib-$(CONFIG_EMBEDDED6xx) += mpsc.c mv64x60.c mv64x60_i2c.c ugecon.c 
fsl-soc.c
 
 src-plat-y := of.c epapr.c
 src-plat-$(CONFIG_40x) += fixed-head.S ep405.c cuboot-hotfoot.c \
diff --git a/arch/powerpc/boot/serial.c b/arch/powerpc/boot/serial.c
index e04c1e4..7b5c02b 100644
--- a/arch/powerpc/boot/serial.c
+++ b/arch/powerpc/boot/serial.c
@@ -120,15 +120,19 @@ int serial_console_init(void)
if (dt_is_compatible(devp, "ns16550") ||
dt_is_compatible(devp, "pnpPNP,501"))
rc = ns16550_console_init(devp, &serial_cd);
+#ifdef CONFIG_EMBEDDED6xx
else if (dt_is_compatible(devp, "marvell,mv64360-mpsc"))
rc = mpsc_console_init(devp, &serial_cd);
+#endif
else if (dt_is_compatible(devp, "fsl,cpm1-scc-uart") ||
 dt_is_compatible(devp, "fsl,cpm1-smc-uart") ||
 dt_is_compatible(devp, "fsl,cpm2-scc-uart") ||
 dt_is_compatible(devp, "fsl,cpm2-smc-uart"))
rc = cpm_console_init(devp, &serial_cd);
+#ifdef CONFIG_PPC_MPC52XX
else if (dt_is_compatible(devp, "fsl,mpc5200-psc-uart"))
rc = mpc5200_psc_console_init(devp, &serial_cd);
+#endif
else if (dt_is_compatible(devp, "xlnx,opb-uartlite-1.00.b") ||
 dt_is_compatible(devp, "xlnx,xps-uartlite-1.00.a"))
rc = uartlite_console_init(devp, &serial_cd);
-- 
1.8.5.6



Re: [PATCH v2 02/16] scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directly

2016-10-28 Thread Hannes Reinecke
On 10/28/2016 11:53 AM, Steffen Maier wrote:
> 
> 
> On 10/13/2016 06:24 PM, Johannes Thumshirn wrote:
>> On Thu, Oct 13, 2016 at 05:15:25PM +0200, Steffen Maier wrote:
>>> I'm puzzled.
>>>
>>> $ git bisect start fc_bsg master
> 
>>>> 3087864ce3d7282f59021245d8a5f83ef1caef18 is the first bad commit
>>>> commit 3087864ce3d7282f59021245d8a5f83ef1caef18
>>>> Author: Johannes Thumshirn 
>>>> Date:   Wed Oct 12 15:06:28 2016 +0200
>>>>
>>>> scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directly
>>>>
>>>> Don't use fc_bsg_job::request and fc_bsg_job::reply directly,
>>>> but use
>>>> helper variables bsg_request and bsg_reply. This will be
>>>> helpfull  when
>>>> transitioning to bsg-lib.
>>>>
>>>> Signed-off-by: Johannes Thumshirn 
>>>>
>>>> :04 04 140c4b6829d5cfaec4079716e0795f63f8bc3bd2
>>>> 0d9fe225615679550be91fbd9f84c09ab1e280fc Mdrivers
>>>
>>> From there (on the reverse bisect path) I get the following Oops,
>>> except for the full patch set having another stack trace as in my
>>> previous
>>> mail (dying in zfcp code).
>>>
>>
>> [...]
>>
>>>
>>>> @@ -3937,6 +3944,7 @@ fc_bsg_request_handler(struct request_queue
>>>> *q, struct Scsi_Host *shost,
>>>>  struct request *req;
>>>>  struct fc_bsg_job *job;
>>>>  enum fc_dispatch_result ret;
>>>> +struct fc_bsg_reply *bsg_reply;
>>>>
>>>>  if (!get_device(dev))
>>>>  return;
>>>> @@ -3973,8 +3981,9 @@ fc_bsg_request_handler(struct request_queue
>>>> *q, struct Scsi_Host *shost,
>>>>  /* check if we have the msgcode value at least */
>>>>  if (job->request_len < sizeof(uint32_t)) {
>>>>  BUG_ON(job->reply_len < sizeof(uint32_t));
>>>> -job->reply->reply_payload_rcv_len = 0;
>>>> -job->reply->result = -ENOMSG;
>>>> +bsg_reply = job->reply;
>>>> +bsg_reply->reply_payload_rcv_len = 0;
>>>> +bsg_reply->result = -ENOMSG;
> 
> Compiler optimization re-ordered above two lines and the first pointer
> derefence is bsg_reply->result [field offset 0] where bsg_reply is NULL.
> The assignment tries to write to memory at address NULL causing the
> kernel page fault.
> 
I spoke to our compiler people, and they strongly believed this not to
be the case. Or, put it the other way round, if such a thing would
happen it would be a compiler issue.

Have you checked the compiler output?

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 13/16] scsi: fc: use bsg_job_done

2016-10-13 Thread Hannes Reinecke
On 10/12/2016 03:06 PM, Johannes Thumshirn wrote:
> fc_bsg_jobdone() and bsg_job_done() are 1:1 copies now so use the bsg-lib one
> instead of the FC private implementation.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/s390/scsi/zfcp_fc.c  |  2 +-
>  drivers/scsi/bfa/bfad_bsg.c  |  4 ++--
>  drivers/scsi/ibmvscsi/ibmvfc.c   |  2 +-
>  drivers/scsi/libfc/fc_lport.c|  4 ++--
>  drivers/scsi/lpfc/lpfc_bsg.c | 38 +-
>  drivers/scsi/qla2xxx/qla_bsg.c   | 44 
> 
>  drivers/scsi/scsi_transport_fc.c | 41 +++--
>  include/scsi/scsi_transport_fc.h |  2 --
>  8 files changed, 50 insertions(+), 87 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 10/16] scsi: change FC drivers to use 'struct bsg_job'

2016-10-13 Thread Hannes Reinecke
On 10/12/2016 03:06 PM, Johannes Thumshirn wrote:
> Change FC drivers to use 'struct bsg_job' from bsg-lib.h instead of 'struct
> fc_bsg_job' from scsi_transport_fc.h and remove 'struct fc_bsg_job'.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/s390/scsi/zfcp_ext.h |  4 +--
>  drivers/s390/scsi/zfcp_fc.c  | 15 
>  drivers/scsi/bfa/bfad_bsg.c  | 10 +++---
>  drivers/scsi/bfa/bfad_im.h   |  4 +--
>  drivers/scsi/ibmvscsi/ibmvfc.c   |  9 ++---
>  drivers/scsi/libfc/fc_lport.c| 10 +++---
>  drivers/scsi/lpfc/lpfc_bsg.c | 74 
> 
>  drivers/scsi/lpfc/lpfc_crtn.h|  4 +--
>  drivers/scsi/qla2xxx/qla_bsg.c   | 61 +
>  drivers/scsi/qla2xxx/qla_def.h   |  2 +-
>  drivers/scsi/qla2xxx/qla_gbl.h   |  4 +--
>  drivers/scsi/qla2xxx/qla_iocb.c  |  8 ++---
>  drivers/scsi/qla2xxx/qla_isr.c   |  6 ++--
>  drivers/scsi/qla2xxx/qla_mr.c|  5 +--
>  drivers/scsi/scsi_transport_fc.c | 20 +--
>  include/scsi/libfc.h |  2 +-
>  include/scsi/scsi_transport_fc.h | 63 ++----
>  17 files changed, 138 insertions(+), 163 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 06/16] scsi: fc: provide fc_bsg_to_rport() helper

2016-10-13 Thread Hannes Reinecke
On 10/12/2016 03:06 PM, Johannes Thumshirn wrote:
> Provide fc_bsg_to_rport() helper that will become handy when we're moving
> from struct fc_bsg_job to a plain struct bsg_job. Also move all LLDDs to use
> the new helper.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/s390/scsi/zfcp_fc.c  | 5 +++--
>  drivers/scsi/ibmvscsi/ibmvfc.c   | 2 +-
>  drivers/scsi/libfc/fc_lport.c| 4 ++--
>  drivers/scsi/lpfc/lpfc_bsg.c | 4 ++--
>  drivers/scsi/qla2xxx/qla_bsg.c   | 4 ++--
>  drivers/scsi/scsi_transport_fc.c | 3 ++-
>  include/scsi/scsi_transport_fc.h | 5 +
>  7 files changed, 17 insertions(+), 10 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 05/16] scsi: fc: provide fc_bsg_to_shost() helper

2016-10-13 Thread Hannes Reinecke
On 10/12/2016 03:06 PM, Johannes Thumshirn wrote:
> Provide fc_bsg_to_shost() helper that will become handy when we're moving from
> struct fc_bsg_job to a plain struct bsg_job. Also use this little helper in
> the LLDDs.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/s390/scsi/zfcp_fc.c  |  4 +--
>  drivers/scsi/bfa/bfad_bsg.c  |  6 ++---
>  drivers/scsi/ibmvscsi/ibmvfc.c   |  4 +--
>  drivers/scsi/libfc/fc_lport.c|  2 +-
>  drivers/scsi/lpfc/lpfc_bsg.c | 32 
>  drivers/scsi/qla2xxx/qla_bsg.c   | 54 
> 
>  drivers/scsi/scsi_transport_fc.c |  2 +-
>  include/scsi/scsi_transport_fc.h |  5 
>  8 files changed, 56 insertions(+), 53 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 04/16] scsi: Unify interfaces of fc_bsg_jobdone and bsg_job_done

2016-10-13 Thread Hannes Reinecke
On 10/12/2016 03:06 PM, Johannes Thumshirn wrote:
> Unify the interfaces of fc_bsg_jobdone and bsg_job_done. This will reduce the
> diff when moving from 'struct fc_bsg_job' to a plain 'struct bsg_job' later
> on.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/s390/scsi/zfcp_fc.c  |  2 +-
>  drivers/scsi/bfa/bfad_bsg.c  |  6 ++--
>  drivers/scsi/ibmvscsi/ibmvfc.c   |  3 +-
>  drivers/scsi/libfc/fc_lport.c|  6 ++--
>  drivers/scsi/lpfc/lpfc_bsg.c | 68 
> +++-
>  drivers/scsi/qla2xxx/qla_bsg.c   | 66 +-
>  drivers/scsi/scsi_transport_fc.c | 22 +++--
>  include/scsi/scsi_transport_fc.h |  3 +-
>  8 files changed, 116 insertions(+), 60 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 03/16] scsi: fc: Export fc_bsg_jobdone and use it in FC drivers

2016-10-13 Thread Hannes Reinecke
On 10/12/2016 03:06 PM, Johannes Thumshirn wrote:
> Export fc_bsg_jobdone so drivers can use it directly instead of doing
> the round-trip via struct fc_bsg_job::job_done() and use it in the LLDDs.
> 
> As we've converted all LLDDs over to use fc_bsg_jobdone() directly,
> we can remove the function pointer from struct fc_bsg_job as well.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/s390/scsi/zfcp_fc.c  |  2 +-
>  drivers/scsi/bfa/bfad_bsg.c  |  4 ++--
>  drivers/scsi/ibmvscsi/ibmvfc.c   |  2 +-
>  drivers/scsi/libfc/fc_lport.c|  4 ++--
>  drivers/scsi/lpfc/lpfc_bsg.c | 38 +-
>  drivers/scsi/qla2xxx/qla_bsg.c   | 44 
> 
>  drivers/scsi/scsi_transport_fc.c |  5 ++---
>  include/scsi/scsi_transport_fc.h |  2 +-
>  8 files changed, 50 insertions(+), 51 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 02/16] scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directly

2016-10-13 Thread Hannes Reinecke
On 10/12/2016 03:06 PM, Johannes Thumshirn wrote:
> Don't use fc_bsg_job::request and fc_bsg_job::reply directly, but use
> helper variables bsg_request and bsg_reply. This will be helpfull  when
> transitioning to bsg-lib.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/s390/scsi/zfcp_fc.c  |   9 +-
>  drivers/scsi/bfa/bfad_bsg.c  |  40 +++---
>  drivers/scsi/ibmvscsi/ibmvfc.c   |  22 ++--
>  drivers/scsi/libfc/fc_lport.c|  23 ++--
>  drivers/scsi/lpfc/lpfc_bsg.c | 194 +---
>  drivers/scsi/qla2xxx/qla_bsg.c   | 264 
> ++-
>  drivers/scsi/qla2xxx/qla_iocb.c  |   5 +-
>  drivers/scsi/qla2xxx/qla_isr.c   |  46 ---
>  drivers/scsi/qla2xxx/qla_mr.c|  10 +-
>  drivers/scsi/scsi_transport_fc.c |  37 +++---
>  10 files changed, 387 insertions(+), 263 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-25 Thread Hannes Reinecke

On 11/25/2015 06:56 PM, Jens Axboe wrote:

On 11/25/2015 02:04 AM, Hannes Reinecke wrote:

On 11/20/2015 04:28 PM, Ewan Milne wrote:

On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:

Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.


Yes.  I'm not the one primarily looking at it, and we don't have a
reproducer in-house.  We just have the one dump right now.



I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...



That might not fix it if it is a problem with the merge code, though.


And indeed, it doesn't.
Seems I finally found the culprit.

What happens is this:
We have two paths, with these seg_boundary_masks:

path-1:seg_boundary_mask = 65535,
path-2:seg_boundary_mask = 4294967295,

consequently the DM request queue has this:

md-1:seg_boundary_mask = 65535,

What happens now is that a request is being formatted, and sent
to path 2. During submission req->nr_phys_segments is formatted
with the limits of path 2, arriving at a count of 3.
Now the request gets retried on path 1, but as the NOMERGE request
flag is set req->nr_phys_segments is never updated.
But blk_rq_map_sg() ignores all counters, and just uses the
bi_vec directly, resulting in a count of 4 -> boom.

So the culprit here is the NOMERGE flag, which is evaluated
via
->dm_dispatch_request()
   ->blk_insert_cloned_request()
 ->blk_rq_check_limits()

If the above assessment is correct, the following patch should
fix it:

diff --git a/block/blk-core.c b/block/blk-core.c
index 801ced7..12cccd6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
   */
  int blk_rq_check_limits(struct request_queue *q, struct request *rq)
  {
-   if (!rq_mergeable(rq))
+   if (rq->cmd_type != REQ_TYPE_FS)
 return 0;

 if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
rq->cmd_flags)) {


Mike? Jens?
Can you comment on it?


We only support merging on REQ_TYPE_FS already, so how is the above
making it any different? In general, NOMERGE being set or not should not
make a difference. It's only a hint that we need not check further if we
should be merging on this request, since we already tried it once, found
we'd exceed various limits, then set NOMERGE to reflect that.


The problem is that NOMERGE does too much, as it inhibits _any_ merging.

Unfortunately, the req->nr_phys_segments value is evaluated in the final
_driver_ context _after_ the merging happend; cf 
scsi_lib.c:scsi_init_sgtable().

As nr_phys_segments is inherited from the original request (and never
recalculated with the new request queue limits) the following 
blk_rq_map_sg() call might end up at a different calculation, especially

after retrying a request on another path.

Cheers,

Hannes
--
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-25 Thread Hannes Reinecke

On 11/25/2015 07:01 PM, Mike Snitzer wrote:

On Wed, Nov 25 2015 at  4:04am -0500,
Hannes Reinecke  wrote:


On 11/20/2015 04:28 PM, Ewan Milne wrote:

On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:

Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.


Yes.  I'm not the one primarily looking at it, and we don't have a
reproducer in-house.  We just have the one dump right now.



I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...



That might not fix it if it is a problem with the merge code, though.


And indeed, it doesn't.


How did you arrive at that?  Do you have a reproducer now?


Not a reproducer, but several dumps for analysis.


Seems I finally found the culprit.

What happens is this:
We have two paths, with these seg_boundary_masks:

path-1:seg_boundary_mask = 65535,
path-2:seg_boundary_mask = 4294967295,

consequently the DM request queue has this:

md-1:seg_boundary_mask = 65535,

What happens now is that a request is being formatted, and sent
to path 2. During submission req->nr_phys_segments is formatted
with the limits of path 2, arriving at a count of 3.
Now the request gets retried on path 1, but as the NOMERGE request
flag is set req->nr_phys_segments is never updated.
But blk_rq_map_sg() ignores all counters, and just uses the
bi_vec directly, resulting in a count of 4 -> boom.

So the culprit here is the NOMERGE flag,


NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.


Yes.


which is evaluated via
->dm_dispatch_request()
   ->blk_insert_cloned_request()
 ->blk_rq_check_limits()


blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
anyway after reading your mail I'm still left wondering if your proposed
patch is correct.


If the above assessment is correct, the following patch should
fix it:

diff --git a/block/blk-core.c b/block/blk-core.c
index 801ced7..12cccd6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
   */
  int blk_rq_check_limits(struct request_queue *q, struct request *rq)
  {
-   if (!rq_mergeable(rq))
+   if (rq->cmd_type != REQ_TYPE_FS)
 return 0;

 if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
rq->cmd_flags)) {


Mike? Jens?
Can you comment on it?


You're not explaining the actual change in the patch very well; I think
you're correct but you're leaving the justification as an exercise to
the reviewer:

blk_rq_check_limits() will call blk_recalc_rq_segments() after the
!rq_mergeable() check but you're saying for this case in question we
never get there -- due to the cloned request having NOMERGE set.

So in blk_rq_check_limits() you've unrolled rq_mergeable() and
open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)

I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
sense for cloned requests that always have NOMERGE set.

So you're saying that by having blk_rq_check_limits() go on to call
blk_recalc_rq_segments() this bug will be fixed?


That is the idea.

I've already established that in all instances I have seen so far
req->nr_phys_segments is _less_ than req->bio->bi_phys_segments.

As it turns out, req->nr_phys_segemnts _would_ have been updated in
blk_rq_check_limits(), but isn't due to the NOMERGE flag being set
for the cloned request.
So each cloned request inherits the values from the original request,
despite the fact that req->nr_phys_segments _has_ to be evaluated in
the final request_queue context, as the queue limits _might_ be 
different from the original (merged) queue limits of the multipath

request queue.


BTW, I think blk_rq_check_limits()'s export should be removed and the
function made static and renamed to blk_clone_rq_check_limits(), again:
blk_insert_cloned_request() is the only caller of blk_rq_check_limits()

Actually, seeing Jens' last comment the check for REQ_TYPE_FS is 
pointless, too, so we might as well remove the entire if-clause.



Seems prudent to make that change now to be clear that this code is only
used by cloned requests.


Yeah, that would make sense. I'll be preparing a patch.
With a more detailed description :-)

Cheers,

Hannes

--
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-25 Thread Hannes Reinecke
On 11/20/2015 04:28 PM, Ewan Milne wrote:
> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>> Can't we have a joint effort here?
>> I've been spending a _LOT_ of time trying to debug things here, but
>> none of the ideas I've come up with have been able to fix anything.
> 
> Yes.  I'm not the one primarily looking at it, and we don't have a
> reproducer in-house.  We just have the one dump right now.
> 
>>
>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>> by one and be done with ...
>>
> 
> That might not fix it if it is a problem with the merge code, though.
> 
And indeed, it doesn't.
Seems I finally found the culprit.

What happens is this:
We have two paths, with these seg_boundary_masks:

path-1:seg_boundary_mask = 65535,
path-2:seg_boundary_mask = 4294967295,

consequently the DM request queue has this:

md-1:seg_boundary_mask = 65535,

What happens now is that a request is being formatted, and sent
to path 2. During submission req->nr_phys_segments is formatted
with the limits of path 2, arriving at a count of 3.
Now the request gets retried on path 1, but as the NOMERGE request
flag is set req->nr_phys_segments is never updated.
But blk_rq_map_sg() ignores all counters, and just uses the
bi_vec directly, resulting in a count of 4 -> boom.

So the culprit here is the NOMERGE flag, which is evaluated
via
->dm_dispatch_request()
  ->blk_insert_cloned_request()
->blk_rq_check_limits()

If the above assessment is correct, the following patch should
fix it:

diff --git a/block/blk-core.c b/block/blk-core.c
index 801ced7..12cccd6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
  */
 int blk_rq_check_limits(struct request_queue *q, struct request *rq)
 {
-   if (!rq_mergeable(rq))
+   if (rq->cmd_type != REQ_TYPE_FS)
return 0;

if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
rq->cmd_flags)) {


Mike? Jens?
Can you comment on it?

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-22 Thread Hannes Reinecke

On 11/20/2015 04:28 PM, Ewan Milne wrote:

On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:

Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.


Yes.  I'm not the one primarily looking at it, and we don't have a
reproducer in-house.  We just have the one dump right now.


Oh, I got plenty of them :-(



I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...



That might not fix it if it is a problem with the merge code, though.


Of course not. But it'll be a band-aid to keep the customer happy.

Cheers,

Hannes
--
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-20 Thread Hannes Reinecke
On 11/20/2015 03:38 PM, Ewan Milne wrote:
> On Thu, 2015-11-19 at 16:35 +0100, Hannes Reinecke wrote:
>> On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
>>> It's pretty much guaranteed a block layer bug, most likely in the
>>> merge bios to request infrastucture where we don't obey the merging
>>> limits properly.
>>>
>>> Does either of you have a known good and first known bad kernel?
>>
>> Well, I have been fighting a similar issue for several months now,
>> albeit with multipath enabled. Haven't had much progress with this,
>> sadly.
>> Seeing that this is our distro kernel it might or might not be
>> related; however, as the symptoms are identical there still is a
>> chance that this is actually a generic block-layer problem.
>>
>> Cheers,
>>
>> Hannes
> 
> We have seen this also.  (e.g.  req->nr_phys_segments was 3, but
> blk_rq_map_sg() returned 4.)  I was suspicious of the patch:
> 
> bio: modify __bio_add_page() to accept pages that don't start a new segment
> 
> But we put some debugging code in and didn't hit it.  We haven't
> found the problem yet, either, though.  We're still looking.
> 
Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.

I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...

> As Christoph said, it would seem to be a problem with the block layer
> merging.
> 
> The API for this seems defective, in that blk_rq_map_sg() should
> never be returning a value indicating that it overwrote past the
> end of the supplied SG array and depend on the caller to check it.
> (We could get data corruption on another I/O if it used adjacent
> memory for a different SG list, for example.)
> 
Yeah, the API is bloody useless.
By the time you hit the BUG_ON you've already caused a memory
corruption, so no way you can recover there.

At the very least we should be passing in the sg list count into
blk_map_rq_sg(), but that's a core blocklayer API and changes here
would require changes by quite a set of drivers. Plus it wouldn't
help me for a distribution kernel ...

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-19 Thread Hannes Reinecke
On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
> It's pretty much guaranteed a block layer bug, most likely in the
> merge bios to request infrastucture where we don't obey the merging
> limits properly.
> 
> Does either of you have a known good and first known bad kernel?

Well, I have been fighting a similar issue for several months now,
albeit with multipath enabled. Haven't had much progress with this,
sadly.
Seeing that this is our distro kernel it might or might not be
related; however, as the symptoms are identical there still is a
chance that this is actually a generic block-layer problem.

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Regression in 3.15 on POWER8 with multipath SCSI

2014-06-30 Thread Hannes Reinecke

On 06/30/2014 11:28 PM, Paul Mackerras wrote:

On Mon, Jun 30, 2014 at 01:35:20PM +0200, Hannes Reinecke wrote:

On 06/30/2014 01:02 PM, Paul Mackerras wrote:

[ .. ]


No, I'm not using LVM, and in fact I deleted all the physical volumes
that were on any of the disks (they were installations of other
distros), so there are no physical or logical volumes anywhere on any
disk.  I haven't tried disabling the LVM service completely, though.
What would it mean if disabling the LVM service made a difference?


Yes. LVM integration with systemd is a science unto itself.
I'm reasonably confident with multipath, but not LVM.
Plus the fact the the LVM service apparently is waiting for something sort
of points into that direction.

So please do disable the lvm service.


I disabled the LVM service, and it's still bad.  Unmodified 3.15
booted successfully in only 18 out of 50 attempts with LVM disabled.

So it's not LVM.  In any case LVM was fine with a 3.14 kernel.


Right, that was just a cross-check to eliminate any variables.

I'll be checking here at my end.

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Regression in 3.15 on POWER8 with multipath SCSI

2014-06-30 Thread Hannes Reinecke

On 06/30/2014 01:02 PM, Paul Mackerras wrote:

On Mon, Jun 30, 2014 at 12:52:29PM +0200, Hannes Reinecke wrote:

On 06/30/2014 12:30 PM, Paul Mackerras wrote:

I have a machine on which 3.15 usually fails to boot, and 3.14 boots
every time.  The machine is a POWER8 2-socket server with 20 cores
(thus 160 CPUs), 128GB of RAM, and 7 SCSI disks connected via a
hardware-RAID-capable adapter which appears as two IPR controllers
which are both connected to each disk.  I am booting from a disk that
has Fedora 20 installed on it.

After over two weeks of bisections, I can finally point to the commits
that cause the problems.  The culprits are:

3e9f1be1 dm mpath: remove process_queued_ios()
e8099177 dm mpath: push back requests instead of queueing
bcccff93 kobject: don't block for each kobject_uevent

The interesting thing is that neither e8099177 nor bcccff93 cause
failures on their own, but with both commits in there are failures
where the system will fail to find /home on some occasions.

With 3e9f1be1 included, the system appears to be prone to a deadlock
condition which typically causes the boot process to hang with this
message showing:

A start job is running for Monitoring of LVM2 mirror...rogress polling

(with a [*** ] thing before it where the asterisks move back and
forth).

If I revert 63d832c3 ("dm mpath: really fix lockdep warning") ,
4cdd2ad7 ("dm mpath: fix lock order inconsistency in
multipath_ioctl"), 3e9f1be1 and bcccff93, in that order, I get a
kernel that will boot every time.  The first two are later commits
that fix some problems with 3e9f1be1 (though not the problems I am
seeing).

Can anyone see any reason why e8099177 and bcccff93 would interfere
with each other?


It might be running afoul with the 'cookie' mechanism.
Device-mapper is using inserting a 'cookie' with the ioctl, and listens to
any event containing the cookie to ensure udev has finished processing that
device and hence the device node is accessible. Added to this is the problem
that we don't have any good means of detecting any changes to device-mapper
devices.

EG look at this sequence of events:

add dm-1
remove dm-1
add dm-1

Originally udev would pick up the event, read the details from sysfs, and
return control to the kernel.
With bcccff93 udev will _not_have a chance to read the details
from sysfs for 'dm-1', as anything read from sysfs relating to 'dm-1' might
infact refer to the _second_ 'add' event, which might be a totally different
device.
As far as I know udev doesn't have any mechanism to drop events,
so it'll always process all events. Assuming that the sysfs attributes it
reads _do_ relate to that event. If they don't things become interesting ...

(Actually, this issue was always present, especially with multipathing.
multipath occasionally can become sluggish when processing events, so the
same might happen with it. We've tried to work around this, but never found
a fool-proof way of doing so).

Adding Kay as he might have some more insight here.

Another thing:
Do you run LVM on top of multipathing?
If so, could you setup your system with _not_ using LVM and disabling the
LVM service?


No, I'm not using LVM, and in fact I deleted all the physical volumes
that were on any of the disks (they were installations of other
distros), so there are no physical or logical volumes anywhere on any
disk.  I haven't tried disabling the LVM service completely, though.
What would it mean if disabling the LVM service made a difference?


Yes. LVM integration with systemd is a science unto itself.
I'm reasonably confident with multipath, but not LVM.
Plus the fact the the LVM service apparently is waiting for 
something sort of points into that direction.


So please do disable the lvm service.

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Regression in 3.15 on POWER8 with multipath SCSI

2014-06-30 Thread Hannes Reinecke

On 06/30/2014 12:30 PM, Paul Mackerras wrote:

I have a machine on which 3.15 usually fails to boot, and 3.14 boots
every time.  The machine is a POWER8 2-socket server with 20 cores
(thus 160 CPUs), 128GB of RAM, and 7 SCSI disks connected via a
hardware-RAID-capable adapter which appears as two IPR controllers
which are both connected to each disk.  I am booting from a disk that
has Fedora 20 installed on it.

After over two weeks of bisections, I can finally point to the commits
that cause the problems.  The culprits are:

3e9f1be1 dm mpath: remove process_queued_ios()
e8099177 dm mpath: push back requests instead of queueing
bcccff93 kobject: don't block for each kobject_uevent

The interesting thing is that neither e8099177 nor bcccff93 cause
failures on their own, but with both commits in there are failures
where the system will fail to find /home on some occasions.

With 3e9f1be1 included, the system appears to be prone to a deadlock
condition which typically causes the boot process to hang with this
message showing:

A start job is running for Monitoring of LVM2 mirror...rogress polling

(with a [*** ] thing before it where the asterisks move back and
forth).

If I revert 63d832c3 ("dm mpath: really fix lockdep warning") ,
4cdd2ad7 ("dm mpath: fix lock order inconsistency in
multipath_ioctl"), 3e9f1be1 and bcccff93, in that order, I get a
kernel that will boot every time.  The first two are later commits
that fix some problems with 3e9f1be1 (though not the problems I am
seeing).

Can anyone see any reason why e8099177 and bcccff93 would interfere
with each other?


It might be running afoul with the 'cookie' mechanism.
Device-mapper is using inserting a 'cookie' with the ioctl, and 
listens to any event containing the cookie to ensure udev has 
finished processing that device and hence the device node is 
accessible. Added to this is the problem that we don't have any good 
means of detecting any changes to device-mapper devices.


EG look at this sequence of events:

add dm-1
remove dm-1
add dm-1

Originally udev would pick up the event, read the details from 
sysfs, and return control to the kernel.

With bcccff93 udev will _not_have a chance to read the details
from sysfs for 'dm-1', as anything read from sysfs relating to 
'dm-1' might infact refer to the _second_ 'add' event, which might 
be a totally different device.

As far as I know udev doesn't have any mechanism to drop events,
so it'll always process all events. Assuming that the sysfs 
attributes it reads _do_ relate to that event. If they don't things 
become interesting ...


(Actually, this issue was always present, especially with 
multipathing. multipath occasionally can become sluggish when 
processing events, so the same might happen with it. We've tried to 
work around this, but never found a fool-proof way of doing so).


Adding Kay as he might have some more insight here.

Another thing:
Do you run LVM on top of multipathing?
If so, could you setup your system with _not_ using LVM and 
disabling the LVM service?


Reasoning here is that multipath should not be that susceptible to 
changes here than LVM2 is (don't nail me on this, I not _that_ into 
LVM2 details).


And as the system is stuck while waiting for LVM it might indeed be 
an side-effect when running LVM on top of multipathing.


Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev