Re: [PATCH 26/26] block: move the bounce flag into the features field
On 6/17/24 08:04, Christoph Hellwig wrote: Move the bounce flag into the features field to reclaim a little bit of space. Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal --- block/blk-settings.c| 1 - block/blk.h | 2 +- drivers/scsi/scsi_lib.c | 2 +- include/linux/blkdev.h | 6 -- 4 files changed, 6 insertions(+), 5 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 25/26] block: move the skip_tagset_quiesce flag to queue_limits
On 6/17/24 08:04, Christoph Hellwig wrote: Move the skip_tagset_quiesce flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal --- block/blk-mq-debugfs.c | 1 - drivers/nvme/host/core.c | 8 +--- include/linux/blkdev.h | 6 -- 3 files changed, 9 insertions(+), 6 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 24/26] block: move the pci_p2pdma flag to queue_limits
On 6/17/24 08:04, Christoph Hellwig wrote: Move the pci_p2pdma flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal --- block/blk-mq-debugfs.c | 1 - drivers/nvme/host/core.c | 8 +++- include/linux/blkdev.h | 7 --- 3 files changed, 7 insertions(+), 9 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 23/26] block: move the zone_resetall flag to queue_limits
On 6/17/24 08:04, Christoph Hellwig wrote: Move the zone_resetall flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal --- block/blk-mq-debugfs.c | 1 - drivers/block/null_blk/zoned.c | 3 +-- drivers/block/ublk_drv.c | 4 +--- drivers/block/virtio_blk.c | 3 +-- drivers/nvme/host/zns.c| 3 +-- drivers/scsi/sd_zbc.c | 5 + include/linux/blkdev.h | 6 -- 7 files changed, 9 insertions(+), 16 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 22/26] block: move the zoned flag into the features field
On 6/17/24 08:04, Christoph Hellwig wrote: Move the zoned flags into the features field to reclaim a little bit of space. Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal --- block/blk-settings.c | 5 ++--- drivers/block/null_blk/zoned.c | 2 +- drivers/block/ublk_drv.c | 2 +- drivers/block/virtio_blk.c | 5 +++-- drivers/md/dm-table.c | 11 ++- drivers/md/dm-zone.c | 2 +- drivers/md/dm-zoned-target.c | 2 +- drivers/nvme/host/zns.c| 2 +- drivers/scsi/sd_zbc.c | 2 +- include/linux/blkdev.h | 9 ++--- 10 files changed, 23 insertions(+), 19 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 21/26] block: move the poll flag to queue_limits
On 6/17/24 08:04, Christoph Hellwig wrote: Move the poll flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Stacking drivers are simplified in that they now can simply set the flag, and blk_stack_limits will clear it when the features is not supported by any of the underlying devices. Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal --- block/blk-core.c | 5 ++-- block/blk-mq-debugfs.c| 1 - block/blk-mq.c| 31 +++- block/blk-settings.c | 10 --- block/blk-sysfs.c | 4 +-- drivers/md/dm-table.c | 54 +-- drivers/nvme/host/multipath.c | 12 +--- include/linux/blkdev.h| 4 ++- 8 files changed, 45 insertions(+), 76 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 20/26] block: move the dax flag to queue_limits
On 6/17/24 08:04, Christoph Hellwig wrote: Move the dax flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal --- block/blk-mq-debugfs.c | 1 - drivers/md/dm-table.c| 4 ++-- drivers/nvdimm/pmem.c| 7 ++- drivers/s390/block/dcssblk.c | 2 +- include/linux/blkdev.h | 6 -- 5 files changed, 9 insertions(+), 11 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 19/26] block: move the nowait flag to queue_limits
On 6/17/24 08:04, Christoph Hellwig wrote: Move the nowait flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Stacking drivers are simplified in that they now can simply set the flag, and blk_stack_limits will clear it when the features is not supported by any of the underlying devices. Signed-off-by: Christoph Hellwig --- block/blk-mq-debugfs.c| 1 - block/blk-mq.c| 2 +- block/blk-settings.c | 9 + drivers/block/brd.c | 4 ++-- drivers/md/dm-table.c | 18 +++--- drivers/md/md.c | 18 +- drivers/nvme/host/multipath.c | 3 +-- include/linux/blkdev.h| 9 + 8 files changed, 22 insertions(+), 42 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 18/26] block: move the synchronous flag to queue_limits
On 6/17/24 08:04, Christoph Hellwig wrote: Move the synchronous flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal --- block/blk-mq-debugfs.c| 1 - drivers/block/brd.c | 2 +- drivers/block/zram/zram_drv.c | 4 ++-- drivers/nvdimm/btt.c | 3 +-- drivers/nvdimm/pmem.c | 4 ++-- include/linux/blkdev.h| 7 --- 6 files changed, 10 insertions(+), 11 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 17/26] block: move the stable_writes flag to queue_limits
On 6/17/24 08:04, Christoph Hellwig wrote: Move the stable_writes flag into the queue_limits feature field so that it can be set atomically with the queue frozen. The flag is now inherited by blk_stack_limits, which greatly simplifies the code in dm, and fixed md which previously did not pass on the flag set on lower devices. Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal --- block/blk-mq-debugfs.c | 1 - block/blk-sysfs.c | 29 + drivers/block/drbd/drbd_main.c | 5 ++--- drivers/block/rbd.c| 9 +++-- drivers/block/zram/zram_drv.c | 2 +- drivers/md/dm-table.c | 19 --- drivers/md/raid5.c | 6 -- drivers/mmc/core/queue.c | 5 +++-- drivers/nvme/host/core.c | 9 + drivers/nvme/host/multipath.c | 4 drivers/scsi/iscsi_tcp.c | 8 include/linux/blkdev.h | 9 ++--- 12 files changed, 29 insertions(+), 77 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 16/26] block: move the io_stat flag setting to queue_limits
On 6/17/24 08:04, Christoph Hellwig wrote: Move the io_stat flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Simplify md and dm to set the flag unconditionally instead of avoiding setting a simple flag for cases where it already is set by other means, which is a bit pointless. Signed-off-by: Christoph Hellwig --- block/blk-mq-debugfs.c| 1 - block/blk-mq.c| 6 +- block/blk-sysfs.c | 2 +- drivers/md/dm-table.c | 12 +--- drivers/md/dm.c | 13 +++-- drivers/md/md.c | 5 ++--- drivers/nvme/host/multipath.c | 2 +- include/linux/blkdev.h| 9 + 8 files changed, 26 insertions(+), 24 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 15/26] block: move the add_random flag to queue_limits
On 6/17/24 08:04, Christoph Hellwig wrote: Move the add_random flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Note that this also removes code from dm to clear the flag based on the underlying devices, which can't be reached as dm devices will always start out without the flag set. Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal --- block/blk-mq-debugfs.c| 1 - block/blk-sysfs.c | 6 +++--- drivers/block/mtip32xx/mtip32xx.c | 1 - drivers/md/dm-table.c | 18 -- drivers/mmc/core/queue.c | 2 -- drivers/mtd/mtd_blkdevs.c | 3 --- drivers/s390/block/scm_blk.c | 4 drivers/scsi/scsi_lib.c | 3 +-- drivers/scsi/sd.c | 11 +++ include/linux/blkdev.h| 5 +++-- 10 files changed, 10 insertions(+), 44 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 14/26] block: move the nonrot flag to queue_limits
On 6/17/24 08:04, Christoph Hellwig wrote: Move the nonrot flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Use the chance to switch to defaulting to non-rotational and require the driver to opt into rotational, which matches the polarity of the sysfs interface. For the z2ram, ps3vram, 2x memstick, ubiblock and dcssblk the new rotational flag is not set as they clearly are not rotational despite this being a behavior change. There are some other drivers that unconditionally set the rotational flag to keep the existing behavior as they arguably can be used on rotational devices even if that is probably not their main use today (e.g. virtio_blk and drbd). The flag is automatically inherited in blk_stack_limits matching the existing behavior in dm and md. Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal --- Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 13/26] block: move cache control settings out of queue->flags
On 6/17/24 08:04, Christoph Hellwig wrote: Move the cache control settings into the queue_limits so that the flags can be set atomically with the device queue frozen. Add new features and flags field for the driver set flags, and internal (usually sysfs-controlled) flags in the block layer. Note that we'll eventually remove enough field from queue_limits to bring it back to the previous size. The disable flag is inverted compared to the previous meaning, which means it now survives a rescan, similar to the max_sectors and max_discard_sectors user limits. The FLUSH and FUA flags are now inherited by blk_stack_limits, which simplified the code in dm a lot, but also causes a slight behavior change in that dm-switch and dm-unstripe now advertise a write cache despite setting num_flush_bios to 0. The I/O path will handle this gracefully, but as far as I can tell the lack of num_flush_bios and thus flush support is a pre-existing data integrity bug in those targets that really needs fixing, after which a non-zero num_flush_bios should be required in dm for targets that map to underlying devices. Signed-off-by: Christoph Hellwig Acked-by: Ulf Hansson [mmc] --- Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 03/26] sd: move zone limits setup out of sd_read_block_characteristics
On 6/17/24 08:04, Christoph Hellwig wrote: Move a bit of code that sets up the zone flag and the write granularity into sd_zbc_read_zones to be with the rest of the zoned limits. Signed-off-by: Christoph Hellwig --- drivers/scsi/sd.c | 21 + drivers/scsi/sd_zbc.c | 9 + 2 files changed, 10 insertions(+), 20 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 01/26] xen-blkfront: don't disable cache flushes when they fail
On 6/17/24 08:04, Christoph Hellwig wrote: blkfront always had a robust negotiation protocol for detecting a write cache. Stop simply disabling cache flushes in the block layer as the flags handling is moving to the atomic queue limits API that needs user context to freeze the queue for that. Instead handle the case of the feature flags cleared inside of blkfront. This removes old debug code to check for such a mismatch which was previously impossible to hit, including the check for passthrough requests that blkfront never used to start with. Signed-off-by: Christoph Hellwig --- drivers/block/xen-blkfront.c | 44 +++- 1 file changed, 23 insertions(+), 21 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 13/26] block: move cache control settings out of queue->flags
(rq->cmd_flags & REQ_FUA)) + if ((rq->cmd_flags & REQ_FUA) && !supports_fua) policy |= REQ_FSEQ_POSTFLUSH; } @@ -407,7 +406,7 @@ bool blk_insert_flush(struct request *rq) * REQ_PREFLUSH and FUA for the driver. */ rq->cmd_flags &= ~REQ_PREFLUSH; - if (!(fflags & (1UL << QUEUE_FLAG_FUA))) + if (!supports_fua) rq->cmd_flags &= ~REQ_FUA; /* diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 770c0c2b72faaa..e8b9db7c30c455 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -93,8 +93,6 @@ static const char *const blk_queue_flag_name[] = { QUEUE_FLAG_NAME(INIT_DONE), QUEUE_FLAG_NAME(STABLE_WRITES), QUEUE_FLAG_NAME(POLL), - QUEUE_FLAG_NAME(WC), - QUEUE_FLAG_NAME(FUA), QUEUE_FLAG_NAME(DAX), QUEUE_FLAG_NAME(STATS), QUEUE_FLAG_NAME(REGISTERED), diff --git a/block/blk-settings.c b/block/blk-settings.c index f11c8676eb4c67..536ee202fcdccb 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -261,6 +261,9 @@ static int blk_validate_limits(struct queue_limits *lim) lim->misaligned = 0; } + if (!(lim->features & BLK_FEAT_WRITE_CACHE)) + lim->features &= ~BLK_FEAT_FUA; + err = blk_validate_integrity_limits(lim); if (err) return err; @@ -454,6 +457,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, { unsigned int top, bottom, alignment, ret = 0; + t->features |= (b->features & BLK_FEAT_INHERIT_MASK); + t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors); t->max_user_sectors = min_not_zero(t->max_user_sectors, b->max_user_sectors); @@ -711,30 +716,6 @@ void blk_set_queue_depth(struct request_queue *q, unsigned int depth) } EXPORT_SYMBOL(blk_set_queue_depth); -/** - * blk_queue_write_cache - configure queue's write cache - * @q: the request queue for the device - * @wc:write back cache on or off - * @fua: device supports FUA writes, if true - * - * Tell the block layer about the write cache of @q. - */ -void blk_queue_write_cache(struct request_queue *q, bool wc, bool fua) -{ - if (wc) { - blk_queue_flag_set(QUEUE_FLAG_HW_WC, q); - blk_queue_flag_set(QUEUE_FLAG_WC, q); - } else { - blk_queue_flag_clear(QUEUE_FLAG_HW_WC, q); - blk_queue_flag_clear(QUEUE_FLAG_WC, q); - } - if (fua) - blk_queue_flag_set(QUEUE_FLAG_FUA, q); - else - blk_queue_flag_clear(QUEUE_FLAG_FUA, q); -} -EXPORT_SYMBOL_GPL(blk_queue_write_cache); - int bdev_alignment_offset(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 5c787965b7d09e..4f524c1d5e08bd 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -423,32 +423,41 @@ static ssize_t queue_io_timeout_store(struct request_queue *q, const char *page, static ssize_t queue_wc_show(struct request_queue *q, char *page) { - if (test_bit(QUEUE_FLAG_WC, &q->queue_flags)) - return sprintf(page, "write back\n"); - - return sprintf(page, "write through\n"); + if (q->limits.features & BLK_FLAGS_WRITE_CACHE_DISABLED) Where is the difference between 'flags' and 'features'? Ie why is is named BLK_FEAT_FUA but BLK_FLAGS_WRITE_CACHE_DISABLED? And if the feature is the existence of a capability, and the flag is the setting of that capability, can you make it clear in the documentation? Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 12/26] block: remove blk_flush_policy
On 6/11/24 07:19, Christoph Hellwig wrote: Fold blk_flush_policy into the only caller to prepare for pending changes to it. Signed-off-by: Christoph Hellwig --- block/blk-flush.c | 33 +++-- 1 file changed, 15 insertions(+), 18 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 11/26] block: freeze the queue in queue_attr_store
On 6/11/24 07:19, Christoph Hellwig wrote: queue_attr_store updates attributes used to control generating I/O, and can cause malformed bios if changed with I/O in flight. Freeze the queue in common code instead of adding it to almost every attribute. Signed-off-by: Christoph Hellwig --- block/blk-mq.c| 5 +++-- block/blk-sysfs.c | 9 ++--- 2 files changed, 5 insertions(+), 9 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail
On 6/11/24 07:19, Christoph Hellwig wrote: blkfront always had a robust negotiation protocol for detecting a write cache. Stop simply disabling cache flushes when they fail as that is a grave error. Signed-off-by: Christoph Hellwig --- drivers/block/xen-blkfront.c | 29 + 1 file changed, 9 insertions(+), 20 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 09/26] nbd: move setting the cache control flags to __nbd_set_size
On 6/11/24 07:19, Christoph Hellwig wrote: Move setting the cache control flags in nbd in preparation for moving these flags into the queue_limits structure. Signed-off-by: Christoph Hellwig --- drivers/block/nbd.c | 17 +++-- 1 file changed, 7 insertions(+), 10 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode
On 6/11/24 07:19, Christoph Hellwig wrote: virtblk_update_cache_mode boils down to a single call to blk_queue_write_cache. Remove it in preparation for moving the cache control flags into the queue_limits. Signed-off-by: Christoph Hellwig --- drivers/block/virtio_blk.c | 13 +++-- 1 file changed, 3 insertions(+), 10 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 07/26] loop: fold loop_update_rotational into loop_reconfigure_limits
On 6/11/24 07:19, Christoph Hellwig wrote: This prepares for moving the rotational flag into the queue_limits and also fixes it for the case where the loop device is backed by a block device. Signed-off-by: Christoph Hellwig --- drivers/block/loop.c | 23 --- 1 file changed, 4 insertions(+), 19 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 01/26] sd: fix sd_is_zoned
On 6/11/24 07:19, Christoph Hellwig wrote: Since commit 7437bb73f087 ("block: remove support for the host aware zone model"), only ZBC devices expose a zoned access model. sd_is_zoned is used to check for that and thus return false for host aware devices. Fixes: 7437bb73f087 ("block: remove support for the host aware zone model") Signed-off-by: Christoph Hellwig --- drivers/scsi/sd.h | 7 ++- drivers/scsi/sd_zbc.c | 7 +-- 2 files changed, 7 insertions(+), 7 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics
On 6/11/24 07:19, Christoph Hellwig wrote: Move a bit of code that sets up the zone flag and the write granularity into sd_zbc_read_zones to be with the rest of the zoned limits. Signed-off-by: Christoph Hellwig --- drivers/scsi/sd.c | 21 + drivers/scsi/sd_zbc.c | 13 - 2 files changed, 13 insertions(+), 21 deletions(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 85b45345a27739..5bfed61c70db8f 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3308,29 +3308,10 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp, blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q); } - -#ifdef CONFIG_BLK_DEV_ZONED /* sd_probe rejects ZBD devices early otherwise */ - if (sdkp->device->type == TYPE_ZBC) { - lim->zoned = true; - - /* -* Per ZBC and ZAC specifications, writes in sequential write -* required zones of host-managed devices must be aligned to -* the device physical block size. -*/ - lim->zone_write_granularity = sdkp->physical_block_size; - } else { - /* -* Host-aware devices are treated as conventional. -*/ - lim->zoned = false; - } -#endif /* CONFIG_BLK_DEV_ZONED */ - if (!sdkp->first_scan) return; - if (lim->zoned) + if (sdkp->device->type == TYPE_ZBC) Why not sd_is_zoned()? Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 06/26] loop: also use the default block size from an underlying block device
On 6/11/24 07:19, Christoph Hellwig wrote: Fix the code in loop_reconfigure_limits to pick a default block size for O_DIRECT file descriptors to also work when the loop device sits on top of a block device and not just on a regular file on a block device based file system. Signed-off-by: Christoph Hellwig --- drivers/block/loop.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 05/26] loop: regularize upgrading the lock size for direct I/O
On 6/11/24 07:19, Christoph Hellwig wrote: The LOOP_CONFIGURE path automatically upgrades the block size to that of the underlying file for O_DIRECT file descriptors, but the LOOP_SET_BLOCK_SIZE path does not. Fix this by lifting the code to pick the block size into common code. Signed-off-by: Christoph Hellwig --- drivers/block/loop.c | 25 +++-- 1 file changed, 15 insertions(+), 10 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 04/26] loop: always update discard settings in loop_reconfigure_limits
On 6/11/24 07:19, Christoph Hellwig wrote: Simplify loop_reconfigure_limits by always updating the discard limits. This adds a little more work to loop_set_block_size, but doesn't change the outcome as the discard flag won't change. Signed-off-by: Christoph Hellwig --- drivers/block/loop.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH 03/26] loop: stop using loop_reconfigure_limits in __loop_clr_fd
On 6/11/24 07:19, Christoph Hellwig wrote: __loop_clr_fd wants to clear all settings on the device. Prepare for moving more settings into the block limits by open coding loop_reconfigure_limits. Signed-off-by: Christoph Hellwig --- drivers/block/loop.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de+49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH v7 1/3 RESEND] block:sed-opal: SED Opal keystore
On 7/21/23 23:19, gjo...@linux.vnet.ibm.com wrote: From: Greg Joyce Add read and write functions that allow SED Opal keys to stored in a permanent keystore. Probably state that these are dummy functions only. Signed-off-by: Greg Joyce Reviewed-by: Jonathan Derrick --- block/Makefile | 2 +- block/sed-opal-key.c | 24 include/linux/sed-opal-key.h | 15 +++ 3 files changed, 40 insertions(+), 1 deletion(-) create mode 100644 block/sed-opal-key.c create mode 100644 include/linux/sed-opal-key.h diff --git a/block/Makefile b/block/Makefile index 46ada9dc8bbf..ea07d80402a6 100644 --- a/block/Makefile +++ b/block/Makefile @@ -34,7 +34,7 @@ obj-$(CONFIG_BLK_DEV_ZONED) += blk-zoned.o obj-$(CONFIG_BLK_WBT) += blk-wbt.o obj-$(CONFIG_BLK_DEBUG_FS)+= blk-mq-debugfs.o obj-$(CONFIG_BLK_DEBUG_FS_ZONED)+= blk-mq-debugfs-zoned.o -obj-$(CONFIG_BLK_SED_OPAL) += sed-opal.o +obj-$(CONFIG_BLK_SED_OPAL) += sed-opal.o sed-opal-key.o obj-$(CONFIG_BLK_PM) += blk-pm.o obj-$(CONFIG_BLK_INLINE_ENCRYPTION) += blk-crypto.o blk-crypto-profile.o \ blk-crypto-sysfs.o diff --git a/block/sed-opal-key.c b/block/sed-opal-key.c new file mode 100644 index ..16f380164c44 --- /dev/null +++ b/block/sed-opal-key.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * SED key operations. + * + * Copyright (C) 2022 IBM Corporation + * + * These are the accessor functions (read/write) for SED Opal + * keys. Specific keystores can provide overrides. + * + */ + +#include +#include +#include + +int __weak sed_read_key(char *keyname, char *key, u_int *keylen) +{ + return -EOPNOTSUPP; +} + +int __weak sed_write_key(char *keyname, char *key, u_int keylen) +{ + return -EOPNOTSUPP; +} Hmm. We do have security/keys, which is using a 'struct key' for their operations. Why don't you leverage that structure? Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman
Re: [PATCH v7 3/3 RESEND] powerpc/pseries: PLPKS SED Opal keystore support
On 7/21/23 23:19, gjo...@linux.vnet.ibm.com wrote: From: Greg Joyce Define operations for SED Opal to read/write keys from POWER LPAR Platform KeyStore(PLPKS). This allows non-volatile storage of SED Opal keys. Signed-off-by: Greg Joyce Reviewed-by: Jonathan Derrick --- arch/powerpc/platforms/pseries/Kconfig| 6 + arch/powerpc/platforms/pseries/Makefile | 1 + .../powerpc/platforms/pseries/plpks_sed_ops.c | 114 ++ block/Kconfig | 1 + 4 files changed, 122 insertions(+) create mode 100644 arch/powerpc/platforms/pseries/plpks_sed_ops.c Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman
Re: [PATCH v7 2/3 RESEND] block: sed-opal: keystore access for SED Opal keys
On 7/21/23 23:19, gjo...@linux.vnet.ibm.com wrote: From: Greg Joyce Allow for permanent SED authentication keys by reading/writing to the SED Opal non-volatile keystore. Signed-off-by: Greg Joyce Reviewed-by: Jonathan Derrick --- block/sed-opal.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/block/sed-opal.c b/block/sed-opal.c index 6d7f25d1711b..fa23a6a60485 100644 --- a/block/sed-opal.c +++ b/block/sed-opal.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -3019,7 +3020,13 @@ static int opal_set_new_pw(struct opal_dev *dev, struct opal_new_pw *opal_pw) if (ret) return ret; - /* update keyring with new password */ + /* update keyring and key store with new password */ + ret = sed_write_key(OPAL_AUTH_KEY, + opal_pw->new_user_pw.opal_key.key, + opal_pw->new_user_pw.opal_key.key_len); + if (ret != -EOPNOTSUPP) + pr_warn("error updating SED key: %d\n", ret); + ret = update_sed_opal_key(OPAL_AUTH_KEY, opal_pw->new_user_pw.opal_key.key, opal_pw->new_user_pw.opal_key.key_len); @@ -3292,6 +3299,8 @@ EXPORT_SYMBOL_GPL(sed_ioctl); static int __init sed_opal_init(void) { struct key *kr; + char init_sed_key[OPAL_KEY_MAX]; + int keylen = OPAL_KEY_MAX - 1; kr = keyring_alloc(".sed_opal", GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, current_cred(), @@ -3304,6 +3313,11 @@ static int __init sed_opal_init(void) sed_opal_keyring = kr; - return 0; + if (sed_read_key(OPAL_AUTH_KEY, init_sed_key, &keylen) < 0) { + memset(init_sed_key, '\0', sizeof(init_sed_key)); + keylen = OPAL_KEY_MAX - 1; + } + + return update_sed_opal_key(OPAL_AUTH_KEY, init_sed_key, keylen); } late_initcall(sed_opal_init); See the previous patch for comments about 'struct key'. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman
Re: [PATCH v3 3/3] block: sed-opal: keyring support for SED keys
On 12/1/22 19:03, Greg Joyce wrote: On Wed, 2022-11-30 at 08:00 +0100, Hannes Reinecke wrote: On 11/30/22 00:25, gjo...@linux.vnet.ibm.com wrote: From: Greg Joyce Extend the SED block driver so it can alternatively obtain a key from a sed-opal kernel keyring. The SED ioctls will indicate the source of the key, either directly in the ioctl data or from the keyring. This allows the use of SED commands in scripts such as udev scripts so that drives may be automatically unlocked as they become available. Signed-off-by: Greg Joyce Reviewed-by: Jonathan Derrick --- block/Kconfig | 1 + block/sed-opal.c | 174 +- include/linux/sed-opal.h | 3 + include/uapi/linux/sed-opal.h | 8 +- 4 files changed, 183 insertions(+), 3 deletions(-) + ret = opal_get_key(dev, &opal_lrs->session.opal_key); + if (ret) + return ret; mutex_lock(&dev->dev_lock); setup_opal_dev(dev); ret = execute_steps(dev, lr_steps, ARRAY_SIZE(lr_steps)); @@ -2622,6 +2759,14 @@ static int opal_set_new_pw(struct opal_dev *dev, struct opal_new_pw *opal_pw) ret = execute_steps(dev, pw_steps, ARRAY_SIZE(pw_steps)); mutex_unlock(&dev->dev_lock); + if (ret) + return ret; + + /* update keyring with new password */ + ret = update_sed_opal_key(OPAL_AUTH_KEY, + opal_pw->new_user_pw.opal_key.key, + opal_pw- new_user_pw.opal_key.key_len); + return ret; } What about key revocation? You only allow to set a new key, but what happens with the old ones? My understanding was that key_create_or_update() would not allow duplicates so there shouldn't be old ones. Is that incorrect? Ah, right, you only have one key. But still, you might want to revoke that one, too, no? (Think of decommissioning old drives ...) Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman
Re: [PATCH v3 3/3] block: sed-opal: keyring support for SED keys
nly allow to set a new key, but what happens with the old ones? @@ -2642,6 +2787,9 @@ static int opal_activate_user(struct opal_dev *dev, return -EINVAL; } + ret = opal_get_key(dev, &opal_session->opal_key); + if (ret) + return ret; mutex_lock(&dev->dev_lock); setup_opal_dev(dev); ret = execute_steps(dev, act_steps, ARRAY_SIZE(act_steps)); @@ -2728,6 +2876,9 @@ static int opal_generic_read_write_table(struct opal_dev *dev, { int ret, bit_set; + ret = opal_get_key(dev, &rw_tbl->key); + if (ret) + return ret; mutex_lock(&dev->dev_lock); setup_opal_dev(dev); @@ -2776,9 +2927,9 @@ int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *arg) if (!capable(CAP_SYS_ADMIN)) return -EACCES; if (!dev) - return -ENOTSUPP; + return -EOPNOTSUPP; if (!(dev->flags & OPAL_FL_SUPPORTED)) - return -ENOTSUPP; + return -EOPNOTSUPP; if (cmd & IOC_IN) { p = memdup_user(arg, _IOC_SIZE(cmd)); @@ -2854,3 +3005,22 @@ int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *arg) return ret; } EXPORT_SYMBOL_GPL(sed_ioctl); + +static int __init sed_opal_init(void) +{ + struct key *kr; + + kr = keyring_alloc(".sed_opal", + GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, current_cred(), + (KEY_POS_ALL & ~KEY_POS_SETATTR) | KEY_USR_VIEW | + KEY_USR_READ | KEY_USR_SEARCH | KEY_USR_WRITE, + KEY_ALLOC_NOT_IN_QUOTA, + NULL, NULL); + if (IS_ERR(kr)) + return PTR_ERR(kr); + + sed_opal_keyring = kr; + + return 0; +} +late_initcall(sed_opal_init); Shouldn't you free the keyring on exit? diff --git a/include/linux/sed-opal.h b/include/linux/sed-opal.h index 7131d7f0eec2..57d483506b4a 100644 --- a/include/linux/sed-opal.h +++ b/include/linux/sed-opal.h @@ -24,6 +24,9 @@ bool opal_unlock_from_suspend(struct opal_dev *dev); struct opal_dev *init_opal_dev(void *data, sec_send_recv *send_recv); int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *ioctl_ptr); +#define OPAL_AUTH_KEY "opal-boot-pin" +#defineOPAL_AUTH_KEY_PREV "opal-boot-pin-prev" + static inline bool is_sed_ioctl(unsigned int cmd) { switch (cmd) { diff --git a/include/uapi/linux/sed-opal.h b/include/uapi/linux/sed-opal.h index fccde168e90c..6b79cdcf9518 100644 --- a/include/uapi/linux/sed-opal.h +++ b/include/uapi/linux/sed-opal.h @@ -44,10 +44,16 @@ enum opal_lock_state { OPAL_LK = 0x04, /* 0100 */ }; +enum opal_key_type { + OPAL_INCLUDED = 0, /* key[] is the key */ + OPAL_KEYRING, /* key is in keyring */ +}; + struct opal_key { __u8 lr; __u8 key_len; - __u8 __align[6]; + __u8 key_type; + __u8 __align[5]; __u8 key[OPAL_KEY_MAX]; }; Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman
Re: [PATCH v3 2/3] block: sed-opal: Implement IOC_OPAL_REVERT_LSP
On 11/30/22 00:25, gjo...@linux.vnet.ibm.com wrote: From: Greg Joyce This is used in conjunction with IOC_OPAL_REVERT_TPR to return a drive to Original Factory State without erasing the data. If IOC_OPAL_REVERT_LSP is called with opal_revert_lsp.options bit OPAL_PRESERVE set prior to calling IOC_OPAL_REVERT_TPR, the drive global locking range will not be erased. Signed-off-by: Greg Joyce Reviewed-by: Christoph Hellwig Reviewed-by: Jonathan Derrick --- block/opal_proto.h| 4 block/sed-opal.c | 40 +++ include/linux/sed-opal.h | 1 + include/uapi/linux/sed-opal.h | 11 ++ 4 files changed, 56 insertions(+) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman
Re: [PATCH v3 1/3] block: sed-opal: Implement IOC_OPAL_DISCOVERY
On 11/30/22 00:25, gjo...@linux.vnet.ibm.com wrote: From: Greg Joyce Add IOC_OPAL_DISCOVERY ioctl to return raw discovery data to a SED Opal application. This allows the application to display drive capabilities and state. Signed-off-by: Greg Joyce Reviewed-by: Christoph Hellwig Reviewed-by: Jonathan Derrick --- block/sed-opal.c | 38 --- include/linux/sed-opal.h | 1 + include/uapi/linux/sed-opal.h | 6 ++ 3 files changed, 42 insertions(+), 3 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman
Re: [PATCH v2 2/3] scsi: BusLogic remove bus_to_virt
On 6/17/22 14:57, Arnd Bergmann wrote: From: Arnd Bergmann The BusLogic driver is the last remaining driver that relies on the deprecated bus_to_virt() function, which in turn only works on a few architectures, and is incompatible with both swiotlb and iommu support. Before commit 391e2f25601e ("[SCSI] BusLogic: Port driver to 64-bit."), the driver had a dependency on x86-32, presumably because of this problem. However, the change introduced another bug that made it still impossible to use the driver on any 64-bit machine. This was in turn fixed in commit 56f396146af2 ("scsi: BusLogic: Fix 64-bit system enumeration error for Buslogic"), 8 years later, which shows that there are not a lot of users. Maciej is still using the driver on 32-bit hardware, and Khalid mentioned that the driver works with the device emulation used in VirtualBox and VMware. Both of those only emulate it for Windows 2000 and older operating systems that did not ship with the better LSI logic driver. Do a minimum fix that searches through the list of descriptors to find one that matches the bus address. This is clearly as inefficient as was indicated in the code comment about the lack of a bus_to_virt() replacement. A better fix would likely involve changing out the entire descriptor allocation for a simpler one, but that would be much more invasive. Cc: Maciej W. Rozycki Cc: Matt Wang Cc: Khalid Aziz Signed-off-by: Arnd Bergmann --- drivers/scsi/BusLogic.c | 27 --- drivers/scsi/Kconfig| 2 +- 2 files changed, 17 insertions(+), 12 deletions(-) diff --git a/drivers/scsi/BusLogic.c b/drivers/scsi/BusLogic.c index a897c8f914cf..d057abfcdd5c 100644 --- a/drivers/scsi/BusLogic.c +++ b/drivers/scsi/BusLogic.c @@ -2515,12 +2515,26 @@ static int blogic_resultcode(struct blogic_adapter *adapter, return (hoststatus << 16) | tgt_status; } +/* + * turn the dma address from an inbox into a ccb pointer + * This is rather inefficient. + */ +static struct blogic_ccb * +blogic_inbox_to_ccb(struct blogic_adapter *adapter, struct blogic_inbox *inbox) +{ + struct blogic_ccb *ccb; + + for (ccb = adapter->all_ccbs; ccb; ccb = ccb->next_all) + if (inbox->ccb == ccb->dma_handle) + break; + + return ccb; +} /* blogic_scan_inbox scans the Incoming Mailboxes saving any Incoming Mailbox entries for completion processing. */ - static void blogic_scan_inbox(struct blogic_adapter *adapter) { /* @@ -2540,16 +2554,7 @@ static void blogic_scan_inbox(struct blogic_adapter *adapter) enum blogic_cmplt_code comp_code; while ((comp_code = next_inbox->comp_code) != BLOGIC_INBOX_FREE) { - /* - We are only allowed to do this because we limit our - architectures we run on to machines where bus_to_virt( - actually works. There *needs* to be a dma_addr_to_virt() - in the new PCI DMA mapping interface to replace - bus_to_virt() or else this code is going to become very - innefficient. -*/ - struct blogic_ccb *ccb = - (struct blogic_ccb *) bus_to_virt(next_inbox->ccb); + struct blogic_ccb *ccb = blogic_inbox_to_ccb(adapter, adapter->next_inbox); if (comp_code != BLOGIC_CMD_NOTFOUND) { if (ccb->status == BLOGIC_CCB_ACTIVE || ccb->status == BLOGIC_CCB_RESET) { diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig index cf75588a2587..56bdc08d0b77 100644 --- a/drivers/scsi/Kconfig +++ b/drivers/scsi/Kconfig @@ -513,7 +513,7 @@ config SCSI_HPTIOP config SCSI_BUSLOGIC tristate "BusLogic SCSI support" - depends on PCI && SCSI && VIRT_TO_BUS + depends on PCI && SCSI help This is support for BusLogic MultiMaster and FlashPoint SCSI Host Adapters. Consult the SCSI-HOWTO, available from CCB handling in the driver is ugly anyway, so that'll be good enough. Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
Re: [PATCH v2 1/3] scsi: dpt_i2o: drop stale VIRT_TO_BUS dependency
On 6/17/22 14:57, Arnd Bergmann wrote: From: Arnd Bergmann The dpt_i2o driver was fixed to stop using virt_to_bus() in 2008, but it still has a stale reference in an error handling code path that could never work. Fix it up to build without VIRT_TO_BUS and remove the Kconfig dependency. The alternative to this would be to just remove the driver, as it is clearly obsolete. The i2o driver layer was removed in 2015 with commit 4a72a7af462d ("staging: remove i2o subsystem"), but the even older dpt_i2o scsi driver stayed around. The last non-cleanup patches I could find were from Miquel van Smoorenburg and Mark Salyzyn back in 2008, they might know if there is any chance of the hardware still being used anywhere. Fixes: 67af2b060e02 ("[SCSI] dpt_i2o: move from virt_to_bus/bus_to_virt to dma_alloc_coherent") Cc: Miquel van Smoorenburg Cc: Mark Salyzyn Signed-off-by: Arnd Bergmann --- drivers/scsi/Kconfig | 2 +- drivers/scsi/dpt_i2o.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig index a9fe5152addd..cf75588a2587 100644 --- a/drivers/scsi/Kconfig +++ b/drivers/scsi/Kconfig @@ -460,7 +460,7 @@ config SCSI_MVUMI config SCSI_DPT_I2O tristate "Adaptec I2O RAID support " - depends on SCSI && PCI && VIRT_TO_BUS + depends on SCSI && PCI help This driver supports all of Adaptec's I2O based RAID controllers as well as the DPT SmartRaid V cards. This is an Adaptec maintained diff --git a/drivers/scsi/dpt_i2o.c b/drivers/scsi/dpt_i2o.c index 2e9155ba7408..55dfe7011912 100644 --- a/drivers/scsi/dpt_i2o.c +++ b/drivers/scsi/dpt_i2o.c @@ -52,11 +52,11 @@ MODULE_DESCRIPTION("Adaptec I2O RAID Driver"); #include #include +#include #include #include #include /* for boot_cpu_data */ -#include /* for virt_to_bus, etc. */ #include #include @@ -2112,7 +2112,7 @@ static irqreturn_t adpt_isr(int irq, void *dev_id) } else { /* Ick, we should *never* be here */ printk(KERN_ERR "dpti: reply frame not from pool\n"); - reply = (u8 *)bus_to_virt(m); + goto out; } if (readl(reply) & MSG_FAIL) { Reviewed-by: Hannes Reinecke Personally I wouldn't mind to see this driver gone, as it's being built upon the (long-defunct) I2O specification. We already deleted the i2o subsystem years ago, so maybe it's time to consign this driver to history, too. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
Re: [PATCH 14/26] md: convert to blk_alloc_disk/blk_cleanup_disk
On 5/24/21 9:26 AM, Christoph Hellwig wrote: On Sun, May 23, 2021 at 10:12:49AM +0200, Hannes Reinecke wrote: + blk_set_stacking_limits(&mddev->queue->limits); blk_queue_write_cache(mddev->queue, true, true); /* Allow extended partitions. This makes the * 'mdp' device redundant, but we can't really Wouldn't it make sense to introduce a helper 'blk_queue_from_disk()' or somesuch to avoid having to keep an explicit 'queue' pointer? My rought plan is that a few series from now bio based drivers will never directly deal with the request_queue at all. Go for it. Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 26/26] block: unexport blk_alloc_queue
On 5/21/21 7:51 AM, Christoph Hellwig wrote: blk_alloc_queue is just an internal helper now, unexport it and remove it from the public header. Signed-off-by: Christoph Hellwig --- block/blk-core.c | 1 - block/blk.h| 2 ++ include/linux/blkdev.h | 1 - 3 files changed, 2 insertions(+), 2 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 25/26] null_blk: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the null_blk driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Note that the blk-mq mode is left with its own allocations scheme, to be handled later. Signed-off-by: Christoph Hellwig --- drivers/block/null_blk/main.c | 38 +-- 1 file changed, 19 insertions(+), 19 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 24/26] xpram: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the xpram driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/s390/block/xpram.c | 26 +- 1 file changed, 9 insertions(+), 17 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 23/26] dcssblk: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the dcssblk driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/s390/block/dcssblk.c | 26 -- 1 file changed, 8 insertions(+), 18 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 22/26] ps3vram: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the ps3vram driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/block/ps3vram.c | 31 --- 1 file changed, 8 insertions(+), 23 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 21/26] n64cart: convert to blk_alloc_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the n64cart driver to use the blk_alloc_disk helper to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/block/n64cart.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 20/26] simdisk: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the simdisk driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- arch/xtensa/platforms/iss/simdisk.c | 29 +++-- 1 file changed, 7 insertions(+), 22 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 19/26] nfblock: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the nfblock driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- arch/m68k/emu/nfblock.c | 20 +--- 1 file changed, 5 insertions(+), 15 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 18/26] nvme-multipath: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the nvme-multipath driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/nvdimm/pmem.c | 1 - drivers/nvme/host/multipath.c | 45 ++- 2 files changed, 13 insertions(+), 33 deletions(-) diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 9fcd05084564..31f3c4bd6f72 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -472,7 +472,6 @@ static int pmem_attach_disk(struct device *dev, blk_queue_flag_set(QUEUE_FLAG_DAX, q); disk->fops = &pmem_fops; - disk->queue = q; disk->private_data = pmem; nvdimm_namespace_disk_name(ndns, disk->disk_name); set_capacity(disk, (pmem->size - pmem->pfn_pad - pmem->data_offset) diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index a5d02f236cca..b5fbdb416022 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -427,7 +427,6 @@ static void nvme_requeue_work(struct work_struct *work) int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head) { - struct request_queue *q; bool vwc = false; mutex_init(&head->lock); @@ -443,33 +442,24 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head) if (!(ctrl->subsys->cmic & NVME_CTRL_CMIC_MULTI_CTRL) || !multipath) return 0; - q = blk_alloc_queue(ctrl->numa_node); - if (!q) - goto out; - blk_queue_flag_set(QUEUE_FLAG_NONROT, q); - /* set to a default value for 512 until disk is validated */ - blk_queue_logical_block_size(q, 512); - blk_set_stacking_limits(&q->limits); - - /* we need to propagate up the VMC settings */ - if (ctrl->vwc & NVME_CTRL_VWC_PRESENT) - vwc = true; - blk_queue_write_cache(q, vwc, vwc); - - head->disk = alloc_disk(0); + head->disk = blk_alloc_disk(ctrl->numa_node); if (!head->disk) - goto out_cleanup_queue; + return -ENOMEM; head->disk->fops = &nvme_ns_head_ops; head->disk->private_data = head; - head->disk->queue = q; sprintf(head->disk->disk_name, "nvme%dn%d", ctrl->subsys->instance, head->instance); - return 0; -out_cleanup_queue: - blk_cleanup_queue(q); -out: - return -ENOMEM; + blk_queue_flag_set(QUEUE_FLAG_NONROT, head->disk->queue); + /* set to a default value of 512 until the disk is validated */ + blk_queue_logical_block_size(head->disk->queue, 512); + blk_set_stacking_limits(&head->disk->queue->limits); + + /* we need to propagate up the VMC settings */ + if (ctrl->vwc & NVME_CTRL_VWC_PRESENT) + vwc = true; + blk_queue_write_cache(head->disk->queue, vwc, vwc); + return 0; } static void nvme_mpath_set_live(struct nvme_ns *ns) @@ -768,16 +758,7 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head) /* make sure all pending bios are cleaned up */ kblockd_schedule_work(&head->requeue_work); flush_work(&head->requeue_work); - blk_cleanup_queue(head->disk->queue); - if (!test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) { - /* -* if device_add_disk wasn't called, prevent -* disk release to put a bogus reference on the -* request queue -*/ - head->disk->queue = NULL; - } - put_disk(head->disk); + blk_cleanup_disk(head->disk); } void nvme_mpath_init_ctrl(struct nvme_ctrl *ctrl) What about the check for GENHD_FL_UP a bit further up in line 766? Can this still happen with the new allocation scheme, ie is there still a difference in lifetime between ->disk and ->disk->queue? Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 17/26] nvdimm-pmem: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the nvdimm-pmem driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/nvdimm/pmem.c | 15 +-- 1 file changed, 5 insertions(+), 10 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 16/26] nvdimm-btt: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the nvdimm-btt driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/nvdimm/btt.c | 24 +++- drivers/nvdimm/btt.h | 2 -- 2 files changed, 7 insertions(+), 19 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 15/26] nvdimm-blk: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the nvdimm-blk driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/nvdimm/blk.c | 26 ++ 1 file changed, 6 insertions(+), 20 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 14/26] md: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the md driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/md/md.c | 25 + 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index 49f897fbb89b..d806be8cc210 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -5598,12 +5598,10 @@ static void md_free(struct kobject *ko) if (mddev->sysfs_level) sysfs_put(mddev->sysfs_level); - if (mddev->gendisk) + if (mddev->gendisk) { del_gendisk(mddev->gendisk); - if (mddev->queue) - blk_cleanup_queue(mddev->queue); - if (mddev->gendisk) - put_disk(mddev->gendisk); + blk_cleanup_disk(mddev->gendisk); + } percpu_ref_exit(&mddev->writes_pending); bioset_exit(&mddev->bio_set); @@ -5711,20 +5709,13 @@ static int md_alloc(dev_t dev, char *name) goto abort; error = -ENOMEM; - mddev->queue = blk_alloc_queue(NUMA_NO_NODE); - if (!mddev->queue) + disk = blk_alloc_disk(NUMA_NO_NODE); + if (!disk) goto abort; - blk_set_stacking_limits(&mddev->queue->limits); - - disk = alloc_disk(1 << shift); - if (!disk) { - blk_cleanup_queue(mddev->queue); - mddev->queue = NULL; - goto abort; - } disk->major = MAJOR(mddev->unit); disk->first_minor = unit << shift; + disk->minors = 1 << shift; if (name) strcpy(disk->disk_name, name); else if (partitioned) @@ -5733,7 +5724,9 @@ static int md_alloc(dev_t dev, char *name) sprintf(disk->disk_name, "md%d", unit); disk->fops = &md_fops; disk->private_data = mddev; - disk->queue = mddev->queue; + + mddev->queue = disk->queue; + blk_set_stacking_limits(&mddev->queue->limits); blk_queue_write_cache(mddev->queue, true, true); /* Allow extended partitions. This makes the * 'mdp' device redundant, but we can't really Wouldn't it make sense to introduce a helper 'blk_queue_from_disk()' or somesuch to avoid having to keep an explicit 'queue' pointer? Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 13/26] dm: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the dm driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/md/dm.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index ca2aedd8ee7d..3c7c2d257018 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1801,13 +1801,13 @@ static void cleanup_mapped_device(struct mapped_device *md) md->disk->private_data = NULL; spin_unlock(&_minor_lock); del_gendisk(md->disk); - put_disk(md->disk); } - if (md->queue) { + if (md->queue) dm_queue_destroy_keyslot_manager(md->queue); - blk_cleanup_queue(md->queue); - } + + if (md->disk) + blk_cleanup_disk(md->disk); cleanup_srcu_struct(&md->io_barrier); Can't these conditionals be merged into a single 'if (md->disk)'? Eg like: if (md->disk) { spin_lock(&_minor_lock); md->disk->private_data = NULL; spin_unlock(&_minor_lock); del_gendisk(md->disk); dm_queue_destroy_keyslot_manager(md->queue); blk_cleanup_disk(md->queue); } We're now always allocating 'md->disk' and 'md->queue' together, so how can we end up in a situation where one is set without the other? @@ -1869,13 +1869,10 @@ static struct mapped_device *alloc_dev(int minor) * established. If request-based table is loaded: blk-mq will * override accordingly. */ - md->queue = blk_alloc_queue(numa_node_id); - if (!md->queue) - goto bad; - - md->disk = alloc_disk_node(1, md->numa_node_id); + md->disk = blk_alloc_disk(md->numa_node_id); if (!md->disk) goto bad; + md->queue = md->disk->queue; init_waitqueue_head(&md->wait); INIT_WORK(&md->work, dm_wq_work); @@ -1888,6 +1885,7 @@ static struct mapped_device *alloc_dev(int minor) md->disk->major = _major; md->disk->first_minor = minor; + md->disk->minors = 1; md->disk->fops = &dm_blk_dops; md->disk->queue = md->queue; md->disk->private_data = md; Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 12/26] bcache: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the bcache driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/md/bcache/super.c | 15 --- 1 file changed, 4 insertions(+), 11 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 11/26] lightnvm: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the lightnvm driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/lightnvm/core.c | 23 +-- 1 file changed, 5 insertions(+), 18 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 10/26] zram: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:51 AM, Christoph Hellwig wrote: Convert the zram driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/block/zram/zram_drv.c | 19 --- 1 file changed, 4 insertions(+), 15 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 09/26] rsxx: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:50 AM, Christoph Hellwig wrote: Convert the rsxx driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/block/rsxx/dev.c | 39 +- drivers/block/rsxx/rsxx_priv.h | 1 - 2 files changed, 15 insertions(+), 25 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 08/26] pktcdvd: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:50 AM, Christoph Hellwig wrote: Convert the pktcdvd driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/block/pktcdvd.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 07/26] drbd: convert to blk_alloc_disk/blk_cleanup_disk
On 5/21/21 7:50 AM, Christoph Hellwig wrote: Convert the drbd driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig --- drivers/block/drbd/drbd_main.c | 23 --- 1 file changed, 8 insertions(+), 15 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 06/26] brd: convert to blk_alloc_disk/blk_cleanup_disk
_list); del_gendisk(brd->brd_disk); - brd_free(brd); + blk_cleanup_disk(brd->brd_disk); + brd_free_pages(brd); + kfree(brd); } static inline void brd_check_and_reset_par(void) @@ -485,7 +470,7 @@ static inline void brd_check_and_reset_par(void) static int __init brd_init(void) { struct brd_device *brd, *next; - int i; + int err, i; /* * brd module now has a feature to instantiate underlying device @@ -511,22 +496,11 @@ static int __init brd_init(void) mutex_lock(&brd_devices_mutex); for (i = 0; i < rd_nr; i++) { - brd = brd_alloc(i); - if (!brd) + err = brd_alloc(i); + if (err) goto out_free; - list_add_tail(&brd->brd_list, &brd_devices); } - /* point of no return */ - - list_for_each_entry(brd, &brd_devices, brd_list) { - /* -* associate with queue just before adding disk for -* avoiding to mess up failure path -*/ - brd->brd_disk->queue = brd->brd_queue; - add_disk(brd->brd_disk); - } mutex_unlock(&brd_devices_mutex); pr_info("brd: module loaded\n"); @@ -535,15 +509,13 @@ static int __init brd_init(void) out_free: debugfs_remove_recursive(brd_debugfs_dir); - list_for_each_entry_safe(brd, next, &brd_devices, brd_list) { - list_del(&brd->brd_list); - brd_free(brd); - } + list_for_each_entry_safe(brd, next, &brd_devices, brd_list) + brd_del_one(brd); mutex_unlock(&brd_devices_mutex); unregister_blkdev(RAMDISK_MAJOR, "ramdisk"); pr_info("brd: module NOT loaded !!!\n"); - return -ENOMEM; + return err; } static void __exit brd_exit(void) Other than that: Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 05/26] block: add blk_alloc_disk and blk_cleanup_disk APIs
On 5/21/21 7:50 AM, Christoph Hellwig wrote: Add two new APIs to allocate and free a gendisk including the request_queue for use with BIO based drivers. This is to avoid boilerplate code in drivers. Signed-off-by: Christoph Hellwig --- block/genhd.c | 35 +++ include/linux/genhd.h | 22 ++ 2 files changed, 57 insertions(+) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 04/26] block: add a flag to make put_disk on partially initalized disks safer
On 5/21/21 7:50 AM, Christoph Hellwig wrote: Add a flag to indicate that __device_add_disk did grab a queue reference so that disk_release only drops it if we actually had it. This sort out one of the major pitfals with partially initialized gendisk that pitfalls a lot of drivers did get wrong or still do. Signed-off-by: Christoph Hellwig --- block/genhd.c | 7 +-- include/linux/genhd.h | 1 + 2 files changed, 6 insertions(+), 2 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 03/26] block: automatically enable GENHD_FL_EXT_DEVT
On 5/21/21 7:50 AM, Christoph Hellwig wrote: Automatically set the GENHD_FL_EXT_DEVT flag for all disks allocated without an explicit number of minors. This is what all new block drivers should do, so make sure it is the default without boilerplate code. Signed-off-by: Christoph Hellwig --- block/genhd.c| 2 +- block/partitions/core.c | 4 drivers/block/n64cart.c | 2 +- drivers/lightnvm/core.c | 1 - drivers/memstick/core/ms_block.c | 1 - drivers/nvdimm/blk.c | 1 - drivers/nvdimm/btt.c | 1 - drivers/nvdimm/pmem.c| 1 - drivers/nvme/host/core.c | 1 - drivers/nvme/host/multipath.c| 1 - 10 files changed, 2 insertions(+), 13 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 02/26] block: move the DISK_MAX_PARTS sanity check into __device_add_disk
On 5/21/21 7:50 AM, Christoph Hellwig wrote: Keep this together with the first place that actually looks at ->minors and prepare for not passing a minors argument to alloc_disk. Signed-off-by: Christoph Hellwig --- block/genhd.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 01/26] block: refactor device number setup in __device_add_disk
On 5/21/21 7:50 AM, Christoph Hellwig wrote: Untangle the mess around blk_alloc_devt by moving the check for the used allocation scheme into the callers. Signed-off-by: Christoph Hellwig --- block/blk.h | 4 +- block/genhd.c | 96 - block/partitions/core.c | 15 +-- 3 files changed, 49 insertions(+), 66 deletions(-) ... and also fixes an issue with GENHD_FL_UP remained set in an error path in __device_add_disk(). Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH v2 01/17] ibmvfc: add vhost fields and defaults for MQ enablement
On 12/4/20 3:26 PM, Brian King wrote: On 12/2/20 11:27 AM, Tyrel Datwyler wrote: On 12/2/20 7:14 AM, Brian King wrote: On 12/1/20 6:53 PM, Tyrel Datwyler wrote: Introduce several new vhost fields for managing MQ state of the adapter as well as initial defaults for MQ enablement. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 9 - drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++-- 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 42e4d35e0d35..f1d677a7423d 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id) } shost->transportt = ibmvfc_transport_template; - shost->can_queue = max_requests; + shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES); This doesn't look right. can_queue is the SCSI host queue depth, not the MQ queue depth. Our max_requests is the total number commands allowed across all queues. From what I understand is can_queue is the total number of commands in flight allowed for each hw queue. /* * In scsi-mq mode, the number of hardware queues supported by the LLD. * * Note: it is assumed that each hardware queue has a queue depth of * can_queue. In other words, the total queue depth per host * is nr_hw_queues * can_queue. However, for when host_tagset is set, * the total queue depth is can_queue. */ We currently don't use the host wide shared tagset. Ok. I missed that bit... In that case, since we allocate by default only 100 event structs. If we slice that across IBMVFC_SCSI_HW_QUEUES (16) queues, then we end up with only about 6 commands that can be outstanding per queue, which is going to really hurt performance... I'd suggest bumping up IBMVFC_MAX_REQUESTS_DEFAULT from 100 to 1000 as a starting point. Before doing that I'd rather use the host-wide shared tagset. Increasing the number of requests will increase the memory footprint of the driver (as each request will be statically allocated). Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH 00/13] ibmvfc: initial MQ development
On 11/26/20 2:48 AM, Tyrel Datwyler wrote: Recent updates in pHyp Firmware and VIOS releases provide new infrastructure towards enabling Subordinate Command Response Queues (Sub-CRQs) such that each Sub-CRQ is a channel backed by an actual hardware queue in the FC stack on the partner VIOS. Sub-CRQs are registered with the firmware via hypercalls and then negotiated with the VIOS via new Management Datagrams (MADs) for channel setup. This initial implementation adds the necessary Sub-CRQ framework and implements the new MADs for negotiating and assigning a set of Sub-CRQs to associated VIOS HW backed channels. The event pool and locking still leverages the legacy single queue implementation, and as such lock contention is problematic when increasing the number of queues. However, this initial work demonstrates a 1.2x factor increase in IOPs when configured with two HW queues despite lock contention. Why do you still hold the hold lock during submission? An initial check on the submission code path didn't reveal anything obvious, so it _should_ be possible to drop the host lock there. Or at least move it into the submission function itself to avoid lock contention. Hmm? Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [mainline] rcu stalls on CPU when unbinding mpt3sas driver
792d47820] [dfc72604] sas_rphy_remove+0x94/0xa0 > [scsi_transport_sas] > [c07792d47850] [dfc745bc] sas_port_delete+0x4c/0x238 > [scsi_transport_sas] > [c07792d478b0] [d00010e82990] > mpt3sas_transport_port_remove+0x2d0/0x310 [mpt3sas] > [c07792d47950] [d00010e71ba0] _scsih_remove_device+0x100/0x2a0 > [mpt3sas] > [c07792d47a10] [d00010e774d4] > mpt3sas_device_remove_by_sas_address.part.44+0xb4/0x160 [mpt3sas] > [c07792d47a70] [d00010e77614] _scsih_expander_node_remove+0x94/0x170 > [mpt3sas] > [c07792d47af0] [d00010e77a88] > mpt3sas_expander_remove.part.46+0x398/0xe70 [mpt3sas] > [c07792d47b90] [c056a9c4] pci_device_remove+0x64/0x110 > [c07792d47bd0] [c060fa74] > device_release_driver_internal+0x1e4/0x2c0 > [c07792d47c20] [c060d260] unbind_store+0x110/0x140 > [c07792d47c70] [c060c2fc] drv_attr_store+0x3c/0x60 > [c07792d47c90] [c03a03c4] sysfs_kf_write+0x64/0xa0 > [c07792d47cb0] [c039f1b0] kernfs_fop_write+0x170/0x250 > [c07792d47d00] [c02fd370] __vfs_write+0x40/0x200 > [c07792d47d90] [c02fd748] vfs_write+0xc8/0x240 > [c07792d47de0] [c02fda80] SyS_write+0x60/0x110 > [c07792d47e30] [c000b8e0] system_call+0x58/0x6c > This is probably the same issue as discussed in threads "[PATCH] scsi: check for device state in __scsi_remove_target()" and "[PATCH] scsi: fix race condition when removing target". Please test with a patch from there. Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH 3/3] cxlflash: Update debug prints in reset handlers
On 06/28/2017 07:14 PM, Matthew R. Ochs wrote: > The device and host reset handler contain debug prints to help > identify the entities being reset. Today these reset handlers > are based on a SCSI EH design that uses a SCSI command reference > as a means of identifying the target entity. As such, the debug > trace includes the SCSI command pointer and associated CDB. This > is not necessary as the SCSI command is simply the messenger in > these scenarios. > > Refactor the debug prints in the host and reset handlers to only > present information that is applicable given the function scope. > > Signed-off-by: Matthew R. Ochs > --- > drivers/scsi/cxlflash/main.c | 18 +++--- > 1 file changed, 3 insertions(+), 15 deletions(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckezSeries & Storage h...@suse.com +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH 2/3] cxlflash: Update send_tmf() parameters
On 06/28/2017 07:14 PM, Matthew R. Ochs wrote: > The current send_tmf() implementation is based on the caller providing > a SCSI command reference. In reality all that is needed is a SCSI device > reference as the routine uses a private command. > > Refactor send_tmf() to pass the private adapter configuration reference > and a SCSI device reference. As a nice side effect, this will ease the > burden of converting caller routines to be based solely off of a SCSI > device reference. > > Signed-off-by: Matthew R. Ochs > --- > drivers/scsi/cxlflash/main.c | 27 +-- > 1 file changed, 13 insertions(+), 14 deletions(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckezSeries & Storage h...@suse.com +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH 1/3] cxlflash: Avoid double free of character device
On 06/28/2017 07:14 PM, Matthew R. Ochs wrote: > The device_unregister() service used when cleaning up the character > device is already responsible for the internal state associated with > the device upon successful creation. As the cxlflash driver does not > obtain a second reference to the character device, the explicit call > to put_device() is not required and can lead to an inconsistent sysfs > among other issues as the reference is no longer valid after the first > put_device() is performed. > > Remove the unnecessary put_device() to remedy this issue. > > Fixes: a834a36b57d9 ("scsi: cxlflash: Create character device to provide host > management interface") > Signed-off-by: Matthew R. Ochs > --- > drivers/scsi/cxlflash/main.c | 1 - > 1 file changed, 1 deletion(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckezSeries & Storage h...@suse.com +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
[PATCH] powerpc: conditionally compile platform-specific serial drivers
mpsc.c and mpc52xx-psc.c are platform-specific serial drivers, and should be compiled for the respective platforms only. Signed-off-by: Hannes Reinecke --- arch/powerpc/boot/Makefile | 7 --- arch/powerpc/boot/serial.c | 4 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile index e82f333..1a4609c 100644 --- a/arch/powerpc/boot/Makefile +++ b/arch/powerpc/boot/Makefile @@ -99,14 +99,15 @@ src-wlib-y := string.S crt0.S crtsavres.S stdio.c decompress.c main.c \ $(libfdt) libfdt-wrapper.c \ ns16550.c serial.c simple_alloc.c div64.S util.S \ elf_util.c $(zlib-y) devtree.c stdlib.c \ - oflib.c ofconsole.c cuboot.c mpsc.c cpm-serial.c \ - uartlite.c mpc52xx-psc.c opal.c + oflib.c ofconsole.c cuboot.c cpm-serial.c \ + uartlite.c opal.c +src-wlib-$(CONFIG_PPC_MPC52XX) += mpc52xx-psc.c src-wlib-$(CONFIG_PPC64_BOOT_WRAPPER) += opal-calls.S src-wlib-$(CONFIG_40x) += 4xx.c planetcore.c src-wlib-$(CONFIG_44x) += 4xx.c ebony.c bamboo.c src-wlib-$(CONFIG_8xx) += mpc8xx.c planetcore.c fsl-soc.c src-wlib-$(CONFIG_PPC_82xx) += pq2.c fsl-soc.c planetcore.c -src-wlib-$(CONFIG_EMBEDDED6xx) += mv64x60.c mv64x60_i2c.c ugecon.c fsl-soc.c +src-wlib-$(CONFIG_EMBEDDED6xx) += mpsc.c mv64x60.c mv64x60_i2c.c ugecon.c fsl-soc.c src-plat-y := of.c epapr.c src-plat-$(CONFIG_40x) += fixed-head.S ep405.c cuboot-hotfoot.c \ diff --git a/arch/powerpc/boot/serial.c b/arch/powerpc/boot/serial.c index e04c1e4..7b5c02b 100644 --- a/arch/powerpc/boot/serial.c +++ b/arch/powerpc/boot/serial.c @@ -120,15 +120,19 @@ int serial_console_init(void) if (dt_is_compatible(devp, "ns16550") || dt_is_compatible(devp, "pnpPNP,501")) rc = ns16550_console_init(devp, &serial_cd); +#ifdef CONFIG_EMBEDDED6xx else if (dt_is_compatible(devp, "marvell,mv64360-mpsc")) rc = mpsc_console_init(devp, &serial_cd); +#endif else if (dt_is_compatible(devp, "fsl,cpm1-scc-uart") || dt_is_compatible(devp, "fsl,cpm1-smc-uart") || dt_is_compatible(devp, "fsl,cpm2-scc-uart") || dt_is_compatible(devp, "fsl,cpm2-smc-uart")) rc = cpm_console_init(devp, &serial_cd); +#ifdef CONFIG_PPC_MPC52XX else if (dt_is_compatible(devp, "fsl,mpc5200-psc-uart")) rc = mpc5200_psc_console_init(devp, &serial_cd); +#endif else if (dt_is_compatible(devp, "xlnx,opb-uartlite-1.00.b") || dt_is_compatible(devp, "xlnx,xps-uartlite-1.00.a")) rc = uartlite_console_init(devp, &serial_cd); -- 1.8.5.6
Re: [PATCH v2 02/16] scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directly
On 10/28/2016 11:53 AM, Steffen Maier wrote: > > > On 10/13/2016 06:24 PM, Johannes Thumshirn wrote: >> On Thu, Oct 13, 2016 at 05:15:25PM +0200, Steffen Maier wrote: >>> I'm puzzled. >>> >>> $ git bisect start fc_bsg master > >>>> 3087864ce3d7282f59021245d8a5f83ef1caef18 is the first bad commit >>>> commit 3087864ce3d7282f59021245d8a5f83ef1caef18 >>>> Author: Johannes Thumshirn >>>> Date: Wed Oct 12 15:06:28 2016 +0200 >>>> >>>> scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directly >>>> >>>> Don't use fc_bsg_job::request and fc_bsg_job::reply directly, >>>> but use >>>> helper variables bsg_request and bsg_reply. This will be >>>> helpfull when >>>> transitioning to bsg-lib. >>>> >>>> Signed-off-by: Johannes Thumshirn >>>> >>>> :04 04 140c4b6829d5cfaec4079716e0795f63f8bc3bd2 >>>> 0d9fe225615679550be91fbd9f84c09ab1e280fc Mdrivers >>> >>> From there (on the reverse bisect path) I get the following Oops, >>> except for the full patch set having another stack trace as in my >>> previous >>> mail (dying in zfcp code). >>> >> >> [...] >> >>> >>>> @@ -3937,6 +3944,7 @@ fc_bsg_request_handler(struct request_queue >>>> *q, struct Scsi_Host *shost, >>>> struct request *req; >>>> struct fc_bsg_job *job; >>>> enum fc_dispatch_result ret; >>>> +struct fc_bsg_reply *bsg_reply; >>>> >>>> if (!get_device(dev)) >>>> return; >>>> @@ -3973,8 +3981,9 @@ fc_bsg_request_handler(struct request_queue >>>> *q, struct Scsi_Host *shost, >>>> /* check if we have the msgcode value at least */ >>>> if (job->request_len < sizeof(uint32_t)) { >>>> BUG_ON(job->reply_len < sizeof(uint32_t)); >>>> -job->reply->reply_payload_rcv_len = 0; >>>> -job->reply->result = -ENOMSG; >>>> +bsg_reply = job->reply; >>>> +bsg_reply->reply_payload_rcv_len = 0; >>>> +bsg_reply->result = -ENOMSG; > > Compiler optimization re-ordered above two lines and the first pointer > derefence is bsg_reply->result [field offset 0] where bsg_reply is NULL. > The assignment tries to write to memory at address NULL causing the > kernel page fault. > I spoke to our compiler people, and they strongly believed this not to be the case. Or, put it the other way round, if such a thing would happen it would be a compiler issue. Have you checked the compiler output? Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v2 13/16] scsi: fc: use bsg_job_done
On 10/12/2016 03:06 PM, Johannes Thumshirn wrote: > fc_bsg_jobdone() and bsg_job_done() are 1:1 copies now so use the bsg-lib one > instead of the FC private implementation. > > Signed-off-by: Johannes Thumshirn > --- > drivers/s390/scsi/zfcp_fc.c | 2 +- > drivers/scsi/bfa/bfad_bsg.c | 4 ++-- > drivers/scsi/ibmvscsi/ibmvfc.c | 2 +- > drivers/scsi/libfc/fc_lport.c| 4 ++-- > drivers/scsi/lpfc/lpfc_bsg.c | 38 +- > drivers/scsi/qla2xxx/qla_bsg.c | 44 > > drivers/scsi/scsi_transport_fc.c | 41 +++-- > include/scsi/scsi_transport_fc.h | 2 -- > 8 files changed, 50 insertions(+), 87 deletions(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v2 10/16] scsi: change FC drivers to use 'struct bsg_job'
On 10/12/2016 03:06 PM, Johannes Thumshirn wrote: > Change FC drivers to use 'struct bsg_job' from bsg-lib.h instead of 'struct > fc_bsg_job' from scsi_transport_fc.h and remove 'struct fc_bsg_job'. > > Signed-off-by: Johannes Thumshirn > --- > drivers/s390/scsi/zfcp_ext.h | 4 +-- > drivers/s390/scsi/zfcp_fc.c | 15 > drivers/scsi/bfa/bfad_bsg.c | 10 +++--- > drivers/scsi/bfa/bfad_im.h | 4 +-- > drivers/scsi/ibmvscsi/ibmvfc.c | 9 ++--- > drivers/scsi/libfc/fc_lport.c| 10 +++--- > drivers/scsi/lpfc/lpfc_bsg.c | 74 > > drivers/scsi/lpfc/lpfc_crtn.h| 4 +-- > drivers/scsi/qla2xxx/qla_bsg.c | 61 + > drivers/scsi/qla2xxx/qla_def.h | 2 +- > drivers/scsi/qla2xxx/qla_gbl.h | 4 +-- > drivers/scsi/qla2xxx/qla_iocb.c | 8 ++--- > drivers/scsi/qla2xxx/qla_isr.c | 6 ++-- > drivers/scsi/qla2xxx/qla_mr.c| 5 +-- > drivers/scsi/scsi_transport_fc.c | 20 +-- > include/scsi/libfc.h | 2 +- > include/scsi/scsi_transport_fc.h | 63 ++---- > 17 files changed, 138 insertions(+), 163 deletions(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v2 06/16] scsi: fc: provide fc_bsg_to_rport() helper
On 10/12/2016 03:06 PM, Johannes Thumshirn wrote: > Provide fc_bsg_to_rport() helper that will become handy when we're moving > from struct fc_bsg_job to a plain struct bsg_job. Also move all LLDDs to use > the new helper. > > Signed-off-by: Johannes Thumshirn > --- > drivers/s390/scsi/zfcp_fc.c | 5 +++-- > drivers/scsi/ibmvscsi/ibmvfc.c | 2 +- > drivers/scsi/libfc/fc_lport.c| 4 ++-- > drivers/scsi/lpfc/lpfc_bsg.c | 4 ++-- > drivers/scsi/qla2xxx/qla_bsg.c | 4 ++-- > drivers/scsi/scsi_transport_fc.c | 3 ++- > include/scsi/scsi_transport_fc.h | 5 + > 7 files changed, 17 insertions(+), 10 deletions(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v2 05/16] scsi: fc: provide fc_bsg_to_shost() helper
On 10/12/2016 03:06 PM, Johannes Thumshirn wrote: > Provide fc_bsg_to_shost() helper that will become handy when we're moving from > struct fc_bsg_job to a plain struct bsg_job. Also use this little helper in > the LLDDs. > > Signed-off-by: Johannes Thumshirn > --- > drivers/s390/scsi/zfcp_fc.c | 4 +-- > drivers/scsi/bfa/bfad_bsg.c | 6 ++--- > drivers/scsi/ibmvscsi/ibmvfc.c | 4 +-- > drivers/scsi/libfc/fc_lport.c| 2 +- > drivers/scsi/lpfc/lpfc_bsg.c | 32 > drivers/scsi/qla2xxx/qla_bsg.c | 54 > > drivers/scsi/scsi_transport_fc.c | 2 +- > include/scsi/scsi_transport_fc.h | 5 > 8 files changed, 56 insertions(+), 53 deletions(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v2 04/16] scsi: Unify interfaces of fc_bsg_jobdone and bsg_job_done
On 10/12/2016 03:06 PM, Johannes Thumshirn wrote: > Unify the interfaces of fc_bsg_jobdone and bsg_job_done. This will reduce the > diff when moving from 'struct fc_bsg_job' to a plain 'struct bsg_job' later > on. > > Signed-off-by: Johannes Thumshirn > --- > drivers/s390/scsi/zfcp_fc.c | 2 +- > drivers/scsi/bfa/bfad_bsg.c | 6 ++-- > drivers/scsi/ibmvscsi/ibmvfc.c | 3 +- > drivers/scsi/libfc/fc_lport.c| 6 ++-- > drivers/scsi/lpfc/lpfc_bsg.c | 68 > +++- > drivers/scsi/qla2xxx/qla_bsg.c | 66 +- > drivers/scsi/scsi_transport_fc.c | 22 +++-- > include/scsi/scsi_transport_fc.h | 3 +- > 8 files changed, 116 insertions(+), 60 deletions(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v2 03/16] scsi: fc: Export fc_bsg_jobdone and use it in FC drivers
On 10/12/2016 03:06 PM, Johannes Thumshirn wrote: > Export fc_bsg_jobdone so drivers can use it directly instead of doing > the round-trip via struct fc_bsg_job::job_done() and use it in the LLDDs. > > As we've converted all LLDDs over to use fc_bsg_jobdone() directly, > we can remove the function pointer from struct fc_bsg_job as well. > > Signed-off-by: Johannes Thumshirn > --- > drivers/s390/scsi/zfcp_fc.c | 2 +- > drivers/scsi/bfa/bfad_bsg.c | 4 ++-- > drivers/scsi/ibmvscsi/ibmvfc.c | 2 +- > drivers/scsi/libfc/fc_lport.c| 4 ++-- > drivers/scsi/lpfc/lpfc_bsg.c | 38 +- > drivers/scsi/qla2xxx/qla_bsg.c | 44 > > drivers/scsi/scsi_transport_fc.c | 5 ++--- > include/scsi/scsi_transport_fc.h | 2 +- > 8 files changed, 50 insertions(+), 51 deletions(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH v2 02/16] scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directly
On 10/12/2016 03:06 PM, Johannes Thumshirn wrote: > Don't use fc_bsg_job::request and fc_bsg_job::reply directly, but use > helper variables bsg_request and bsg_reply. This will be helpfull when > transitioning to bsg-lib. > > Signed-off-by: Johannes Thumshirn > --- > drivers/s390/scsi/zfcp_fc.c | 9 +- > drivers/scsi/bfa/bfad_bsg.c | 40 +++--- > drivers/scsi/ibmvscsi/ibmvfc.c | 22 ++-- > drivers/scsi/libfc/fc_lport.c| 23 ++-- > drivers/scsi/lpfc/lpfc_bsg.c | 194 +--- > drivers/scsi/qla2xxx/qla_bsg.c | 264 > ++- > drivers/scsi/qla2xxx/qla_iocb.c | 5 +- > drivers/scsi/qla2xxx/qla_isr.c | 46 --- > drivers/scsi/qla2xxx/qla_mr.c| 10 +- > drivers/scsi/scsi_transport_fc.c | 37 +++--- > 10 files changed, 387 insertions(+), 263 deletions(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
On 11/25/2015 06:56 PM, Jens Axboe wrote: On 11/25/2015 02:04 AM, Hannes Reinecke wrote: On 11/20/2015 04:28 PM, Ewan Milne wrote: On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote: Can't we have a joint effort here? I've been spending a _LOT_ of time trying to debug things here, but none of the ideas I've come up with have been able to fix anything. Yes. I'm not the one primarily looking at it, and we don't have a reproducer in-house. We just have the one dump right now. I'm almost tempted to increase the count from scsi_alloc_sgtable() by one and be done with ... That might not fix it if it is a problem with the merge code, though. And indeed, it doesn't. Seems I finally found the culprit. What happens is this: We have two paths, with these seg_boundary_masks: path-1:seg_boundary_mask = 65535, path-2:seg_boundary_mask = 4294967295, consequently the DM request queue has this: md-1:seg_boundary_mask = 65535, What happens now is that a request is being formatted, and sent to path 2. During submission req->nr_phys_segments is formatted with the limits of path 2, arriving at a count of 3. Now the request gets retried on path 1, but as the NOMERGE request flag is set req->nr_phys_segments is never updated. But blk_rq_map_sg() ignores all counters, and just uses the bi_vec directly, resulting in a count of 4 -> boom. So the culprit here is the NOMERGE flag, which is evaluated via ->dm_dispatch_request() ->blk_insert_cloned_request() ->blk_rq_check_limits() If the above assessment is correct, the following patch should fix it: diff --git a/block/blk-core.c b/block/blk-core.c index 801ced7..12cccd6 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio); */ int blk_rq_check_limits(struct request_queue *q, struct request *rq) { - if (!rq_mergeable(rq)) + if (rq->cmd_type != REQ_TYPE_FS) return 0; if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) { Mike? Jens? Can you comment on it? We only support merging on REQ_TYPE_FS already, so how is the above making it any different? In general, NOMERGE being set or not should not make a difference. It's only a hint that we need not check further if we should be merging on this request, since we already tried it once, found we'd exceed various limits, then set NOMERGE to reflect that. The problem is that NOMERGE does too much, as it inhibits _any_ merging. Unfortunately, the req->nr_phys_segments value is evaluated in the final _driver_ context _after_ the merging happend; cf scsi_lib.c:scsi_init_sgtable(). As nr_phys_segments is inherited from the original request (and never recalculated with the new request queue limits) the following blk_rq_map_sg() call might end up at a different calculation, especially after retrying a request on another path. Cheers, Hannes -- Dr. Hannes ReineckezSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
On 11/25/2015 07:01 PM, Mike Snitzer wrote: On Wed, Nov 25 2015 at 4:04am -0500, Hannes Reinecke wrote: On 11/20/2015 04:28 PM, Ewan Milne wrote: On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote: Can't we have a joint effort here? I've been spending a _LOT_ of time trying to debug things here, but none of the ideas I've come up with have been able to fix anything. Yes. I'm not the one primarily looking at it, and we don't have a reproducer in-house. We just have the one dump right now. I'm almost tempted to increase the count from scsi_alloc_sgtable() by one and be done with ... That might not fix it if it is a problem with the merge code, though. And indeed, it doesn't. How did you arrive at that? Do you have a reproducer now? Not a reproducer, but several dumps for analysis. Seems I finally found the culprit. What happens is this: We have two paths, with these seg_boundary_masks: path-1:seg_boundary_mask = 65535, path-2:seg_boundary_mask = 4294967295, consequently the DM request queue has this: md-1:seg_boundary_mask = 65535, What happens now is that a request is being formatted, and sent to path 2. During submission req->nr_phys_segments is formatted with the limits of path 2, arriving at a count of 3. Now the request gets retried on path 1, but as the NOMERGE request flag is set req->nr_phys_segments is never updated. But blk_rq_map_sg() ignores all counters, and just uses the bi_vec directly, resulting in a count of 4 -> boom. So the culprit here is the NOMERGE flag, NOMERGE is always set in __blk_rq_prep_clone() for cloned requests. Yes. which is evaluated via ->dm_dispatch_request() ->blk_insert_cloned_request() ->blk_rq_check_limits() blk_insert_cloned_request() is the only caller of blk_rq_check_limits(); anyway after reading your mail I'm still left wondering if your proposed patch is correct. If the above assessment is correct, the following patch should fix it: diff --git a/block/blk-core.c b/block/blk-core.c index 801ced7..12cccd6 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio); */ int blk_rq_check_limits(struct request_queue *q, struct request *rq) { - if (!rq_mergeable(rq)) + if (rq->cmd_type != REQ_TYPE_FS) return 0; if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) { Mike? Jens? Can you comment on it? You're not explaining the actual change in the patch very well; I think you're correct but you're leaving the justification as an exercise to the reviewer: blk_rq_check_limits() will call blk_recalc_rq_segments() after the !rq_mergeable() check but you're saying for this case in question we never get there -- due to the cloned request having NOMERGE set. So in blk_rq_check_limits() you've unrolled rq_mergeable() and open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS) I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no sense for cloned requests that always have NOMERGE set. So you're saying that by having blk_rq_check_limits() go on to call blk_recalc_rq_segments() this bug will be fixed? That is the idea. I've already established that in all instances I have seen so far req->nr_phys_segments is _less_ than req->bio->bi_phys_segments. As it turns out, req->nr_phys_segemnts _would_ have been updated in blk_rq_check_limits(), but isn't due to the NOMERGE flag being set for the cloned request. So each cloned request inherits the values from the original request, despite the fact that req->nr_phys_segments _has_ to be evaluated in the final request_queue context, as the queue limits _might_ be different from the original (merged) queue limits of the multipath request queue. BTW, I think blk_rq_check_limits()'s export should be removed and the function made static and renamed to blk_clone_rq_check_limits(), again: blk_insert_cloned_request() is the only caller of blk_rq_check_limits() Actually, seeing Jens' last comment the check for REQ_TYPE_FS is pointless, too, so we might as well remove the entire if-clause. Seems prudent to make that change now to be clear that this code is only used by cloned requests. Yeah, that would make sense. I'll be preparing a patch. With a more detailed description :-) Cheers, Hannes -- Dr. Hannes ReineckezSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
On 11/20/2015 04:28 PM, Ewan Milne wrote: > On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote: >> Can't we have a joint effort here? >> I've been spending a _LOT_ of time trying to debug things here, but >> none of the ideas I've come up with have been able to fix anything. > > Yes. I'm not the one primarily looking at it, and we don't have a > reproducer in-house. We just have the one dump right now. > >> >> I'm almost tempted to increase the count from scsi_alloc_sgtable() >> by one and be done with ... >> > > That might not fix it if it is a problem with the merge code, though. > And indeed, it doesn't. Seems I finally found the culprit. What happens is this: We have two paths, with these seg_boundary_masks: path-1:seg_boundary_mask = 65535, path-2:seg_boundary_mask = 4294967295, consequently the DM request queue has this: md-1:seg_boundary_mask = 65535, What happens now is that a request is being formatted, and sent to path 2. During submission req->nr_phys_segments is formatted with the limits of path 2, arriving at a count of 3. Now the request gets retried on path 1, but as the NOMERGE request flag is set req->nr_phys_segments is never updated. But blk_rq_map_sg() ignores all counters, and just uses the bi_vec directly, resulting in a count of 4 -> boom. So the culprit here is the NOMERGE flag, which is evaluated via ->dm_dispatch_request() ->blk_insert_cloned_request() ->blk_rq_check_limits() If the above assessment is correct, the following patch should fix it: diff --git a/block/blk-core.c b/block/blk-core.c index 801ced7..12cccd6 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio); */ int blk_rq_check_limits(struct request_queue *q, struct request *rq) { - if (!rq_mergeable(rq)) + if (rq->cmd_type != REQ_TYPE_FS) return 0; if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) { Mike? Jens? Can you comment on it? Cheers, Hannes -- Dr. Hannes ReineckezSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
On 11/20/2015 04:28 PM, Ewan Milne wrote: On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote: Can't we have a joint effort here? I've been spending a _LOT_ of time trying to debug things here, but none of the ideas I've come up with have been able to fix anything. Yes. I'm not the one primarily looking at it, and we don't have a reproducer in-house. We just have the one dump right now. Oh, I got plenty of them :-( I'm almost tempted to increase the count from scsi_alloc_sgtable() by one and be done with ... That might not fix it if it is a problem with the merge code, though. Of course not. But it'll be a band-aid to keep the customer happy. Cheers, Hannes -- Dr. Hannes ReineckezSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
On 11/20/2015 03:38 PM, Ewan Milne wrote: > On Thu, 2015-11-19 at 16:35 +0100, Hannes Reinecke wrote: >> On 11/19/2015 09:23 AM, Christoph Hellwig wrote: >>> It's pretty much guaranteed a block layer bug, most likely in the >>> merge bios to request infrastucture where we don't obey the merging >>> limits properly. >>> >>> Does either of you have a known good and first known bad kernel? >> >> Well, I have been fighting a similar issue for several months now, >> albeit with multipath enabled. Haven't had much progress with this, >> sadly. >> Seeing that this is our distro kernel it might or might not be >> related; however, as the symptoms are identical there still is a >> chance that this is actually a generic block-layer problem. >> >> Cheers, >> >> Hannes > > We have seen this also. (e.g. req->nr_phys_segments was 3, but > blk_rq_map_sg() returned 4.) I was suspicious of the patch: > > bio: modify __bio_add_page() to accept pages that don't start a new segment > > But we put some debugging code in and didn't hit it. We haven't > found the problem yet, either, though. We're still looking. > Can't we have a joint effort here? I've been spending a _LOT_ of time trying to debug things here, but none of the ideas I've come up with have been able to fix anything. I'm almost tempted to increase the count from scsi_alloc_sgtable() by one and be done with ... > As Christoph said, it would seem to be a problem with the block layer > merging. > > The API for this seems defective, in that blk_rq_map_sg() should > never be returning a value indicating that it overwrote past the > end of the supplied SG array and depend on the caller to check it. > (We could get data corruption on another I/O if it used adjacent > memory for a different SG list, for example.) > Yeah, the API is bloody useless. By the time you hit the BUG_ON you've already caused a memory corruption, so no way you can recover there. At the very least we should be passing in the sg list count into blk_map_rq_sg(), but that's a core blocklayer API and changes here would require changes by quite a set of drivers. Plus it wouldn't help me for a distribution kernel ... Cheers, Hannes -- Dr. Hannes ReineckezSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
On 11/19/2015 09:23 AM, Christoph Hellwig wrote: > It's pretty much guaranteed a block layer bug, most likely in the > merge bios to request infrastucture where we don't obey the merging > limits properly. > > Does either of you have a known good and first known bad kernel? Well, I have been fighting a similar issue for several months now, albeit with multipath enabled. Haven't had much progress with this, sadly. Seeing that this is our distro kernel it might or might not be related; however, as the symptoms are identical there still is a chance that this is actually a generic block-layer problem. Cheers, Hannes -- Dr. Hannes ReineckezSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Regression in 3.15 on POWER8 with multipath SCSI
On 06/30/2014 11:28 PM, Paul Mackerras wrote: On Mon, Jun 30, 2014 at 01:35:20PM +0200, Hannes Reinecke wrote: On 06/30/2014 01:02 PM, Paul Mackerras wrote: [ .. ] No, I'm not using LVM, and in fact I deleted all the physical volumes that were on any of the disks (they were installations of other distros), so there are no physical or logical volumes anywhere on any disk. I haven't tried disabling the LVM service completely, though. What would it mean if disabling the LVM service made a difference? Yes. LVM integration with systemd is a science unto itself. I'm reasonably confident with multipath, but not LVM. Plus the fact the the LVM service apparently is waiting for something sort of points into that direction. So please do disable the lvm service. I disabled the LVM service, and it's still bad. Unmodified 3.15 booted successfully in only 18 out of 50 attempts with LVM disabled. So it's not LVM. In any case LVM was fine with a 3.14 kernel. Right, that was just a cross-check to eliminate any variables. I'll be checking here at my end. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Regression in 3.15 on POWER8 with multipath SCSI
On 06/30/2014 01:02 PM, Paul Mackerras wrote: On Mon, Jun 30, 2014 at 12:52:29PM +0200, Hannes Reinecke wrote: On 06/30/2014 12:30 PM, Paul Mackerras wrote: I have a machine on which 3.15 usually fails to boot, and 3.14 boots every time. The machine is a POWER8 2-socket server with 20 cores (thus 160 CPUs), 128GB of RAM, and 7 SCSI disks connected via a hardware-RAID-capable adapter which appears as two IPR controllers which are both connected to each disk. I am booting from a disk that has Fedora 20 installed on it. After over two weeks of bisections, I can finally point to the commits that cause the problems. The culprits are: 3e9f1be1 dm mpath: remove process_queued_ios() e8099177 dm mpath: push back requests instead of queueing bcccff93 kobject: don't block for each kobject_uevent The interesting thing is that neither e8099177 nor bcccff93 cause failures on their own, but with both commits in there are failures where the system will fail to find /home on some occasions. With 3e9f1be1 included, the system appears to be prone to a deadlock condition which typically causes the boot process to hang with this message showing: A start job is running for Monitoring of LVM2 mirror...rogress polling (with a [*** ] thing before it where the asterisks move back and forth). If I revert 63d832c3 ("dm mpath: really fix lockdep warning") , 4cdd2ad7 ("dm mpath: fix lock order inconsistency in multipath_ioctl"), 3e9f1be1 and bcccff93, in that order, I get a kernel that will boot every time. The first two are later commits that fix some problems with 3e9f1be1 (though not the problems I am seeing). Can anyone see any reason why e8099177 and bcccff93 would interfere with each other? It might be running afoul with the 'cookie' mechanism. Device-mapper is using inserting a 'cookie' with the ioctl, and listens to any event containing the cookie to ensure udev has finished processing that device and hence the device node is accessible. Added to this is the problem that we don't have any good means of detecting any changes to device-mapper devices. EG look at this sequence of events: add dm-1 remove dm-1 add dm-1 Originally udev would pick up the event, read the details from sysfs, and return control to the kernel. With bcccff93 udev will _not_have a chance to read the details from sysfs for 'dm-1', as anything read from sysfs relating to 'dm-1' might infact refer to the _second_ 'add' event, which might be a totally different device. As far as I know udev doesn't have any mechanism to drop events, so it'll always process all events. Assuming that the sysfs attributes it reads _do_ relate to that event. If they don't things become interesting ... (Actually, this issue was always present, especially with multipathing. multipath occasionally can become sluggish when processing events, so the same might happen with it. We've tried to work around this, but never found a fool-proof way of doing so). Adding Kay as he might have some more insight here. Another thing: Do you run LVM on top of multipathing? If so, could you setup your system with _not_ using LVM and disabling the LVM service? No, I'm not using LVM, and in fact I deleted all the physical volumes that were on any of the disks (they were installations of other distros), so there are no physical or logical volumes anywhere on any disk. I haven't tried disabling the LVM service completely, though. What would it mean if disabling the LVM service made a difference? Yes. LVM integration with systemd is a science unto itself. I'm reasonably confident with multipath, but not LVM. Plus the fact the the LVM service apparently is waiting for something sort of points into that direction. So please do disable the lvm service. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Regression in 3.15 on POWER8 with multipath SCSI
On 06/30/2014 12:30 PM, Paul Mackerras wrote: I have a machine on which 3.15 usually fails to boot, and 3.14 boots every time. The machine is a POWER8 2-socket server with 20 cores (thus 160 CPUs), 128GB of RAM, and 7 SCSI disks connected via a hardware-RAID-capable adapter which appears as two IPR controllers which are both connected to each disk. I am booting from a disk that has Fedora 20 installed on it. After over two weeks of bisections, I can finally point to the commits that cause the problems. The culprits are: 3e9f1be1 dm mpath: remove process_queued_ios() e8099177 dm mpath: push back requests instead of queueing bcccff93 kobject: don't block for each kobject_uevent The interesting thing is that neither e8099177 nor bcccff93 cause failures on their own, but with both commits in there are failures where the system will fail to find /home on some occasions. With 3e9f1be1 included, the system appears to be prone to a deadlock condition which typically causes the boot process to hang with this message showing: A start job is running for Monitoring of LVM2 mirror...rogress polling (with a [*** ] thing before it where the asterisks move back and forth). If I revert 63d832c3 ("dm mpath: really fix lockdep warning") , 4cdd2ad7 ("dm mpath: fix lock order inconsistency in multipath_ioctl"), 3e9f1be1 and bcccff93, in that order, I get a kernel that will boot every time. The first two are later commits that fix some problems with 3e9f1be1 (though not the problems I am seeing). Can anyone see any reason why e8099177 and bcccff93 would interfere with each other? It might be running afoul with the 'cookie' mechanism. Device-mapper is using inserting a 'cookie' with the ioctl, and listens to any event containing the cookie to ensure udev has finished processing that device and hence the device node is accessible. Added to this is the problem that we don't have any good means of detecting any changes to device-mapper devices. EG look at this sequence of events: add dm-1 remove dm-1 add dm-1 Originally udev would pick up the event, read the details from sysfs, and return control to the kernel. With bcccff93 udev will _not_have a chance to read the details from sysfs for 'dm-1', as anything read from sysfs relating to 'dm-1' might infact refer to the _second_ 'add' event, which might be a totally different device. As far as I know udev doesn't have any mechanism to drop events, so it'll always process all events. Assuming that the sysfs attributes it reads _do_ relate to that event. If they don't things become interesting ... (Actually, this issue was always present, especially with multipathing. multipath occasionally can become sluggish when processing events, so the same might happen with it. We've tried to work around this, but never found a fool-proof way of doing so). Adding Kay as he might have some more insight here. Another thing: Do you run LVM on top of multipathing? If so, could you setup your system with _not_ using LVM and disabling the LVM service? Reasoning here is that multipath should not be that susceptible to changes here than LVM2 is (don't nail me on this, I not _that_ into LVM2 details). And as the system is stuck while waiting for LVM it might indeed be an side-effect when running LVM on top of multipathing. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev