Re: [dm-devel] RFC: one more time: SCSI device identification

2021-04-30 Thread Ewan D. Milne
On Wed, 2021-04-28 at 10:09 +1000, Erwin van Londen wrote:
> 
> On Tue, 2021-04-27 at 16:41 -0400, Ewan D. Milne wrote:
> > On Tue, 2021-04-27 at 20:33 +, Martin Wilck wrote:
> > > On Tue, 2021-04-27 at 16:14 -0400, Ewan D. Milne wrote:
> > > > There's no way to do that, in principle.  Because there could
> > > > be
> > > > other I/Os in flight.  You might (somehow) avoid retrying an
> > > > I/O
> > > > that got a UA until you figured out if something changed, but
> > > > other
> > > > I/Os can already have been sent to the target, or issued before
> > > > you
> > > > get to look at the status.
> 
> If something happens on a storage side where a lun gets it's
> attributes changed (any, doesn't matter which one) a UA should be
> sent. Also all outstanding IO's on that lun should be returning an
> Abort as it can no longer warrant the validity of any IO due to these
> changes. Especially when parameters are involved like reservations
> (PR's) etc. If that does not happen from an array side all bets are
> off as the only way to be able to get back in business is to start
> from scratch.

Perhaps an array might abort I/Os it has received in the Device Server
whensomething changes.  I have no idea if most or any arrays actually
do that.
But, what about I/O that has already been queued from the host to
thehost bus adapter?  I don't see how we can abort those I/Os
properly.Most high-performance HBAs have a queue of commands and a
queueof responses, there could be lots of commands queued before
wemanage to notice an interesting status.  And AFAIK there is no
conditionalmechanism that could hold them off (and, they could be in-
flight on thewire anyway).
I get what you are saying about what SAM describes, I just don't see
howwe can guarantee we don't send any further commands after the
statuswith the UA is sent back, before we can understand what happened.
-Ewan
> > > 
> > > Right. But in practice, a WWID change will hardly happen under
> > > full
> > > IO
> > > load. The storage side will probably have to block IO while this
> > > happens, at least for a short time period. So blocking and
> > > quiescing
> > > the queue upon an UA might still work, most of the time. Even if
> > > we
> > > were too late already, the sooner we stop the queue, the better.
> 
> I think in most cases when something happens on an array side you
> will see IO's being aborted. That might be a good time to start doing
> TUR's and if these come back OK do a new inquiry. From a host side
> there is only so much you can do.
> 
> > > The current algorithm in multipath-tools needs to detect a path
> > > going
> > > down and being reinstated. The time interval during which a WWID
> > > change
> > > will go unnoticed is one or more path checker intervals,
> > > typically on
> > > the order of 5-30 seconds. If we could decrease this interval to
> > > a
> > > sub-
> > > second or even millisecond range by blocking the queue in the
> > > kernel
> > > quickly, we'd have made a big step forward.
> > 
> > Yes, and in many situations this may help.  But in the general case
> > we can't protect against a storage array misconfiguration,
> > where something like this can happen.  So I worry about people
> > believing the host software will protect them against a mistake,
> > when we can't really do that.
> 
> My thought exactly. 
> 
> > All it takes is one I/O (a discard) to make a thorough mess of the
> > LUN.
> > 
> > -Ewan
> > 
> > > Regards
> > > Martin
> > > 
> > 
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://listman.redhat.com/mailman/listinfo/dm-devel
> > 
> 
> --dm-devel mailing listdm-de...@redhat.com
> https://listman.redhat.com/mailman/listinfo/dm-devel
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [RFC PATCH v2 1/2] scsi: convert scsi_result_to_blk_status() to inline

2021-04-30 Thread Bart Van Assche
On 4/29/21 1:33 PM, Martin Wilck wrote:
> On Thu, 2021-04-29 at 09:20 -0700, Bart Van Assche wrote:
>> On 4/29/21 8:50 AM, mwi...@suse.com wrote:
>>> This makes it possible to use scsi_result_to_blk_status() from
>>> code that shouldn't depend on scsi_mod (e.g. device mapper).
>>
>> I think that's the wrong reason to make a function inline. Please
>> consider moving scsi_result_to_blk_status() into one of the block
>> layer
>> source code files that already deals with SG I/O, e.g.
>> block/scsi_ioctl.c.
> 
> scsi_ioctl.c, are you certain? scsi_result_to_blk_status() is an
> important part of the block/scsi interface... You're right that that
> this function is not a prime candidate for inlining, but I'm still
> wondering where it belongs if we don't.

The block/scsi_ioctl.c file is included in the kernel if and only if
BLK_SCSI_REQUEST is enabled. And that Kconfig symbol only selects the
block/scsi_ioctl.c file. Additionally, the following occurs in the SCSI
Kconfig file:

config SCSI
tristate "SCSI device support"
[ ... ]
select BLK_SCSI_REQUEST

In other words, block/scsi_ioctl.c is built unconditionally if SCSI is
enabled. Adding the scsi_result_to_blk_status() function to the
block/scsi_ioctl.c file would increase the size of kernels that enable
bsg, ide, the SCSI target code or nfsd but not the SCSI initiator code.
If the latter is a concern, how about moving scsi_result_to_blk_status()
into a new file in the block directory?

Thanks,

Bart.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath

2021-04-30 Thread Mike Snitzer
On Thu, Apr 29 2021 at  2:02am -0400,
Hannes Reinecke  wrote:

> On 4/28/21 9:54 PM, Mike Snitzer wrote:
> >On Thu, Apr 22 2021 at  4:21P -0400,
> >mwi...@suse.com  wrote:
> >
> >>From: Martin Wilck 
> >>
> >>In virtual deployments, SCSI passthrough over dm-multipath devices is a
> >>common setup. The qemu "pr-helper" was specifically invented for it. I
> >>believe that this is the most important real-world scenario for sending
> >>SG_IO ioctls to device-mapper devices.
> >>
> >>In this configuration, guests send SCSI IO to the hypervisor in the form of
> >>SG_IO ioctls issued by qemu. But on the device-mapper level, these SCSI
> >>ioctls aren't treated like regular IO. Until commit 2361ae595352 ("dm mpath:
> >>switch paths in dm_blk_ioctl() code path"), no path switching was done at
> >>all. Worse though, if an SG_IO call fails because of a path error,
> >>dm-multipath doesn't retry the IO on a another path; rather, the failure is
> >>passed back to the guest, and paths are not marked as faulty.  This is in
> >>stark contrast with regular block IO of guests on dm-multipath devices, and
> >>certainly comes as a surprise to users who switch to SCSI passthrough in
> >>qemu. In general, users of dm-multipath devices would probably expect 
> >>failover
> >>to work at least in a basic way.
> >>
> >>This patch fixes this by taking a special code path for SG_IO on request-
> >>based device mapper targets. Rather then just choosing a single path,
> >>sending the IO to it, and failing to the caller if the IO on the path
> >>failed, it retries the same IO on another path for certain error codes,
> >>using the same logic as blk_path_error() to determine if a retry would make
> >>sense for the given error code. Moreover, it sends a message to the
> >>multipath target to mark the path as failed.
> >>
> >>I am aware that this looks sort of hack-ish. I'm submitting it here as an
> >>RFC, asking for guidance how to reach a clean implementation that would be
> >>acceptable in mainline. I believe that it fixes an important problem.
> >>Actually, it fixes the qemu SCSI-passthrough use case on dm-multipath,
> >>which is non-functional without it.
> >>
> >>One problem remains open: if all paths in a multipath map are failed,
> >>normal multipath IO may switch to queueing mode (depending on
> >>configuration). This isn't possible for SG_IO, as SG_IO requests can't
> >>easily be queued like regular block I/O. Thus in the "no path" case, the
> >>guest will still see an error. If anybody can conceive of a solution for
> >>that, I'd be interested.
> >>
> >>Signed-off-by: Martin Wilck 
> >>---
> >>  block/scsi_ioctl.c |   5 +-
> >>  drivers/md/dm.c| 134 -
> >>  include/linux/blkdev.h |   2 +
> >>  3 files changed, 137 insertions(+), 4 deletions(-)
> >>
> >>diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> >>index 6599bac0a78c..bcc60552f7b1 100644
> >>--- a/block/scsi_ioctl.c
> >>+++ b/block/scsi_ioctl.c
> >>@@ -279,8 +279,8 @@ static int blk_complete_sghdr_rq(struct request *rq, 
> >>struct sg_io_hdr *hdr,
> >>return ret;
> >>  }
> >>-static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> >>-   struct sg_io_hdr *hdr, fmode_t mode)
> >>+int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> >>+ struct sg_io_hdr *hdr, fmode_t mode)
> >>  {
> >>unsigned long start_time;
> >>ssize_t ret = 0;
> >>@@ -369,6 +369,7 @@ static int sg_io(struct request_queue *q, struct 
> >>gendisk *bd_disk,
> >>blk_put_request(rq);
> >>return ret;
> >>  }
> >>+EXPORT_SYMBOL_GPL(sg_io);
> >>  /**
> >>   * sg_scsi_ioctl  --  handle deprecated SCSI_IOCTL_SEND_COMMAND ioctl
> >>diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> >>index 50b693d776d6..443ac5e5406c 100644
> >>--- a/drivers/md/dm.c
> >>+++ b/drivers/md/dm.c
> >>@@ -29,6 +29,8 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >>+#include 
> >>+#include 
> >>  #define DM_MSG_PREFIX "core"
> >
> >Ngh... not loving this at all.  But here is a patch (that I didn't
> >compile test) that should be folded in, still have to think about how
> >this could be made cleaner in general though.
> >
> >Also, dm_sg_io_ioctl should probably be in dm-rq.c (maybe even dm-mpath.c!?)
> >
> I fully share your sentiments; this just feels so awkward, having to
> redo the logic in scsi_cmd_ioctl().
> 
> My original idea to that was to _use_ scsi_cmd_ioctl() directly from
> dm_blk_ioctl:
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 50b693d776d6..007ff981f888 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -567,6 +567,9 @@ static int dm_blk_ioctl(struct block_device
> *bdev, fmode_t mode,
> struct mapped_device *md = bdev->bd_disk->private_data;
> int r, srcu_idx;
> 
> +   if (cmd == SG_IO)
> +   return scsi_cmd_blk_ioctl(bdev, mode, cmd, arg);
> +
> r = dm_prepare_ioctl(md, _idx, );
> if (r < 0)
> goto out;
> 
> 
> But that 

[dm-devel] [git pull] device mapper changes for 5.13

2021-04-30 Thread Mike Snitzer
Hi Linus,

Some changes were slightly late and their inclusion made later because
I had a worse-than-expected reaction to my 2nd covid shot the past
couple days. Nothing crazy here but wanted to explain why some commits
are recent.

The following changes since commit 4edbe1d7bcffcd6269f3b5eb63f710393ff2ec7a:

  dm ioctl: fix out of bounds array access when no devices (2021-03-26 14:51:50 
-0400)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git 
tags/for-5.13/dm-changes

for you to fetch changes up to ca4a4e9a55beeb138bb06e3867f5e486da896d44:

  dm raid: remove unnecessary discard limits for raid0 and raid10 (2021-04-30 
14:38:37 -0400)

Please pull, thanks.
Mike


- Improve scalability of DM's device hash by switching to rbtree

- Extend DM ioctl's DM_LIST_DEVICES_CMD handling to include UUID and
  allow filtering based on name or UUID prefix.

- Various small fixes for typos, warnings, unused function, or
  needlessly exported interfaces.

- Remove needless request_queue NULL pointer checks in DM thin and
  cache targets.

- Remove unnecessary loop in DM core's __split_and_process_bio().

- Remove DM core's dm_vcalloc() and just use kvcalloc or
  kvmalloc_array instead (depending whether zeroing is useful).

- Fix request-based DM's double free of blk_mq_tag_set in device
  remove after table load fails.

- Improve DM persistent data performance on non-x86 by fixing packed
  structs to have a stated alignment. Also remove needless extra work
  from redundant calls to sm_disk_get_nr_free() and a paranoid BUG_ON()
  that caused duplicate checksum calculation.

- Fix missing goto in DM integrity's bitmap_flush_interval error
  handling.

- Add "reset_recalculate" feature flag to DM integrity.

- Improve DM integrity by leveraging discard support to avoid needless
  re-writing of metadata and also use discard support to improve
  hash recalculation.

- Fix race with DM raid target's reshape and MD raid4/5/6 resync that
  resulted in inconsistant reshape state during table reloads.

- Update DM raid target to temove unnecessary discard limits for raid0
  and raid10 now that MD has optimized discard handling for both raid
  levels.


Benjamin Block (1):
  dm rq: fix double free of blk_mq_tag_set in dev remove after table load 
fails

Bhaskar Chowdhury (1):
  dm ebs: fix a few typos

Christoph Hellwig (1):
  dm: unexport dm_{get,put}_table_device

Gustavo A. R. Silva (1):
  dm raid: fix fall-through warning in rs_check_takeover() for Clang

Heinz Mauelshagen (1):
  dm raid: fix inconclusive reshape layout on fast raid4/5/6 table reload 
sequences

JeongHyeon Lee (1):
  dm verity: allow only one error handling mode

Jiapeng Chong (2):
  dm persistent data: remove unused return from exit_shadow_spine()
  dm clone metadata: remove unused function

Joe Thornber (4):
  dm space map disk: remove redundant calls to sm_disk_get_nr_free()
  dm btree spine: remove paranoid node_check call in node_prep_for_write()
  dm persistent data: packed struct should have an aligned() attribute too
  dm space map common: fix division bug in sm_ll_find_free_block()

Julia Lawall (1):
  dm writecache: fix flexible_array.cocci warnings

Matthew Wilcox (Oracle) (1):
  dm: replace dm_vcalloc()

Mike Snitzer (1):
  dm raid: remove unnecessary discard limits for raid0 and raid10

Mikulas Patocka (8):
  dm: remove useless loop in __split_and_process_bio
  dm ioctl: replace device hash with red-black tree
  dm ioctl: return UUID in DM_LIST_DEVICES_CMD result
  dm ioctl: filter the returned values according to name or uuid prefix
  dm integrity: add the "reset_recalculate" feature flag
  dm integrity: don't re-write metadata if discarding same blocks
  dm integrity: increase RECALC_SECTORS to improve recalculate speed
  dm integrity: use discard support when recalculating

Tian Tao (1):
  dm integrity: fix missing goto in bitmap_flush_interval error handling

Xu Wang (2):
  dm thin: remove needless request_queue NULL pointer check
  dm cache: remove needless request_queue NULL pointer checks

 drivers/md/dm-cache-target.c |   2 +-
 drivers/md/dm-clone-metadata.c   |   6 -
 drivers/md/dm-ebs-target.c   |   6 +-
 drivers/md/dm-integrity.c|  85 ---
 drivers/md/dm-ioctl.c| 294 ++-
 drivers/md/dm-raid.c |  44 ++--
 drivers/md/dm-rq.c   |   2 +
 drivers/md/dm-snap-persistent.c  |   6 +-
 drivers/md/dm-snap.c |   5 +-
 drivers/md/dm-table.c|  30 +--
 drivers/md/dm-thin.c | 

[dm-devel] Raid1 and Raid10 Discard Limit Fixups for 5.13 Merge Window

2021-04-30 Thread Matthew Ruffell
Hi Mike,

The Raid10 block discard performance patchset from Xiao Ni has reached mainline
this morning in the following commits:

254c271da071 md/raid10: improve discard request for far layout
d30588b2731f md/raid10: improve raid10 discard request
f2e7e269a752 md/raid10: pull the code that wait for blocked dev into one 
function
c2968285925a md/raid10: extend r10bio devs to raid disks
cf78408f937a md: add md_submit_discard_bio() for submitting discard bio

I was wondering, are you planning to resubmit your patches to fixup discard
limits for raid1 and raid10 this merge cycle?

commit e0910c8e4f87bb9f767e61a778b0d9271c4dc512
Author: Mike Snitzer 
Date: Thu Sep 24 13:14:52 2020 -0400
Subject: dm raid: fix discard limits for raid1 and raid10
Link: 
https://github.com/torvalds/linux/commit/e0910c8e4f87bb9f767e61a778b0d9271c4dc512

commit f0e90b6c663a7e3b4736cb318c6c7c589f152c28
Author: Mike Snitzer 
Date: Thu Sep 24 16:40:12 2020 -0400
Subject: dm raid: remove unnecessary discard limits for raid10
Link: 
https://github.com/torvalds/linux/commit/f0e90b6c663a7e3b4736cb318c6c7c589f152c28

Thanks,
Matthew

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [LKP] Re: [ext4] 21175ca434: mdadm-selftests.enchmarks/mdadm-selftests/tests/01r1fail.fail

2021-04-30 Thread Rong Chen



On 4/28/21 10:03 PM, Theodore Ts'o wrote:

(Hmm, why did you cc linux-km on this report?  I would have thought
dm-devel would have made more sense?)

On Tue, Apr 27, 2021 at 04:15:39PM +0800, kernel test robot wrote:

FYI, we noticed the following commit (built with gcc-9):

commit: 21175ca434c5d49509b73cf473618b01b0b85437 ("ext4: make prefetch_block_bitmaps 
default")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: mdadm-selftests
version: mdadm-selftests-x86_64-5d518de-1_20201008
with following parameters:

disk: 1HDD
test_prefix: 01r1
ucode: 0x21

So this failure makes no sense to me.  Looking at the kmesg failure
logs, it's failing in the md layer:

kern  :info  : [   99.775514] md/raid1:md0: not clean -- starting background 
reconstruction
kern  :info  : [   99.783372] md/raid1:md0: active with 3 out of 4 mirrors
kern  :info  : [   99.789735] md0: detected capacity change from 0 to 37888
kern  :info  : [   99.796216] md: resync of RAID array md0
kern  :crit  : [   99.900450] md/raid1:md0: Disk failure on loop2, disabling 
device.
   md/raid1:md0: Operation continuing on 2 devices.
kern  :crit  : [   99.918281] md/raid1:md0: Disk failure on loop1, disabling 
device.
   md/raid1:md0: Operation continuing on 1 devices.
kern  :info  : [  100.835833] md: md0: resync interrupted.
kern  :info  : [  101.852898] md: resync of RAID array md0
kern  :info  : [  101.858347] md: md0: resync done.
user  :notice: [  102.109684] /lkp/benchmarks/mdadm-selftests/tests/01r1fail... 
FAILED - see /var/tmp/01r1fail.log and /var/tmp/fail01r1fail.log for details

The referenced commit just turns block bitmap prefetching in ext4.
This should not cause md to failure; if so, that's an md bug, not an
ext4 bug.  There should not be anything that the file system is doing
that would cause the kernel to think there is a disk failure.

By the way, the reproduction instructions aren't working currently:


To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp installjob.yaml  # job file is attached in 
this email

This fails because lkp is trying to apply a patch which does not apply
with the current version of the md tools.


Hi Ted,

Thanks for the feedback, yes, there's patch already be merged into mdadm,
we have removed it from our code.




 bin/lkp split-job --compatible job.yaml
 bin/lkp runcompatible-job.yaml

And the current versions lkp don't generate a compatible-job.yaml file
when you run "lkp split-job --compatable"; instead it generates a new
yaml file with a set of random characters to generate a unique name.
(What Multics parlance would be called a "shriek name"[1] :-)


We have updated the steps to avoid misunderstanding.



Since I was having trouble running the reproduction; could you send
the /var/tmp/*fail.logs so we could have a bit more insight what is
going on?


I attached the log file for your reference,
btw the test is from 
https://github.com/neilbrown/mdadm/blob/master/tests/01r1fail,

you may want to run it directly.

Best Regards,
Rong Chen



Thanks!

- Ted
___
LKP mailing list -- l...@lists.01.org
To unsubscribe send an email to lkp-le...@lists.01.org


+ . /lkp/benchmarks/mdadm-selftests/tests/01r1fail
++ mdadm -CR /dev/md0 -l1 -n4 /dev/loop0 /dev/loop1 /dev/loop2 missing
++ rm -f /var/tmp/stderr
++ case $* in
++ case $* in
++ for args in $*
++ [[ -CR =~ /dev/ ]]
++ for args in $*
++ [[ /dev/md0 =~ /dev/ ]]
++ [[ /dev/md0 =~ md ]]
++ for args in $*
++ [[ -l1 =~ /dev/ ]]
++ for args in $*
++ [[ -n4 =~ /dev/ ]]
++ for args in $*
++ [[ /dev/loop0 =~ /dev/ ]]
++ [[ /dev/loop0 =~ md ]]
++ /lkp/benchmarks/mdadm-selftests/mdadm --zero /dev/loop0
mdadm: Unrecognised md component device - /dev/loop0
++ for args in $*
++ [[ /dev/loop1 =~ /dev/ ]]
++ [[ /dev/loop1 =~ md ]]
++ /lkp/benchmarks/mdadm-selftests/mdadm --zero /dev/loop1
mdadm: Unrecognised md component device - /dev/loop1
++ for args in $*
++ [[ /dev/loop2 =~ /dev/ ]]
++ [[ /dev/loop2 =~ md ]]
++ /lkp/benchmarks/mdadm-selftests/mdadm --zero /dev/loop2
mdadm: Unrecognised md component device - /dev/loop2
++ for args in $*
++ [[ missing =~ /dev/ ]]
++ /lkp/benchmarks/mdadm-selftests/mdadm --quiet -CR /dev/md0 -l1 -n4 
/dev/loop0 /dev/loop1 /dev/loop2 missing --auto=yes
++ rv=0
++ case $* in
++ cat /var/tmp/stderr
++ return 0
++ check resync
++ case $1 in
++ cnt=5
++ grep -sq resync /proc/mdstat
++ mdadm /dev/md0 --fail /dev/loop2
++ rm -f /var/tmp/stderr
++ case $* in
++ case $* in
++ /lkp/benchmarks/mdadm-selftests/mdadm --quiet /dev/md0 --fail /dev/loop2
++ rv=0
++ case $* in
++ cat /var/tmp/stderr
++ return 0
++ check resync
++ case $1 in
++ cnt=5
++ grep -sq resync /proc/mdstat
++ mdadm /dev/md0 --fail /dev/loop1
++ rm -f /var/tmp/stderr
++