Re: [dm-devel] Question about iscsi session block

2022-02-15 Thread michael.christie
On 2/15/22 7:28 PM, Zhengyuan Liu wrote:
> On Wed, Feb 16, 2022 at 12:31 AM Mike Christie
>  wrote:
>>
>> On 2/15/22 9:49 AM, Zhengyuan Liu wrote:
>>> Hi, all
>>>
>>> We have an online server which uses multipath + iscsi to attach storage
>>> from a storage server. There are two NICs on the server, each carrying
>>> about 20 iscsi sessions, and each session includes about 50 iscsi
>>> devices (yes, there are about 2*20*50=2000 iscsi block devices in total
>>> on the server). The problem is: once a NIC faults, it takes too long
>>> (nearly 80s) for multipath to switch to the other good NIC link, because
>>> it first needs to block all iscsi devices over the faulted NIC. The call
>>> stack is shown below:
>>>
>>> void iscsi_block_session(struct iscsi_cls_session *session)
>>> {
>>> queue_work(iscsi_eh_timer_workq, &session->block_work);
>>> }
>>>
>>> __iscsi_block_session() -> scsi_target_block() -> target_block() ->
>>> device_block() -> scsi_internal_device_block() -> scsi_stop_queue() ->
>>> blk_mq_quiesce_queue() -> synchronize_rcu()
>>>
>>> All sessions and all devices are processed sequentially, and we have
>>> traced that each synchronize_rcu() call takes about 80ms, so the total
>>> cost is about 80s (80ms * 20 * 50). That is so long that the
>>> application cannot tolerate it and service may be interrupted.
>>>
>>> So my question is: can we optimize the procedure to reduce the time
>>> spent blocking all iscsi devices? I'm not sure if it is a good idea to
>>> increase the workqueue's max_active of iscsi_eh_timer_workq to improve
>>> concurrency.
>>
>> We need a patch so that the unblock call waits for/cancels/flushes the
>> block call; otherwise they could be running in parallel.
>>
>> I'll send a patchset later today so you can test it.
> 
> I'm glad to test once you push the patchset.
> 
> Thank you, Mike.

I forgot I did this recently :)

commit 7ce9fc5ecde0d8bd64c29baee6c5e3ce7074ec9a
Author: Mike Christie 
Date:   Tue May 25 13:18:09 2021 -0500

scsi: iscsi: Flush block work before unblock

We set the max_active iSCSI EH works to 1, so all work is going to execute
in order by default. However, userspace can now override this in sysfs. If
max_active > 1, we can end up with the block_work on CPU1 and
iscsi_unblock_session running the unblock_work on CPU2 and the session and
target/device state will end up out of sync with each other.

This adds a flush of the block_work in iscsi_unblock_session.
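
In code, the fix amounts to roughly this (a sketch of the idea, not the
exact merged diff):

void iscsi_unblock_session(struct iscsi_cls_session *session)
{
	/*
	 * If userspace raised max_active above 1, a queued block_work
	 * could still be running; flush it so the block and unblock
	 * paths can no longer race with each other.
	 */
	flush_work(&session->block_work);
	queue_work(iscsi_eh_timer_workq, &session->unblock_work);
}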


It was merged in 5.14.
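
For context on the max_active knob from the original question: with
max_active == 1 a workqueue executes one item at a time, which is what
serialized all of the block works above. A minimal illustration of the
generic workqueue API (this is not the iSCSI transport code itself):

#include <linux/workqueue.h>

/*
 * Illustration only: max_active == 1 means work items run one at a
 * time, so each queued block_work waits behind the previous item's
 * synchronize_rcu(). WQ_SYSFS exposes attributes like max_active so
 * userspace can raise the limit.
 */
static struct workqueue_struct *example_wq;

static int example_setup(void)
{
	example_wq = alloc_workqueue("example_eh", WQ_SYSFS, 1);
	return example_wq ? 0 : -ENOMEM;
}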




Re: [dm-devel] [PATCH 09/11] block, nvme: Add error for reservation conflicts.

2022-06-05 Thread michael.christie
On 6/4/22 2:38 AM, Hannes Reinecke wrote:
> On 6/3/22 21:45, Keith Busch wrote:
>> On Fri, Jun 03, 2022 at 01:55:34AM -0500, Mike Christie wrote:
>>> @@ -171,6 +171,7 @@ static const struct {
>>>   /* zone device specific errors */
>>>   [BLK_STS_ZONE_OPEN_RESOURCE]    = { -ETOOMANYREFS, "open zones exceeded" },
>>>   [BLK_STS_ZONE_ACTIVE_RESOURCE]    = { -EOVERFLOW, "active zones exceeded" },
>>> +    [BLK_STS_RSV_CONFLICT]    = { -EBADE,    "resevation conflict" },
>>
>> You misspelled "reservation". :)
>>
>> And since you want a different error, why reuse EBADE for the errno? That
>> is already used for BLK_STS_NEXUS that you're trying to differentiate
>> from, right? At least for nvme, this error code is returned when the host
>> lacks sufficient rights, so something like EACCESS might make sense.
>>
>> Looks good otherwise.
> 
> Well ... BLK_STS_NEXUS _is_ the reservation error.

I was not sure about the xen/virtio scsi uses of BLK_STS_NEXUS/DID_NEXUS_FAILURE.
The virtio spec's description for VIRTIO_SCSI_S_NEXUS_FAILURE:

if the nexus is suffering a failure but retrying on other paths might
yield a different result.

looks like the description for DID_NEXUS_FAILURE in scsi_status.h.
To me the description sounded generic enough that it could be used for
other errors, like the endpoint/port for the I_T nexus being removed.

However, the qemu code only uses VIRTIO_SCSI_S_NEXUS_FAILURE for
reservation conflicts. If we are saying that is always the case in
other virt implementations, I don't even need this patch :) and we
can do what you requested and do more of a rename.
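
For reference, if we go the rename route, the blk_errors[] entry in
block/blk-core.c would presumably end up looking something like this (a
sketch with the spelling fixed; the final status name is whatever we
settle on):

	[BLK_STS_RSV_CONFLICT]	= { -EBADE,	"reservation conflict" },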



Re: [dm-devel] [PATCH 2/3] dm: Start pr_reserve from the same starting path

2022-07-15 Thread michael.christie
On 7/14/22 7:34 PM, Mike Christie wrote:
> I also ran the libiscsi PGR tests. We pass all of those tests except the
> Write Exclusive and Exclusive Access tests. I don't think those reservation
> types make sense for multipath devices though. And as I write this I'm
> thinking we should add a check for these types and just return failure.

Let me take that last part back.

I forgot people use dm-multipath with a single path so they can use the
queue_if_no_path/no_path_retry feature, and then do path management using
some other non-multipathd daemon that does things like IP takeover using
iscsi's redirect feature.
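
For anyone unfamiliar with that setup, a minimal multipath.conf sketch of
such a single-path configuration (the wwid is made up; the daemon doing
the redirect/IP takeover lives outside multipathd entirely):

# Queue I/O forever on path loss instead of failing, and let an external
# daemon repoint the single iscsi session (e.g. via iSCSI redirect).
multipaths {
    multipath {
        wwid          36001405aaaabbbbccccdddd00000001
        no_path_retry queue
    }
}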




Re: [dm-devel] [PATCH 2/3] dm: Start pr_reserve from the same starting path

2022-07-18 Thread michael.christie
On 7/18/22 12:39 PM, Mike Christie wrote:
> On 7/18/22 12:55 AM, Christoph Hellwig wrote:
>> On Thu, Jul 14, 2022 at 07:34:57PM -0500, Mike Christie wrote:
>>> I also ran the libiscsi PGR tests. We pass all of those tests except the
>>> Write Exclusive and Exclusive Access tests. I don't think those reservation
>>> types make sense for multipath devices though. And as I write this I'm
>>> thinking we should add a check for these types and just return failure).
>>
>> Why would any reservation type make less sense for multipath vs
>> non-multipath setups?
> 
> I think those 2 types only work for really specific use cases in multipath
> setups.
> 

...

>>  2) the target actually has a useful concept of the Linux system being
>> a single initiator, which outside of a few setups like software
>> only iSCSI is rarely true
>>
Oh yeah, if we are talking about the same type of thing, then I've seen a
target that has some smarts and treats Write Exclusive and Exclusive Access
like the All Registrants or Registrants Only variants when it detects, or
is set up for, Linux and sees all of an initiator's paths registered.

So yeah, I guess we should not fail those reservation types.
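
For reference, the reservation types in question are the block layer's
persistent reservation types, as defined in include/uapi/linux/pr.h:

enum pr_type {
	PR_WRITE_EXCLUSIVE		= 1,
	PR_EXCLUSIVE_ACCESS		= 2,
	PR_WRITE_EXCLUSIVE_REG_ONLY	= 3,
	PR_EXCLUSIVE_ACCESS_REG_ONLY	= 4,
	PR_WRITE_EXCLUSIVE_ALL_REGS	= 5,
	PR_EXCLUSIVE_ACCESS_ALL_REGS	= 6,
};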




Re: [dm-devel] [PATCH v3 10/19] nvme: Move NVMe and Block PR types to an array

2022-10-30 Thread michael.christie
On 10/27/22 12:06 PM, Mike Christie wrote:
> On 10/27/22 10:18 AM, Keith Busch wrote:
>> On Wed, Oct 26, 2022 at 06:19:36PM -0500, Mike Christie wrote:
>>> For Reservation Report support we need to also convert from the NVMe spec
>>> PR type back to the block PR definition. This moves us to an array, so in
>>> the next patch we can add another helper to do the conversion without
>>> having to manage 2 switches.
>>>
>>> Signed-off-by: Mike Christie 
>>> ---
>>>  drivers/nvme/host/pr.c | 42 +++---
>>>  include/linux/nvme.h   |  9 +
>>>  2 files changed, 32 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
>>> index df7eb2440c67..5c4611d15d9c 100644
>>> --- a/drivers/nvme/host/pr.c
>>> +++ b/drivers/nvme/host/pr.c
>>> @@ -6,24 +6,28 @@
>>>  
>>>  #include "nvme.h"
>>>  
>>> -static char nvme_pr_type(enum pr_type type)
>>> +static const struct {
>>> +   enum nvme_pr_type   nvme_type;
>>> +   enum pr_type        blk_type;
>>> +} nvme_pr_types[] = {
>>> +   { NVME_PR_WRITE_EXCLUSIVE, PR_WRITE_EXCLUSIVE },
>>> +   { NVME_PR_EXCLUSIVE_ACCESS, PR_EXCLUSIVE_ACCESS },
>>> +   { NVME_PR_WRITE_EXCLUSIVE_REG_ONLY, PR_WRITE_EXCLUSIVE_REG_ONLY },
>>> +   { NVME_PR_EXCLUSIVE_ACCESS_REG_ONLY, PR_EXCLUSIVE_ACCESS_REG_ONLY },
>>> +   { NVME_PR_WRITE_EXCLUSIVE_ALL_REGS, PR_WRITE_EXCLUSIVE_ALL_REGS },
>>> +   { NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS, PR_EXCLUSIVE_ACCESS_ALL_REGS },
>>> +};
>>
>> Wouldn't it be easier to use the block type as the array index to avoid
>> the whole looped lookup?
>>
>>   enum nvme_pr_type types[] = {
>>  [PR_WRITE_EXCLUSIVE] = NVME_PR_WRITE_EXCLUSIVE,
>>  [PR_EXCLUSIVE_ACCESS] = NVME_PR_EXCLUSIVE_ACCESS,
>> ...
>>   };
> 
> It would be. However,
> 
> 1. I wasn't sure how future proof we wanted it and I might have
> misinterpreted Chaitanya's original review comment. The part of
> the comment about handling "every new nvme_type" made me think that
> we were worried there would be new types in the future. So I thought
> we wanted it to be really generic and be able to handle cases where
> the values could be funky like -1 in the future.
> 
> 2. I also need to go from NVME_PR type to PR type, so we need a
> second array. So we can either have 2 arrays, or 1 array and 2
> loops (the next patch in this set added the second loop).
> If we don't care about #1 then I agree 2 arrays is nicer.

Oh wait, there was also a

3. The pr_types come from userspace, so if it passes us 10
and we just do:

types[pr_type]

then we would crash due to an out of bounds error.

Similarly, I thought there could be a bad target that does the
same thing.
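
A sketch of how the direct-index approach could still cover point 3
(illustrative only, not merged code, and the helper name is made up):

/*
 * Direct indexing by block PR type with a range check, so a bogus
 * pr_type from userspace (or a bad target) can't index out of bounds.
 */
static const enum nvme_pr_type nvme_pr_types[] = {
	[PR_WRITE_EXCLUSIVE]		= NVME_PR_WRITE_EXCLUSIVE,
	[PR_EXCLUSIVE_ACCESS]		= NVME_PR_EXCLUSIVE_ACCESS,
	[PR_WRITE_EXCLUSIVE_REG_ONLY]	= NVME_PR_WRITE_EXCLUSIVE_REG_ONLY,
	[PR_EXCLUSIVE_ACCESS_REG_ONLY]	= NVME_PR_EXCLUSIVE_ACCESS_REG_ONLY,
	[PR_WRITE_EXCLUSIVE_ALL_REGS]	= NVME_PR_WRITE_EXCLUSIVE_ALL_REGS,
	[PR_EXCLUSIVE_ACCESS_ALL_REGS]	= NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS,
};

static enum nvme_pr_type block_pr_type_to_nvme(enum pr_type type)
{
	if (type < PR_WRITE_EXCLUSIVE || type > PR_EXCLUSIVE_ACCESS_ALL_REGS)
		return 0;	/* reject out-of-range values */
	return nvme_pr_types[type];
}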




Re: [dm-devel] [PATCH v5 04/18] scsi: Move sd_pr_type to header to share

2023-03-26 Thread michael.christie
On 3/24/23 1:25 PM, Bart Van Assche wrote:
> On 3/24/23 11:17, Mike Christie wrote:
>> diff --git a/include/scsi/scsi_block_pr.h b/include/scsi/scsi_block_pr.h
>> new file mode 100644
>> index ..44766d7a81d8
>> --- /dev/null
>> +++ b/include/scsi/scsi_block_pr.h
>> @@ -0,0 +1,36 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _SCSI_BLOCK_PR_H
>> +#define _SCSI_BLOCK_PR_H
>> +
>> +#include 
>> +
>> +enum scsi_pr_type {
>> +    SCSI_PR_WRITE_EXCLUSIVE    = 0x01,
>> +    SCSI_PR_EXCLUSIVE_ACCESS    = 0x03,
>> +    SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY    = 0x05,
>> +    SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY    = 0x06,
>> +    SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS    = 0x07,
>> +    SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS    = 0x08,
>> +};
>> +
>> +static inline enum scsi_pr_type block_pr_type_to_scsi(enum pr_type type)
>> +{
>> +    switch (type) {
>> +    case PR_WRITE_EXCLUSIVE:
>> +    return SCSI_PR_WRITE_EXCLUSIVE;
>> +    case PR_EXCLUSIVE_ACCESS:
>> +    return SCSI_PR_EXCLUSIVE_ACCESS;
>> +    case PR_WRITE_EXCLUSIVE_REG_ONLY:
>> +    return SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY;
>> +    case PR_EXCLUSIVE_ACCESS_REG_ONLY:
>> +    return SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY;
>> +    case PR_WRITE_EXCLUSIVE_ALL_REGS:
>> +    return SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS;
>> +    case PR_EXCLUSIVE_ACCESS_ALL_REGS:
>> +    return SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +#endif
> 
> Hi Mike,
> 
> Has it been considered to move enum scsi_pr_type into 
> include/scsi/scsi_common.h and block_pr_type_to_scsi() into 
> drivers/scsi/scsi_common.c? Other definitions that are shared between SCSI 
> initiator and SCSI target code exist in these files.

Nice. I didn't know that existed. Will move them.
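
As an aside, for the Reservation Report support the reverse conversion
would be the symmetric helper; a sketch under the same definitions (the
function name here is illustrative):

static inline enum pr_type scsi_pr_type_to_block(enum scsi_pr_type type)
{
	switch (type) {
	case SCSI_PR_WRITE_EXCLUSIVE:
		return PR_WRITE_EXCLUSIVE;
	case SCSI_PR_EXCLUSIVE_ACCESS:
		return PR_EXCLUSIVE_ACCESS;
	case SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY:
		return PR_WRITE_EXCLUSIVE_REG_ONLY;
	case SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY:
		return PR_EXCLUSIVE_ACCESS_REG_ONLY;
	case SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS:
		return PR_WRITE_EXCLUSIVE_ALL_REGS;
	case SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS:
		return PR_EXCLUSIVE_ACCESS_ALL_REGS;
	}

	return 0;
}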
