Re: [dm-devel] Question about iscsi session block
On 2/15/22 7:28 PM, Zhengyuan Liu wrote:
> On Wed, Feb 16, 2022 at 12:31 AM Mike Christie wrote:
>>
>> On 2/15/22 9:49 AM, Zhengyuan Liu wrote:
>>> Hi, all
>>>
>>> We have an online server which uses multipath + iscsi to attach storage
>>> from a storage server. There are two NICs on the server; each carries
>>> about 20 iscsi sessions, and each session includes about 50 iscsi devices
>>> (yes, there are about 2*20*50 = 2000 iscsi block devices in total on the
>>> server). The problem is: once a NIC gets faulted, it takes too long
>>> (nearly 80s) for multipath to switch to the other good NIC link, because
>>> it first needs to block all iscsi devices over the faulted NIC. The
>>> call stack is shown below:
>>>
>>> void iscsi_block_session(struct iscsi_cls_session *session)
>>> {
>>>         queue_work(iscsi_eh_timer_workq, &session->block_work);
>>> }
>>>
>>> __iscsi_block_session() -> scsi_target_block() -> target_block() ->
>>> device_block() -> scsi_internal_device_block() -> scsi_stop_queue() ->
>>> blk_mq_quiesce_queue() -> synchronize_rcu()
>>>
>>> All sessions and all devices are processed sequentially, and we have
>>> traced that each synchronize_rcu() call takes about 80ms, so the total
>>> cost is about 80s (80ms * 20 * 50). That is longer than the application
>>> can tolerate and may interrupt service.
>>>
>>> So my question is: can we optimize the procedure to reduce the time spent
>>> blocking all iscsi devices? I'm not sure whether it is a good idea to
>>> increase the workqueue's max_active on iscsi_eh_timer_workq to improve
>>> concurrency.
>>
>> We need a patch, so the unblock call waits/cancels/flushes the block call,
>> or they could be running in parallel.
>>
>> I'll send a patchset later today so you can test it.
>
> I'm glad to test once you push the patchset.
>
> Thank you, Mike.
I forgot I did this recently :)

commit 7ce9fc5ecde0d8bd64c29baee6c5e3ce7074ec9a
Author: Mike Christie
Date:   Tue May 25 13:18:09 2021 -0500

    scsi: iscsi: Flush block work before unblock

    We set the max_active iSCSI EH works to 1, so all work is going to
    execute in order by default. However, userspace can now override this
    in sysfs. If max_active > 1, we can end up with the block_work on CPU1
    and iscsi_unblock_session running the unblock_work on CPU2, and the
    session and target/device state will end up out of sync with each
    other.

    This adds a flush of the block_work in iscsi_unblock_session.

It was merged in 5.14.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 09/11] block, nvme: Add error for reservation conflicts.
On 6/4/22 2:38 AM, Hannes Reinecke wrote:
> On 6/3/22 21:45, Keith Busch wrote:
>> On Fri, Jun 03, 2022 at 01:55:34AM -0500, Mike Christie wrote:
>>> @@ -171,6 +171,7 @@ static const struct {
>>> 	/* zone device specific errors */
>>> 	[BLK_STS_ZONE_OPEN_RESOURCE]	= { -ETOOMANYREFS, "open zones exceeded" },
>>> 	[BLK_STS_ZONE_ACTIVE_RESOURCE]	= { -EOVERFLOW, "active zones exceeded" },
>>> +	[BLK_STS_RSV_CONFLICT]		= { -EBADE, "resevation conflict" },
>>
>> You misspelled "reservation". :)
>>
>> And since you want a different error, why reuse EBADE for the errno? That is
>> already used for BLK_STS_NEXUS that you're trying to differentiate from,
>> right? At least for nvme, this error code is returned when the host lacks
>> sufficient rights, so something like EACCES might make sense.
>>
>> Looks good otherwise.
>
> Welll ... BLK_STS_NEXUS _is_ the reservation error.

I was not sure about the xen/virtio scsi uses of BLK_STS_NEXUS/DID_NEXUS_FAILURE.

The virtio spec's description for VIRTIO_SCSI_S_NEXUS_FAILURE:

    if the nexus is suffering a failure but retrying on other paths might
    yield a different result.

looks like the description for DID_NEXUS_FAILURE in scsi_status.h. To me the
description sounded generic enough that it could be used for other errors,
like the endpoint/port for the I_T nexus being removed.

However, the qemu code only uses VIRTIO_SCSI_S_NEXUS_FAILURE for reservation
conflicts. If we are saying that is always the case in other virt
implementations, I don't even need this patch :) and we can do what you
requested and do more of a rename.
Re: [dm-devel] [PATCH 2/3] dm: Start pr_reserve from the same starting path
On 7/14/22 7:34 PM, Mike Christie wrote:
> I also ran the libiscsi PGR tests. We pass all of those tests except the
> Write Exclusive and Exclusive Access tests. I don't think those reservation
> types make sense for multipath devices though. And as I write this I'm
> thinking we should add a check for these types and just return failure.

Let me take that last part back. I forgot people use dm-multipath with a
single path so they can use the queue_if_no_path/no_path_retry feature, and
then do path management using some other non-multipathd daemon that does
things like IP takeover using iscsi's redirect feature.
Re: [dm-devel] [PATCH 2/3] dm: Start pr_reserve from the same starting path
On 7/18/22 12:39 PM, Mike Christie wrote:
> On 7/18/22 12:55 AM, Christoph Hellwig wrote:
>> On Thu, Jul 14, 2022 at 07:34:57PM -0500, Mike Christie wrote:
>>> I also ran the libiscsi PGR tests. We pass all of those tests except the
>>> Write Exclusive and Exclusive Access tests. I don't think those reservation
>>> types make sense for multipath devices though. And as I write this I'm
>>> thinking we should add a check for these types and just return failure.
>>
>> Why would any reservation type make less sense for multipath vs
>> non-multipath setups?
>
> I think those 2 types only work for really specific use cases in multipath
> setups.
> ...
>> 2) the target actually has a useful concept of the Linux system being
>>    a single initiator, which outside of a few setups like software
>>    only iSCSI are rarely true

Oh yeah, if we are talking about the same type of thing, then I've seen a
target that has some smarts and treats Write Exclusive and Exclusive Access
like the All Regs or Reg Only variants when it detects, or is set up for,
Linux and it sees all of an initiator's paths registered.

So yeah, I guess we should not fail those reservation types.
Re: [dm-devel] [PATCH v3 10/19] nvme: Move NVMe and Block PR types to an array
On 10/27/22 12:06 PM, Mike Christie wrote:
> On 10/27/22 10:18 AM, Keith Busch wrote:
>> On Wed, Oct 26, 2022 at 06:19:36PM -0500, Mike Christie wrote:
>>> For Reservation Report support we need to also convert from the NVMe spec
>>> PR type back to the block PR definition. This moves us to an array, so in
>>> the next patch we can add another helper to do the conversion without
>>> having to manage 2 switches.
>>>
>>> Signed-off-by: Mike Christie
>>> ---
>>>  drivers/nvme/host/pr.c | 42 +++---
>>>  include/linux/nvme.h   |  9 +
>>>  2 files changed, 32 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
>>> index df7eb2440c67..5c4611d15d9c 100644
>>> --- a/drivers/nvme/host/pr.c
>>> +++ b/drivers/nvme/host/pr.c
>>> @@ -6,24 +6,28 @@
>>>
>>>  #include "nvme.h"
>>>
>>> -static char nvme_pr_type(enum pr_type type)
>>> +static const struct {
>>> +	enum nvme_pr_type	nvme_type;
>>> +	enum pr_type		blk_type;
>>> +} nvme_pr_types[] = {
>>> +	{ NVME_PR_WRITE_EXCLUSIVE, PR_WRITE_EXCLUSIVE },
>>> +	{ NVME_PR_EXCLUSIVE_ACCESS, PR_EXCLUSIVE_ACCESS },
>>> +	{ NVME_PR_WRITE_EXCLUSIVE_REG_ONLY, PR_WRITE_EXCLUSIVE_REG_ONLY },
>>> +	{ NVME_PR_EXCLUSIVE_ACCESS_REG_ONLY, PR_EXCLUSIVE_ACCESS_REG_ONLY },
>>> +	{ NVME_PR_WRITE_EXCLUSIVE_ALL_REGS, PR_WRITE_EXCLUSIVE_ALL_REGS },
>>> +	{ NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS, PR_EXCLUSIVE_ACCESS_ALL_REGS },
>>> +};
>>
>> Wouldn't it be easier to use the block type as the array index to avoid
>> the whole looped lookup?
>>
>>	enum nvme_pr_type types[] = {
>>		[PR_WRITE_EXCLUSIVE] = NVME_PR_WRITE_EXCLUSIVE,
>>		[PR_EXCLUSIVE_ACCESS] = NVME_PR_EXCLUSIVE_ACCESS,
>>		...
>>	};
>
> It would be. However,
>
> 1. I wasn't sure how future proof we wanted it, and I might have
> misinterpreted Chaitanya's original review comment. The part of
> the comment about handling "every new nvme_type" made me think that
> we were worried there would be new types in the future. So I thought
> we wanted it to be really generic and able to handle cases where
> the values could be funky, like -1, in the future.
>
> 2. I also need to go from NVME_PR type to PR type, so we need a
> second array. So we can either have 2 arrays, or 1 array and 2
> loops (the next patch in this set added the second loop).
>
> If we don't care about #1, then I see 2 arrays is nicer.

Oh wait, there was also a 3. The pr_types come from userspace, so if userspace
passes us 10 and we just do:

	types[pr_type]

then we would crash due to an out-of-bounds error. Similarly, I thought there
could be a bad target that does the same thing.
Re: [dm-devel] [PATCH v5 04/18] scsi: Move sd_pr_type to header to share
On 3/24/23 1:25 PM, Bart Van Assche wrote:
> On 3/24/23 11:17, Mike Christie wrote:
>> diff --git a/include/scsi/scsi_block_pr.h b/include/scsi/scsi_block_pr.h
>> new file mode 100644
>> index ..44766d7a81d8
>> --- /dev/null
>> +++ b/include/scsi/scsi_block_pr.h
>> @@ -0,0 +1,36 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _SCSI_BLOCK_PR_H
>> +#define _SCSI_BLOCK_PR_H
>> +
>> +#include
>> +
>> +enum scsi_pr_type {
>> +	SCSI_PR_WRITE_EXCLUSIVE			= 0x01,
>> +	SCSI_PR_EXCLUSIVE_ACCESS		= 0x03,
>> +	SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY	= 0x05,
>> +	SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY	= 0x06,
>> +	SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS	= 0x07,
>> +	SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS	= 0x08,
>> +};
>> +
>> +static inline enum scsi_pr_type block_pr_type_to_scsi(enum pr_type type)
>> +{
>> +	switch (type) {
>> +	case PR_WRITE_EXCLUSIVE:
>> +		return SCSI_PR_WRITE_EXCLUSIVE;
>> +	case PR_EXCLUSIVE_ACCESS:
>> +		return SCSI_PR_EXCLUSIVE_ACCESS;
>> +	case PR_WRITE_EXCLUSIVE_REG_ONLY:
>> +		return SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY;
>> +	case PR_EXCLUSIVE_ACCESS_REG_ONLY:
>> +		return SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY;
>> +	case PR_WRITE_EXCLUSIVE_ALL_REGS:
>> +		return SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS;
>> +	case PR_EXCLUSIVE_ACCESS_ALL_REGS:
>> +		return SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +#endif
>
> Hi Mike,
>
> Has it been considered to move enum scsi_pr_type into
> include/scsi/scsi_common.h and block_pr_type_to_scsi() into
> drivers/scsi/scsi_common.c? Other definitions that are shared between SCSI
> initiator and SCSI target code exist in these files.

Nice. I didn't know that existed. Will move them.