RE: [PATCH] qib_keys: Replace rcu_assign_pointer() with RCU_INIT_POINTER()
Subject: [PATCH] qib_keys: Replace rcu_assign_pointer() with RCU_INIT_POINTER()

I would prefer the summary be:

IB/qib: qib_remove_lkey() Replace rcu_assign_pointer() with RCU_INIT_POINTER()

Otherwise the patch looks ok and has been tested.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] qib: qib_qp: Replace rcu_assign_pointer() with RCU_INIT_POINTER()
Subject: [PATCH] qib: qib_qp: Replace rcu_assign_pointer() with RCU_INIT_POINTER()

Why not consolidate this with http://marc.info/?l=linux-rdma&m=140836578119485&w=2 so there is just one patch?

Mike
[PATCH RFC 0/8] IB/srp: Add multichannel support
Hello,

Although the SRP protocol supports multichannel operation, although RDMA HCAs
that support multiple completion vectors have been available for a considerable
time, and although multichannel operation yields better performance than using
a single channel, the Linux SRP initiator does not yet support multichannel
operation. While adding multichannel support to the SRP initiator I
encountered a few challenges that I think need wider discussion. The topics I
would like to invite wider discussion about are as follows:
- How to avoid unneeded inter-socket cache traffic. Should the blk-mq layer
  e.g. assign CPU cores to hardware queues such that all CPU cores associated
  with a single hardware queue reside on the same CPU socket? (see also
  patch 1/8)
- How to pass the hardware context selected by the block layer to the SCSI
  LLD queuecommand() callback function? (see also patches 2/8 and 3/8)
- Which approach should a SCSI LLD follow for selection of an MSI-X
  completion vector to ensure that the interrupt handler is invoked on the
  same CPU socket as the blk-mq hardware context data structures? As one can
  see, patch 8/8 relies on the assumption that completion vectors have been
  spread evenly over CPU sockets. If an HCA e.g. supports eight completion
  vectors then that means that in a system with two CPU sockets vectors 0-3
  are associated with a CPU core on the first CPU socket and vectors 4-7 with
  a CPU core on the second CPU socket.
The patches in this series are:
0001-blk-mq-Use-all-available-hardware-queues.patch
0002-scsi-mq-Add-support-for-multiple-hardware-queues.patch
0003-scsi-mq-Pass-hctx-to-low-level-SCSI-drivers.patch
0004-IB-srp-Move-ib_destroy_cm_id-call-into-srp_free_ch_i.patch
0005-IB-srp-Remove-stale-connection-retry-mechanism.patch
0006-IB-srp-Avoid-that-I-O-hangs-due-to-a-cable-pull-duri.patch
0007-IB-srp-Separate-target-and-channel-variables.patch
0008-IB-srp-Add-multichannel-support.patch

Note: a predecessor of this patch series has been used to measure the
performance of Christoph's scsi-mq patches that have been merged in kernel
version v3.17-rc1.
Re: [PATCH RFC 0/8] IB/srp: Add multichannel support
[PATCH 1/8] blk-mq: Use all available hardware queues

Suppose that a system has two CPU sockets, three cores per socket, that it
does not support hyperthreading and that four hardware queues are provided
by a block driver. With the current algorithm this will lead to the
following assignment of CPU cores to hardware queues:

  HWQ 0: 0 1
  HWQ 1: 2 3
  HWQ 2: 4 5
  HWQ 3: (none)

This patch changes the queue assignment into:

  HWQ 0: 0 1
  HWQ 1: 2
  HWQ 2: 3 4
  HWQ 3: 5

In other words, this patch has the following three effects:
- All four hardware queues are used instead of only three.
- CPU cores are spread more evenly over hardware queues. For the above
  example the range of the number of CPU cores associated with a single HWQ
  is reduced from [0..2] to [1..2].
- If the number of HWQs is a multiple of the number of CPU sockets it is now
  guaranteed that all CPU cores associated with a single HWQ reside on the
  same CPU socket.

Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Cc: Jens Axboe <ax...@fb.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Ming Lei <ming@canonical.com>
---
 block/blk-mq-cpumap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 1065d7c..8e56455 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -17,7 +17,7 @@
 static int cpu_to_queue_index(unsigned int nr_cpus, unsigned int nr_queues,
 			      const int cpu)
 {
-	return cpu / ((nr_cpus + nr_queues - 1) / nr_queues);
+	return cpu * nr_queues / nr_cpus;
 }
 
 static int get_first_sibling(unsigned int cpu)
--
1.8.4.5
[PATCH 5/8] IB/srp: Remove stale connection retry mechanism
Attempting to connect three times may be insufficient after an initiator
system that was using multiple RDMA channels tries to relogin. Additionally,
this login retry mechanism is a workaround for particular behavior of the
IB/CM. Since srp_daemon retries a failed login attempt anyway, remove the
stale connection retry mechanism.

Signed-off-by: Bart Van Assche <bvanass...@acm.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index d3c712f..9608e7a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -904,7 +904,6 @@ static void srp_rport_delete(struct srp_rport *rport)
 static int srp_connect_target(struct srp_target_port *target)
 {
-	int retries = 3;
 	int ret;
 
 	WARN_ON_ONCE(target->connected);
@@ -945,19 +944,10 @@ static int srp_connect_target(struct srp_target_port *target)
 			break;
 
 		case SRP_STALE_CONN:
-			/* Our current CM id was stale, and is now in timewait.
-			 * Try to reconnect with a new one.
-			 */
-			if (!retries-- || srp_new_cm_id(target)) {
-				shost_printk(KERN_ERR, target->scsi_host, PFX
-					     "giving up on stale connection\n");
-				target->status = -ECONNRESET;
-				return target->status;
-			}
-			shost_printk(KERN_ERR, target->scsi_host, PFX
-				     "retrying stale connection\n");
-			break;
+			shost_printk(KERN_ERR, target->scsi_host, PFX
+				     "giving up on stale connection\n");
+			target->status = -ECONNRESET;
+			return target->status;
 
 		default:
 			return target->status;
--
1.8.4.5
[PATCH 6/8] IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning
If a cable is pulled during LUN scanning it can happen that the SRP rport
and the SCSI host have been created but no LUNs have been added to the SCSI
host. Since multipathd only sends SCSI commands to a SCSI target if one or
more SCSI devices are present, and since there is no keepalive mechanism for
IB queue pairs, this means that after a LUN scan failed and after a
reconnect has succeeded no data will be sent over the QP, and hence that a
subsequent cable pull will not be detected. Avoid this by not creating an
rport or SCSI host if a cable is pulled during a SCSI LUN scan.

Note: so far the above behavior has only been observed with the kernel
module parameter ch_count set to a value >= 2.

Signed-off-by: Bart Van Assche <bvanass...@acm.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 56 +++++++++++++++++++++++++++------
 drivers/infiniband/ulp/srp/ib_srp.h |  1 +
 2 files changed, 48 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 9608e7a..fd88fb8 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -,6 +,10 @@ static int srp_rport_reconnect(struct srp_rport *rport)
 	int i, ret;
 
 	srp_disconnect_target(target);
+
+	if (target->state == SRP_TARGET_SCANNING)
+		return -ENODEV;
+
 	/*
 	 * Now get a new local CM ID so that we avoid confusing the target in
 	 * case things are really fouled up. Doing so also ensures that all CM
@@ -2607,11 +2611,23 @@ static struct scsi_host_template srp_template = {
 	.shost_attrs			= srp_host_attrs
 };
 
+static int srp_sdev_count(struct Scsi_Host *host)
+{
+	struct scsi_device *sdev;
+	int c = 0;
+
+	shost_for_each_device(sdev, host)
+		c++;
+
+	return c;
+}
+
 static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
 {
 	struct srp_rport_identifiers ids;
 	struct srp_rport *rport;
 
+	target->state = SRP_TARGET_SCANNING;
 	sprintf(target->target_name, "SRP.T10:%016llX",
 		(unsigned long long) be64_to_cpu(target->id_ext));
@@ -2634,11 +2650,26 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
 	list_add_tail(&target->list, &host->target_list);
 	spin_unlock(&host->target_lock);
 
-	target->state = SRP_TARGET_LIVE;
-
 	scsi_scan_target(&target->scsi_host->shost_gendev,
 			 0, target->scsi_id, SCAN_WILD_CARD, 0);
+	if (!target->connected || target->qp_in_error) {
+		shost_printk(KERN_INFO, target->scsi_host,
+			     PFX "SCSI scan failed - removing SCSI host\n");
+		srp_queue_remove_work(target);
+		goto out;
+	}
+
+	pr_debug(PFX "%s: SCSI scan succeeded - detected %d LUNs\n",
+		 dev_name(&target->scsi_host->shost_gendev),
+		 srp_sdev_count(target->scsi_host));
+
+	spin_lock_irq(&target->lock);
+	if (target->state == SRP_TARGET_SCANNING)
+		target->state = SRP_TARGET_LIVE;
+	spin_unlock_irq(&target->lock);
+
+out:
 	return 0;
 }
@@ -3044,13 +3075,20 @@ static ssize_t srp_create_target(struct device *dev,
 	if (ret)
 		goto err_disconnect;
 
-	shost_printk(KERN_DEBUG, target->scsi_host, PFX
-		     "new target: id_ext %016llx ioc_guid %016llx pkey %04x service_id %016llx sgid %pI6 dgid %pI6\n",
-		     be64_to_cpu(target->id_ext),
-		     be64_to_cpu(target->ioc_guid),
-		     be16_to_cpu(target->path.pkey),
-		     be64_to_cpu(target->service_id),
-		     target->path.sgid.raw, target->path.dgid.raw);
+	/* Protects against concurrent srp_remove_target() invocation. */
+	scsi_host_get(target->scsi_host);
+
+	if (target->state != SRP_TARGET_REMOVED) {
+		shost_printk(KERN_DEBUG, target->scsi_host, PFX
+			     "new target: id_ext %016llx ioc_guid %016llx pkey %04x service_id %016llx sgid %pI6 dgid %pI6\n",
+			     be64_to_cpu(target->id_ext),
+			     be64_to_cpu(target->ioc_guid),
+			     be16_to_cpu(target->path.pkey),
+			     be64_to_cpu(target->service_id),
+			     target->path.sgid.raw, target->orig_dgid);
+	}
+
+	scsi_host_put(target->scsi_host);
 
 	ret = count;
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index e46ecb1..00c7c48 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -73,6 +73,7 @@ enum {
 };
 
 enum srp_target_state {
+	SRP_TARGET_SCANNING,
 	SRP_TARGET_LIVE,
 	SRP_TARGET_REMOVED,
 };
--
1.8.4.5
[PATCH 8/8] IB/srp: Add multichannel support
Improve performance by using multiple RDMA/RC channels per SCSI host for communicating with an SRP target. Signed-off-by: Bart Van Assche bvanass...@acm.org --- Documentation/ABI/stable/sysfs-driver-ib_srp | 25 +- drivers/infiniband/ulp/srp/ib_srp.c | 337 --- drivers/infiniband/ulp/srp/ib_srp.h | 20 +- 3 files changed, 287 insertions(+), 95 deletions(-) diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp index b9688de..d5a459e 100644 --- a/Documentation/ABI/stable/sysfs-driver-ib_srp +++ b/Documentation/ABI/stable/sysfs-driver-ib_srp @@ -55,12 +55,12 @@ Description:Interface for making ib_srp connect to a new target. only safe with partial memory descriptor list support enabled (allow_ext_sg=1). * comp_vector, a number in the range 0..n-1 specifying the - MSI-X completion vector. Some HCA's allocate multiple (n) - MSI-X vectors per HCA port. If the IRQ affinity masks of - these interrupts have been configured such that each MSI-X - interrupt is handled by a different CPU then the comp_vector - parameter can be used to spread the SRP completion workload - over multiple CPU's. + MSI-X completion vector of the first RDMA channel. Some + HCA's allocate multiple (n) MSI-X vectors per HCA port. If + the IRQ affinity masks of these interrupts have been + configured such that each MSI-X interrupt is handled by a + different CPU then the comp_vector parameter can be used to + spread the SRP completion workload over multiple CPU's. * tl_retry_count, a number in the range 2..7 specifying the IB RC retry count. * queue_size, the maximum number of commands that the @@ -88,6 +88,13 @@ Description: Whether ib_srp is allowed to include a partial memory descriptor list in an SRP_CMD when communicating with an SRP target. +What: /sys/class/scsi_host/hostn/ch_count +Date: November 1, 2014 +KernelVersion: 3.18 +Contact: linux-rdma@vger.kernel.org +Description: Number of RDMA channels used for communication with the SRP + target. 
+ What: /sys/class/scsi_host/hostn/cmd_sg_entries Date: May 19, 2011 KernelVersion: 2.6.39 @@ -95,6 +102,12 @@ Contact:linux-rdma@vger.kernel.org Description: Maximum number of data buffer descriptors that may be sent to the target in a single SRP_CMD request. +What: /sys/class/scsi_host/hostn/comp_vector +Date: September 2, 2013 +KernelVersion: 3.11 +Contact: linux-rdma@vger.kernel.org +Description: Completion vector used for the first RDMA channel. + What: /sys/class/scsi_host/hostn/dgid Date: June 17, 2006 KernelVersion: 2.6.17 diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 9feeea1..58ca618 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -123,6 +123,16 @@ MODULE_PARM_DESC(dev_loss_tmo, if fast_io_fail_tmo has not been set. \off\ means that this functionality is disabled.); +static unsigned ch_count; +module_param(ch_count, uint, 0444); +MODULE_PARM_DESC(ch_count, +Number of RDMA channels to use for communication with an SRP + target. Using more than one channel improves performance + if the HCA supports multiple completion vectors. The + default value is the minimum of four times the number of + online CPU sockets and the number of completion vectors + supported by the HCA.); + static void srp_add_one(struct ib_device *device); static void srp_remove_one(struct ib_device *device); static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr); @@ -556,17 +566,32 @@ err: * Note: this function may be called without srp_alloc_iu_bufs() having been * invoked. Hence the ch-[rt]x_ring checks. */ -static void srp_free_ch_ib(struct srp_rdma_ch *ch) +static void srp_free_ch_ib(struct srp_target_port *target, + struct srp_rdma_ch *ch) { - struct srp_target_port *target = ch-target; struct srp_device *dev = target-srp_host-srp_dev; int i; + if (!ch-target) + return; + + /* +* Avoid that the SCSI error handler tries to use this channel after +* it has been freed. 
The SCSI error handler can namely continue +* trying to perform recovery actions after scsi_remove_host() +* returned. +*/ + ch-target = NULL; + if (ch-cm_id) { ib_destroy_cm_id(ch-cm_id);
[PATCH 7/8] IB/srp: Separate target and channel variables
Changes in this patch:
- Move channel variables into a new structure (struct srp_rdma_ch).
- cm_id and completion handler context pointer are now of type
  srp_rdma_ch * instead of srp_target_port *.

No functionality is changed.

Signed-off-by: Bart Van Assche <bvanass...@acm.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 707 ++++++++++++++++++----------------
 drivers/infiniband/ulp/srp/ib_srp.h |  59 +--
 2 files changed, 417 insertions(+), 349 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index fd88fb8..9feeea1 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -125,8 +125,8 @@ MODULE_PARM_DESC(dev_loss_tmo,
 
 static void srp_add_one(struct ib_device *device);
 static void srp_remove_one(struct ib_device *device);
-static void srp_recv_completion(struct ib_cq *cq, void *target_ptr);
-static void srp_send_completion(struct ib_cq *cq, void *target_ptr);
+static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr);
+static void srp_send_completion(struct ib_cq *cq, void *ch_ptr);
 static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event);
 
 static struct scsi_transport_template *ib_srp_transport_template;
@@ -262,7 +262,7 @@ static int srp_init_qp(struct srp_target_port *target,
 
 	ret = ib_find_pkey(target->srp_host->srp_dev->dev,
 			   target->srp_host->port,
-			   be16_to_cpu(target->path.pkey),
+			   be16_to_cpu(target->pkey),
 			   &attr->pkey_index);
 	if (ret)
 		goto out;
@@ -283,18 +283,23 @@ out:
 	return ret;
 }
 
-static int srp_new_cm_id(struct srp_target_port *target)
+static int srp_new_cm_id(struct srp_rdma_ch *ch)
 {
+	struct srp_target_port *target = ch->target;
 	struct ib_cm_id *new_cm_id;
 
 	new_cm_id = ib_create_cm_id(target->srp_host->srp_dev->dev,
-				    srp_cm_handler, target);
+				    srp_cm_handler, ch);
 	if (IS_ERR(new_cm_id))
 		return PTR_ERR(new_cm_id);
 
-	if (target->cm_id)
-		ib_destroy_cm_id(target->cm_id);
-	target->cm_id = new_cm_id;
+	if (ch->cm_id)
+		ib_destroy_cm_id(ch->cm_id);
+	ch->cm_id = new_cm_id;
+	ch->path.sgid = target->sgid;
+	ch->path.dgid = target->orig_dgid;
+	ch->path.pkey = target->pkey;
+	ch->path.service_id = target->service_id;
 
 	return 0;
 }
@@ -443,8 +448,9 @@ static struct srp_fr_pool *srp_alloc_fr_pool(struct srp_target_port *target)
 				  dev->max_pages_per_mr);
 }
 
-static int srp_create_target_ib(struct srp_target_port *target)
+static int srp_create_ch_ib(struct srp_rdma_ch *ch)
 {
+	struct srp_target_port *target = ch->target;
 	struct srp_device *dev = target->srp_host->srp_dev;
 	struct ib_qp_init_attr *init_attr;
 	struct ib_cq *recv_cq, *send_cq;
@@ -458,15 +464,15 @@ static int srp_create_target_ib(struct srp_target_port *target)
 	if (!init_attr)
 		return -ENOMEM;
 
-	recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, target,
-			       target->queue_size, target->comp_vector);
+	recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, ch,
+			       target->queue_size, ch->comp_vector);
 	if (IS_ERR(recv_cq)) {
 		ret = PTR_ERR(recv_cq);
 		goto err;
 	}
 
-	send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, target,
-			       m * target->queue_size, target->comp_vector);
+	send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, ch,
+			       m * target->queue_size, ch->comp_vector);
 	if (IS_ERR(send_cq)) {
 		ret = PTR_ERR(send_cq);
 		goto err_recv_cq;
@@ -502,9 +508,9 @@ static int srp_create_target_ib(struct srp_target_port *target)
 				     "FR pool allocation failed (%d)\n", ret);
 			goto err_qp;
 		}
-		if (target->fr_pool)
-			srp_destroy_fr_pool(target->fr_pool);
-		target->fr_pool = fr_pool;
+		if (ch->fr_pool)
+			srp_destroy_fr_pool(ch->fr_pool);
+		ch->fr_pool = fr_pool;
 	} else if (!dev->use_fast_reg && dev->has_fmr) {
 		fmr_pool = srp_alloc_fmr_pool(target);
 		if (IS_ERR(fmr_pool)) {
@@ -513,21 +519,21 @@ static int srp_create_target_ib(struct srp_target_port *target)
 				     "FMR pool allocation failed (%d)\n", ret);
 			goto err_qp;
 		}
-		if (target->fmr_pool)
-			ib_destroy_fmr_pool(target->fmr_pool);
-		target->fmr_pool = fmr_pool;
+		if (ch->fmr_pool)
+
RFC: FMR support in SRP
Hi Bart and all,

While going through the SRP FMR support I found that ib_srp pre-allocates
fmr_list/map_page in each request, and that as many fmr_list entries are
allocated as target->cmd_sg_cnt. I added some debug messages and ran fio
tests with different settings. The results show that state.ndesc and
state.nmdesc are 1 and state.npages is 0 after srp_map_sg. My question is:
do we really need cmd_sg_cnt fmr_list entries, or am I missing something?
Could ndesc and nmdesc be bigger than 1?

--
Mit freundlichen Grüßen / Best Regards,
Jack Wang
Linux Kernel Developer Storage
ProfitBricks GmbH, The IaaS-Company
Re: RFC: FMR support in SRP
On 09/19/14 15:11, Jinpu Wang wrote:
> While going through the SRP FMR support I found that ib_srp pre-allocates
> fmr_list/map_page in each request; as many fmr_list entries are allocated
> as target->cmd_sg_cnt. I added some debug messages and ran fio tests with
> different settings. The results show that state.ndesc and state.nmdesc
> are 1 and state.npages is 0 after srp_map_sg. My question is: do we
> really need cmd_sg_cnt fmr_list entries, or am I missing something?
> Could ndesc and nmdesc be bigger than 1?

Hello Jack,

The limitations for FMR / FR memory registration are more restrictive than
those imposed by the SCSI core on S/G-list layout. A few examples:
* Memory registered via FMR must be aligned on an FMR page boundary.
* Memory registered via a single FMR / FR registration must be a contiguous
  virtual memory region.
* The maximum size of a memory region registered via FMR or FR
  (dev->mr_max_size) can be below the maximum size of an S/G-list that can
  be passed by the SCSI core.
Hence the need for multiple memory descriptors.

In case you are wondering how I tested memory registration involving
multiple memory descriptors: I wrote a test program that causes the SCSI
core to send an S/G-list to the SRP initiator that consists of 128 elements
with four bytes of data. None of these elements are aligned on a page
boundary and no two S/G-list elements are contiguous in virtual memory.
This test program causes the SRP initiator to allocate 128 memory
descriptors. Please note that I/O requests submitted by the attached test
program will only be accepted by the SRP initiator if the cmd_sg_entries
kernel module parameter has been set to a value >= 128.

Bart.
#include cassert #include cstring // memset() #include fcntl.h // O_RDONLY #include iomanip #include iostream #include linux/fs.h // BLKSSZGET #include scsi/sg.h // sg_io_hdr_t #include sys/ioctl.h #include unistd.h// open() #include vector class file_descriptor { public: file_descriptor(int fd = -1) : m_fd(fd) { } ~file_descriptor() { if (m_fd = 0) close(m_fd); } operator int() const { return m_fd; } private: file_descriptor(const file_descriptor ); file_descriptor operator=(const file_descriptor ); int m_fd; }; class iovec_t { public: iovec_t() { } ~iovec_t() { } size_t size() const { return m_v.size(); } const sg_iovec_t operator[](const int i) const { return m_v[i]; } sg_iovec_t operator[](const int i) { return m_v[i]; } void append(void *addr, size_t len) { m_v.resize(m_v.size() + 1); decltype(m_v)::iterator p = m_v.end() - 1; p-iov_base = addr; p-iov_len = len; } const void *address() const { return *m_v.begin(); } size_t data_len() const { size_t len = 0; for (decltype(m_v)::const_iterator p = m_v.begin(); p != m_v.end(); ++p) len += p-iov_len; return len; } void trunc(size_t len) { size_t s = 0; for (decltype(m_v)::iterator p = m_v.begin(); p != m_v.end(); ++p) { s += p-iov_len; if (s = len) { p-iov_len -= s - len; assert(p-iov_len 0 || (p-iov_len == 0 len == 0)); m_v.resize(p - m_v.begin() + 1); break; } } } std::ostream write(std::ostream os) const { for (decltype(m_v)::const_iterator p = m_v.begin(); p != m_v.end(); ++p) os.write((const char *)p-iov_base, p-iov_len); return os; } private: iovec_t(const iovec_t ); iovec_t operator=(const iovec_t ); std::vectorsg_iovec_t m_v; }; static unsigned block_size; static void dumphex(std::ostream os, const void *a, size_t len) { for (int i = 0; i len; i += 16) { os std::hex std::setfill('0') std::setw(16) (uintptr_t)a + i ':'; for (int j = i; j i + 16 j len; j++) { if (j % 4 == 0) os ' '; os std::hex std::setfill('0') std::setw(2) (unsigned)((uint8_t*)a)[j]; } os; for (int j = i; j i + 16 j len; j++) { unsigned 
char c = ((uint8_t*)a)[j]; os (c = ' ' c 128 ? (char)c : '.'); } os '\n'; } } enum { MAX_READ_WRITE_6_LBA = 0x1f, MAX_READ_WRITE_6_LENGTH = 0xff, }; static ssize_t sg_read(const file_descriptor fd, uint32_t lba, const iovec_t v) { if (lba MAX_READ_WRITE_6_LBA) return -1; if (v.data_len() == 0 || (v.data_len() % block_size) != 0) return -1; if (v.data_len() / block_size MAX_READ_WRITE_6_LENGTH) return -1; int sg_version; if (ioctl(fd, SG_GET_VERSION_NUM, sg_version) 0 || sg_version 3) return -1; uint8_t read6[6] = { 0x08, (uint8_t)(lba 16), (uint8_t)(lba 8), (uint8_t)(lba), (uint8_t)(v.data_len() / block_size), 0 }; unsigned char sense_buffer[32]; sg_io_hdr_t h = { .interface_id = 'S' }; h.cmdp = read6; h.cmd_len = sizeof(read6); h.dxfer_direction = SG_DXFER_FROM_DEV; h.iovec_count = v.size(); h.dxfer_len = v.data_len(); h.dxferp = const_castvoid*(v.address()); h.sbp = sense_buffer; h.mx_sb_len = sizeof(sense_buffer); h.timeout = 1000; /* 1000
Re: [PATCH 8/8] IB/srp: Add multichannel support
On Fri, Sep 19, 2014 at 9:00 PM, Bart Van Assche bvanass...@acm.org wrote: Improve performance by using multiple RDMA/RC channels per SCSI host for communicating with an SRP target. Signed-off-by: Bart Van Assche bvanass...@acm.org --- Documentation/ABI/stable/sysfs-driver-ib_srp | 25 +- drivers/infiniband/ulp/srp/ib_srp.c | 337 --- drivers/infiniband/ulp/srp/ib_srp.h | 20 +- 3 files changed, 287 insertions(+), 95 deletions(-) diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp index b9688de..d5a459e 100644 --- a/Documentation/ABI/stable/sysfs-driver-ib_srp +++ b/Documentation/ABI/stable/sysfs-driver-ib_srp @@ -55,12 +55,12 @@ Description:Interface for making ib_srp connect to a new target. only safe with partial memory descriptor list support enabled (allow_ext_sg=1). * comp_vector, a number in the range 0..n-1 specifying the - MSI-X completion vector. Some HCA's allocate multiple (n) - MSI-X vectors per HCA port. If the IRQ affinity masks of - these interrupts have been configured such that each MSI-X - interrupt is handled by a different CPU then the comp_vector - parameter can be used to spread the SRP completion workload - over multiple CPU's. + MSI-X completion vector of the first RDMA channel. Some + HCA's allocate multiple (n) MSI-X vectors per HCA port. If + the IRQ affinity masks of these interrupts have been + configured such that each MSI-X interrupt is handled by a + different CPU then the comp_vector parameter can be used to + spread the SRP completion workload over multiple CPU's. * tl_retry_count, a number in the range 2..7 specifying the IB RC retry count. * queue_size, the maximum number of commands that the @@ -88,6 +88,13 @@ Description: Whether ib_srp is allowed to include a partial memory descriptor list in an SRP_CMD when communicating with an SRP target. 
+What: /sys/class/scsi_host/hostn/ch_count +Date: November 1, 2014 +KernelVersion: 3.18 +Contact: linux-rdma@vger.kernel.org +Description: Number of RDMA channels used for communication with the SRP + target. + What: /sys/class/scsi_host/hostn/cmd_sg_entries Date: May 19, 2011 KernelVersion: 2.6.39 @@ -95,6 +102,12 @@ Contact:linux-rdma@vger.kernel.org Description: Maximum number of data buffer descriptors that may be sent to the target in a single SRP_CMD request. +What: /sys/class/scsi_host/hostn/comp_vector +Date: September 2, 2013 +KernelVersion: 3.11 +Contact: linux-rdma@vger.kernel.org +Description: Completion vector used for the first RDMA channel. + What: /sys/class/scsi_host/hostn/dgid Date: June 17, 2006 KernelVersion: 2.6.17 diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 9feeea1..58ca618 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -123,6 +123,16 @@ MODULE_PARM_DESC(dev_loss_tmo, if fast_io_fail_tmo has not been set. \off\ means that this functionality is disabled.); +static unsigned ch_count; +module_param(ch_count, uint, 0444); +MODULE_PARM_DESC(ch_count, +Number of RDMA channels to use for communication with an SRP + target. Using more than one channel improves performance + if the HCA supports multiple completion vectors. The + default value is the minimum of four times the number of + online CPU sockets and the number of completion vectors + supported by the HCA.); + static void srp_add_one(struct ib_device *device); static void srp_remove_one(struct ib_device *device); static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr); @@ -556,17 +566,32 @@ err: * Note: this function may be called without srp_alloc_iu_bufs() having been * invoked. Hence the ch-[rt]x_ring checks. 
*/ -static void srp_free_ch_ib(struct srp_rdma_ch *ch) +static void srp_free_ch_ib(struct srp_target_port *target, + struct srp_rdma_ch *ch) { - struct srp_target_port *target = ch-target; struct srp_device *dev = target-srp_host-srp_dev; int i; + if (!ch-target) + return; + + /* +* Avoid that the SCSI error handler tries to use this channel after +* it has been freed. The SCSI error handler can namely continue +* trying to perform recovery actions
Re: RFC: FMR support in SRP
> Hello Jack,
>
> Did you know that file descriptor 0 corresponds to stdin? With
> command-line option -w the test program reads data from stdin and sends
> that data to a SCSI device. I think the test program is waiting for you
> to provide input data :-)
>
> Bart.

Thanks Bart, now I know it :)

This command works and generated 128 descriptors:

  ./discontiguous-io -l 512 -s /dev/sdb

Thanks a lot.

--
Mit freundlichen Grüßen / Best Regards,
Jack Wang
Re: [PATCH 8/8] IB/srp: Add multichannel support
On 09/19/14 17:27, Ming Lei wrote:
> On Fri, Sep 19, 2014 at 11:21 PM, Bart Van Assche <bvanass...@acm.org> wrote:
>> On 09/19/14 16:28, Ming Lei wrote:
>>> On Fri, Sep 19, 2014 at 9:00 PM, Bart Van Assche <bvanass...@acm.org> wrote:
>>>> @@ -2643,7 +2754,8 @@ static struct scsi_host_template srp_template = {
>>>>  	.proc_name			= DRV_NAME,
>>>>  	.slave_configure		= srp_slave_configure,
>>>>  	.info				= srp_target_info,
>>>> -	.queuecommand			= srp_queuecommand,
>>>> +	.queuecommand			= srp_sq_queuecommand,
>>>> +	.mq_queuecommand		= srp_mq_queuecommand,
>>>
>>> Another choice is to obtain hctx from request directly, then mq can
>>> reuse the .queuecommand interface too.
>>
>> Hello Ming,
>>
>> Is the hctx information already available in the request data structure?
>> I have found a mq_ctx member but no hctx member. Did I perhaps overlook
>> something?
>
> You are right, but the mq_ctx can be mapped to hctx like below way:
>
> 	ctx = rq->mq_ctx;
> 	hctx = q->mq_ops->map_queue(q, ctx->cpu);

Hello Ming,

Had you already noticed that the blk_mq_ctx data structure is a private
data structure (declared in block/blk-mq.h) and hence not available to
SCSI LLDs? However, what might be possible is to cache the hctx pointer in
the request structure, e.g. like in the (completely untested) patch below.

[PATCH] blk-mq: Cache hardware context pointer in struct request

Cache the hardware context pointer in struct request such that block
drivers can determine which hardware queue has been selected by reading
rq->mq_hctx->queue_num. This information is needed to select the
appropriate hardware queue in e.g. SCSI LLDs.
---
 block/blk-flush.c      | 6 +-
 block/blk-mq.c         | 16 +---
 include/linux/blkdev.h | 1 +
 3 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 3cb5e9e..698812d 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -327,13 +327,9 @@ static void flush_data_end_io(struct request *rq, int error)
 static void mq_flush_data_end_io(struct request *rq, int error)
 {
 	struct request_queue *q = rq->q;
-	struct blk_mq_hw_ctx *hctx;
-	struct blk_mq_ctx *ctx;
+	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 	unsigned long flags;
-	ctx = rq->mq_ctx;
-	hctx = q->mq_ops->map_queue(q, ctx->cpu);
-
 	/*
 	 * After populating an empty queue, kick it to avoid stall. Read
 	 * the comment in flush_end_io().
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 383ea0c..8e428fe 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -210,6 +210,7 @@ __blk_mq_alloc_request(struct blk_mq_alloc_data *data, int rw)
 	}
 	rq->tag = tag;
+	rq->mq_hctx = data->hctx;
 	blk_mq_rq_ctx_init(data->q, data->ctx, rq, rw);
 	return rq;
 }
@@ -267,12 +268,10 @@ static void __blk_mq_free_request(struct blk_mq_hw_ctx *hctx,
 void blk_mq_free_request(struct request *rq)
 {
 	struct blk_mq_ctx *ctx = rq->mq_ctx;
-	struct blk_mq_hw_ctx *hctx;
-	struct request_queue *q = rq->q;
+	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 	ctx->rq_completed[rq_is_sync(rq)]++;
-	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 	__blk_mq_free_request(hctx, ctx, rq);
 }
@@ -287,10 +286,10 @@ void blk_mq_free_request(struct request *rq)
 void blk_mq_clone_flush_request(struct request *flush_rq,
 		struct request *orig_rq)
 {
-	struct blk_mq_hw_ctx *hctx =
-		orig_rq->q->mq_ops->map_queue(orig_rq->q, orig_rq->mq_ctx->cpu);
+	struct blk_mq_hw_ctx *hctx = orig_rq->mq_hctx;
 	flush_rq->mq_ctx = orig_rq->mq_ctx;
+	flush_rq->mq_hctx = hctx;
 	flush_rq->tag = orig_rq->tag;
 	memcpy(blk_mq_rq_to_pdu(flush_rq), blk_mq_rq_to_pdu(orig_rq),
 		hctx->cmd_size);
@@ -956,6 +955,7 @@ void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
 	rq->mq_ctx = ctx = current_ctx;
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
+	rq->mq_hctx = hctx;
 	if ((rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) &&
 	    !(rq->cmd_flags & (REQ_FLUSH_SEQ))) {
@@ -1001,6 +1001,7 @@ static void blk_mq_insert_requests(struct request_queue *q,
 		rq = list_first_entry(list, struct request, queuelist);
 		list_del_init(&rq->queuelist);
 		rq->mq_ctx = ctx;
+		rq->mq_hctx = hctx;
 		__blk_mq_insert_request(hctx, rq, false);
 	}
 	spin_unlock(&ctx->lock);
@@ -1475,6 +1476,8 @@ static int blk_mq_hctx_cpu_offline(struct blk_mq_hw_ctx *hctx, int cpu)
 		return NOTIFY_OK;
 	ctx = blk_mq_get_ctx(q);
+	hctx = q->mq_ops->map_queue(q, ctx->cpu);
+
 	spin_lock(&ctx->lock);
 	while (!list_empty(&tmp)) {
@@ -1482,10 +1485,9 @@ static int blk_mq_hctx_cpu_offline(struct blk_mq_hw_ctx *hctx,
Re: [PATCH 8/8] IB/srp: Add multichannel support
On 09/19/2014 09:35 AM, Bart Van Assche wrote:
[...]
Had you already noticed that the blk_mq_ctx data structure is a private data structure (declared in block/blk-mq.h) and hence not available to SCSI LLDs? However, what might be possible is to cache the hctx pointer in the request structure, e.g. like in the (completely untested) patch below.

ctx was meant to be private; unfortunately it's leaked a bit into other parts of block/. But it's still private within that, at least. Let's not add more stuff to struct request, it's already way too large. We could add an exported

struct blk_mq_hw_ctx *blk_mq_request_to_hctx(struct request *rq)
{
	struct request_queue *q = rq->q;

	return q->mq_ops->map_queue(q, rq->mq_ctx->cpu);
}

for this.

-- Jens Axboe
-- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: FMR support in SRP
On Fri, Sep 19, 2014 at 5:35 PM, Jinpu Wang jinpu.w...@profitbricks.com wrote:

Hello Jack,

Did you know that file descriptor 0 corresponds to stdin? With command-line option -w the test program reads data from stdin and sends that data to a SCSI device. I think the test program is waiting for you to provide input data :-)

Bart.

Thanks Bart, now I know it :-) This command works and generated 128 descriptors:

./discontiguous-io -l 512 -s /dev/sdb

Thanks a lot.

Another question: in srp_map_finish_fmr the desc va is set to 0. Could you point me to how SRP protects multiple RDMA write operations to the same address? Or will a different fmr->rkey protect against this?

--
Mit freundlichen Grüßen, Best Regards,
Jack Wang
Re: [PATCH 8/8] IB/srp: Add multichannel support
On 9/19/2014 6:38 PM, Jens Axboe wrote:
[...]
Let's not add more stuff to struct request, it's already way too large. We could add an exported

struct blk_mq_hw_ctx *blk_mq_request_to_hctx(struct request *rq)
{
	struct request_queue *q = rq->q;

	return q->mq_ops->map_queue(q, rq->mq_ctx->cpu);
}

for this.

Hey,

So I agree that we shouldn't overload struct request with another pointer, but it also seems a bit unnecessary to map again just to get the hctx.

Since in the future we would like LLDs to use scsi-mq, why not modify the existing .queuecommand to pass the hctx (or, even better, hctx->driver_data)? For now LLDs won't use it. Once they choose to, it will be available to them.

Sagi.
Re: [PATCH 8/8] IB/srp: Add multichannel support
On 09/19/2014 11:30 AM, Sagi Grimberg wrote:
On 9/19/2014 6:38 PM, Jens Axboe wrote:
[...]
Let's not add more stuff to struct request, it's already way too large. We could add an exported

struct blk_mq_hw_ctx *blk_mq_request_to_hctx(struct request *rq)
{
	struct request_queue *q = rq->q;

	return q->mq_ops->map_queue(q, rq->mq_ctx->cpu);
}

for this.

Hey,

So I agree that we shouldn't overload struct request with another pointer, but it also seems a bit unnecessary to map again just to get the hctx. Since in the future we would like LLDs to use scsi-mq, why not modify the existing .queuecommand to pass the hctx (or, even better, hctx->driver_data)? For now LLDs won't use it. Once they choose to, it will be available to them.

That'd be fine as well. The mapping is cheap, though, but it would make sense to have an appropriate way to just pass it in like it happens for ->queue_rq() for native blk-mq drivers.

-- Jens Axboe
Re: [PATCH RFC 0/8] IB/srp: Add multichannel support
On 9/19/2014 3:56 PM, Bart Van Assche wrote:

[PATCH 1/8] blk-mq: Use all available hardware queues

Suppose that a system has two CPU sockets, three cores per socket, that it does not support hyperthreading and that four hardware queues are provided by a block driver. With the current algorithm this will lead to the following assignment of CPU cores to hardware queues:

HWQ 0: 0 1
HWQ 1: 2 3
HWQ 2: 4 5
HWQ 3: (none)

This patch changes the queue assignment into:

HWQ 0: 0 1
HWQ 1: 2
HWQ 2: 3 4
HWQ 3: 5

In other words, this patch has the following three effects:
- All four hardware queues are used instead of only three.
- CPU cores are spread more evenly over hardware queues. For the above example the range of the number of CPU cores associated with a single HWQ is reduced from [0..2] to [1..2].
- If the number of HWQ's is a multiple of the number of CPU sockets it is now guaranteed that all CPU cores associated with a single HWQ reside on the same CPU socket.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Jens Axboe ax...@fb.com
Cc: Christoph Hellwig h...@lst.de
Cc: Ming Lei ming@canonical.com
---
 block/blk-mq-cpumap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 1065d7c..8e56455 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -17,7 +17,7 @@ static int cpu_to_queue_index(unsigned int nr_cpus, unsigned int nr_queues,
 		const int cpu)
 {
-	return cpu / ((nr_cpus + nr_queues - 1) / nr_queues);
+	return cpu * nr_queues / nr_cpus;
 }

 static int get_first_sibling(unsigned int cpu)

Seems reasonable enough.

Reviewed-by: Sagi Grimberg sa...@mellanox.com
Re: [PATCH 2/8] scsi-mq: Add support for multiple hardware queues
On 9/19/2014 3:57 PM, Bart Van Assche wrote:

Allow a SCSI LLD to declare how many hardware queues it supports by setting Scsi_Host.nr_hw_queues before calling scsi_add_host().

Note: it is assumed that each hardware queue has a queue depth of shost->can_queue. In other words, the total queue depth per host is (number of hardware queues) * (shost->can_queue).

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Christoph Hellwig h...@lst.de
---
 drivers/scsi/scsi_lib.c  | 2 +-
 include/scsi/scsi_host.h | 4
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d837dc1..b0b6117 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2071,7 +2071,7 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	memset(&shost->tag_set, 0, sizeof(shost->tag_set));
 	shost->tag_set.ops = &scsi_mq_ops;
-	shost->tag_set.nr_hw_queues = 1;
+	shost->tag_set.nr_hw_queues = shost->nr_hw_queues ? : 1;
 	shost->tag_set.queue_depth = shost->can_queue;
 	shost->tag_set.cmd_size = cmd_size;
 	shost->tag_set.numa_node = NUMA_NO_NODE;
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index ba20347..0a867d9 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -638,6 +638,10 @@ struct Scsi_Host {
 	short unsigned int sg_prot_tablesize;
 	unsigned int max_sectors;
 	unsigned long dma_boundary;
+	/*
+	 * In scsi-mq mode, the number of hardware queues supported by the LLD.
+	 */
+	unsigned nr_hw_queues;
 	/*
 	 * Used to assign serial numbers to the cmds.
 	 * Protected by the host lock.

I think this patch should be squashed with the patch that passes the hctx to the LLD (in whatever form it ends up).
Re: [PATCH 8/8] IB/srp: Add multichannel support
On Fri, Sep 19, 2014 at 11:33:15AM -0600, Jens Axboe wrote:
That'd be fine as well. The mapping is cheap, though, but it would make sense to have an appropriate way to just pass it in like it happens for ->queue_rq() for native blk-mq drivers.

I think just passing the hw_ctx is fine. But I don't want drivers to have to implement both methods, so we should make sure the new method works for both the blk-mq and !blk-mq case. Note that one thing the new method should get is a bool last parameter similar to the one I added to the queue_rq method in the block tree. It might be worth it to simply do a global search and replace and pass a hw_ctx to all instances, too.
Re: [PATCH 2/8] scsi-mq: Add support for multiple hardware queues
On Fri, Sep 19, 2014 at 09:05:53PM +0300, Sagi Grimberg wrote:
I think this patch should be squashed with passing LLD hctx patch (in whatever form it ends up).

Agreed.
Re: [PATCH 5/8] IB/srp: Remove stale connection retry mechanism
On 9/19/2014 3:58 PM, Bart Van Assche wrote:

Attempting to connect three times may be insufficient after an initiator system that was using multiple RDMA channels tries to relogin. Additionally, this login retry mechanism is a workaround for particular behavior of the IB/CM. Since the srp_daemon retries a failed login attempt anyway, remove the stale connection retry mechanism.

I agree, this workaround doesn't even always work for some targets. In case the RDMA stack restarted while IB ports were down and the SRP initiator didn't manage to tear down the connections, some targets keep cm_ids alive and return stale-connection rejects for longer than expected. Let user space retry from scratch...

Reviewed-by: Sagi Grimberg sa...@mellanox.com

Signed-off-by: Bart Van Assche bvanass...@acm.org
---
 drivers/infiniband/ulp/srp/ib_srp.c | 16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index d3c712f..9608e7a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -904,7 +904,6 @@ static void srp_rport_delete(struct srp_rport *rport)
 static int srp_connect_target(struct srp_target_port *target)
 {
-	int retries = 3;
 	int ret;

 	WARN_ON_ONCE(target->connected);
@@ -945,19 +944,10 @@ static int srp_connect_target(struct srp_target_port *target)
 			break;

 		case SRP_STALE_CONN:
-			/* Our current CM id was stale, and is now in timewait.
-			 * Try to reconnect with a new one.
-			 */
-			if (!retries-- || srp_new_cm_id(target)) {
-				shost_printk(KERN_ERR, target->scsi_host, PFX
-					     "giving up on stale connection\n");
-				target->status = -ECONNRESET;
-				return target->status;
-			}
-			shost_printk(KERN_ERR, target->scsi_host, PFX
-				     "retrying stale connection\n");
-			break;
+			shost_printk(KERN_ERR, target->scsi_host, PFX
+				     "giving up on stale connection\n");
+			target->status = -ECONNRESET;
+			return target->status;

 		default:
 			return target->status;
Re: [PATCH RFC 0/8] IB/srp: Add multichannel support
On 09/19/2014 06:55 AM, Bart Van Assche wrote:

Hello,

Although the SRP protocol supports multichannel operation, although RDMA HCAs that support multiple completion vectors have been available for a considerable time, and although multichannel operation yields better performance than using a single channel, the Linux SRP initiator does not yet support multichannel operation. While adding multichannel support to the SRP initiator I encountered a few challenges that I think need wider discussion. The topics I would like to invite wider discussion about are as follows:

- How to avoid unneeded inter-socket cache traffic. Should the blk-mq layer e.g. assign CPU cores to hardware queues such that all CPU cores associated with a single hardware queue reside on the same CPU socket? (see also patch 1/8)

Right now these are deliberately symmetric, and hence can result in hardware queues being left unused. Whether it makes sense to make this change or not will, I think, greatly depend on how many queues are available. There's a crossover point where having more queues doesn't help, and it can even make things worse (interrupt load, etc). For your specific case of 4 queues, it probably DOES make sense to use them all, however.

-- Jens Axboe