RE: [PATCH] qib_keys: Replace rcu_assign_pointer() with RCU_INIT_POINTER()

2014-09-19 Thread Marciniszyn, Mike
 Subject: [PATCH] qib_keys: Replace rcu_assign_pointer() with
 RCU_INIT_POINTER()
 

I would prefer the summary be:
IB/qib: qib_remove_lkey()  Replace rcu_assign_pointer() with
 RCU_INIT_POINTER()

Otherwise the patch looks ok and has been tested.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] qib: qib_qp: Replace rcu_assign_pointer() with RCU_INIT_POINTER()

2014-09-19 Thread Marciniszyn, Mike
 Subject: [PATCH] qib: qib_qp: Replace rcu_assign_pointer() with
 RCU_INIT_POINTER()


Why not consolidate this with 
http://marc.info/?l=linux-rdma&m=140836578119485&w=2 so there is just one patch?

Mike
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC 0/8] IB/srp: Add multichannel support

2014-09-19 Thread Bart Van Assche

Hello,

Although the SRP protocol supports multichannel operation, although RDMA
HCA's that support multiple completion vectors have been available for a
considerable time, and although multichannel operation yields better
performance than using a single channel, the Linux SRP initiator does not
yet support multichannel operation. While adding multichannel support to
the SRP initiator I encountered a few challenges that I think need wider
discussion. The topics I would like to invite wider discussion about are
as follows:

- How to avoid unneeded inter-socket cache traffic. Should the blk-mq
  layer e.g. assign CPU cores to hardware queues such that all CPU cores
  associated with a single hardware queue reside on the same CPU socket?
  (see also patch 1/8)
- How to pass the hardware context selected by the block layer to the
  SCSI LLD queuecommand() callback function ? (see also patches 2/8 and
  3/8).
- Which approach should a SCSI LLD follow for selection of an MSI-X
  completion vector to ensure that the interrupt handler is invoked on
  the same CPU socket as the blk-mq hardware context data structures ?
  As one can see, patch 8/8 relies on the assumption that completion
  vectors have been spread evenly over CPU sockets. If a HCA e.g.
  supports eight completion vectors, this means that in a system with
  two CPU sockets vectors 0-3 are associated with a CPU core on the
  first CPU socket and vectors 4-7 with a CPU core on the second CPU
  socket (a small illustrative sketch follows this list).
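To make the assumption in the last bullet concrete, here is a minimal sketch
(not taken from the patches; the function name and parameters are purely
illustrative) of how a channel index could be mapped to a completion vector
when the vectors are spread evenly over the sockets:

/*
 * Illustrative only: assumes completion vectors are distributed evenly over
 * CPU sockets, e.g. with 8 vectors and 2 sockets, vectors 0-3 belong to
 * socket 0 and vectors 4-7 to socket 1. None of these names come from the
 * SRP patches.
 */
static unsigned int pick_comp_vector(unsigned int ch_idx,
                                     unsigned int nr_sockets,
                                     unsigned int nr_vectors)
{
        unsigned int per_socket = nr_vectors / nr_sockets;
        unsigned int socket = ch_idx % nr_sockets;  /* socket targeted by this channel */
        unsigned int slot = (ch_idx / nr_sockets) % per_socket;

        return socket * per_socket + slot;          /* vector local to that socket */
}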

The patches in this series are:
0001-blk-mq-Use-all-available-hardware-queues.patch
0002-scsi-mq-Add-support-for-multiple-hardware-queues.patch
0003-scsi-mq-Pass-hctx-to-low-level-SCSI-drivers.patch
0004-IB-srp-Move-ib_destroy_cm_id-call-into-srp_free_ch_i.patch
0005-IB-srp-Remove-stale-connection-retry-mechanism.patch
0006-IB-srp-Avoid-that-I-O-hangs-due-to-a-cable-pull-duri.patch
0007-IB-srp-Separate-target-and-channel-variables.patch
0008-IB-srp-Add-multichannel-support.patch

Note: a predecessor of this patch series has been used to measure the 
performance of Christoph's scsi-mq patches that have been merged in 
kernel version v3.17-rc1.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/8] IB/srp: Add multichannel support

2014-09-19 Thread Bart Van Assche
[PATCH 1/8] blk-mq: Use all available hardware queues

Suppose that a system has two CPU sockets, three cores per socket,
that it does not support hyperthreading and that four hardware
queues are provided by a block driver. With the current algorithm
this will lead to the following assignment of CPU cores to hardware
queues:

  HWQ 0: 0 1
  HWQ 1: 2 3
  HWQ 2: 4 5
  HWQ 3: (none)

This patch changes the queue assignment into:

  HWQ 0: 0 1
  HWQ 1: 2
  HWQ 2: 3 4
  HWQ 3: 5

In other words, this patch has the following three effects:
- All four hardware queues are used instead of only three.
- CPU cores are spread more evenly over hardware queues. For the
  above example the range of the number of CPU cores associated
  with a single HWQ is reduced from [0..2] to [1..2].
- If the number of HWQ's is a multiple of the number of CPU sockets
  it is now guaranteed that all CPU cores associated with a single
  HWQ reside on the same CPU socket.
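
For reference, here is a small standalone program (not part of the patch)
that reproduces both mappings for the example above, i.e. six CPU cores and
four hardware queues; the two return expressions are taken from the diff
below:

#include <stdio.h>

/* Old and new versions of cpu_to_queue_index(). */
static int old_map(unsigned int nr_cpus, unsigned int nr_queues, int cpu)
{
        return cpu / ((nr_cpus + nr_queues - 1) / nr_queues);
}

static int new_map(unsigned int nr_cpus, unsigned int nr_queues, int cpu)
{
        return cpu * nr_queues / nr_cpus;
}

int main(void)
{
        for (int cpu = 0; cpu < 6; cpu++)
                printf("CPU %d: old HWQ %d, new HWQ %d\n",
                       cpu, old_map(6, 4, cpu), new_map(6, 4, cpu));
        /* Old mapping: 0 0 1 1 2 2 - new mapping: 0 0 1 2 2 3. */
        return 0;
}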

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Jens Axboe ax...@fb.com
Cc: Christoph Hellwig h...@lst.de
Cc: Ming Lei ming@canonical.com
---
 block/blk-mq-cpumap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 1065d7c..8e56455 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -17,7 +17,7 @@
 static int cpu_to_queue_index(unsigned int nr_cpus, unsigned int nr_queues,
  const int cpu)
 {
-   return cpu / ((nr_cpus + nr_queues - 1) / nr_queues);
+   return cpu * nr_queues / nr_cpus;
 }
 
 static int get_first_sibling(unsigned int cpu)
-- 
1.8.4.5

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/8] IB/srp: Remove stale connection retry mechanism

2014-09-19 Thread Bart Van Assche
Attempting to connect three times may be insufficient after an
initiator system that was using multiple RDMA channels tries to
relogin. Additionally, this login retry mechanism is a workaround
for particular behavior of the IB/CM. Since the srp_daemon retries
a failed login attempt anyway, remove the stale connection retry
mechanism.

Signed-off-by: Bart Van Assche bvanass...@acm.org
---
 drivers/infiniband/ulp/srp/ib_srp.c | 16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index d3c712f..9608e7a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -904,7 +904,6 @@ static void srp_rport_delete(struct srp_rport *rport)
 
 static int srp_connect_target(struct srp_target_port *target)
 {
-   int retries = 3;
int ret;
 
 	WARN_ON_ONCE(target->connected);
@@ -945,19 +944,10 @@ static int srp_connect_target(struct srp_target_port *target)
 			break;
 
 		case SRP_STALE_CONN:
-			/* Our current CM id was stale, and is now in timewait.
-			 * Try to reconnect with a new one.
-			 */
-			if (!retries-- || srp_new_cm_id(target)) {
-				shost_printk(KERN_ERR, target->scsi_host, PFX
-					     "giving up on stale connection\n");
-				target->status = -ECONNRESET;
-				return target->status;
-			}
-
 			shost_printk(KERN_ERR, target->scsi_host, PFX
-				     "retrying stale connection\n");
-			break;
+				     "giving up on stale connection\n");
+			target->status = -ECONNRESET;
+			return target->status;
 
 		default:
 			return target->status;
-- 
1.8.4.5

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/8] IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning

2014-09-19 Thread Bart Van Assche
If a cable is pulled during LUN scanning it can happen that the
SRP rport and the SCSI host have been created but no LUNs have been
added to the SCSI host. Since multipathd only sends SCSI commands
to a SCSI target if one or more SCSI devices are present and since
there is no keepalive mechanism for IB queue pairs this means that
after a LUN scan failed and after a reconnect has succeeded no
data will be sent over the QP and hence that a subsequent cable
pull will not be detected. Avoid this by not creating an rport or
SCSI host if a cable is pulled during a SCSI LUN scan.

Note: so far the above behavior has only been observed with the
kernel module parameter ch_count set to a value >= 2.

Signed-off-by: Bart Van Assche bvanass...@acm.org
---
 drivers/infiniband/ulp/srp/ib_srp.c | 56 +++--
 drivers/infiniband/ulp/srp/ib_srp.h |  1 +
 2 files changed, 48 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 9608e7a..fd88fb8 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -,6 +,10 @@ static int srp_rport_reconnect(struct srp_rport *rport)
 	int i, ret;
 
 	srp_disconnect_target(target);
+
+	if (target->state == SRP_TARGET_SCANNING)
+		return -ENODEV;
+
 	/*
 	 * Now get a new local CM ID so that we avoid confusing the target in
 	 * case things are really fouled up. Doing so also ensures that all CM
@@ -2607,11 +2611,23 @@ static struct scsi_host_template srp_template = {
 	.shost_attrs			= srp_host_attrs
 };
 
+static int srp_sdev_count(struct Scsi_Host *host)
+{
+	struct scsi_device *sdev;
+	int c = 0;
+
+	shost_for_each_device(sdev, host)
+		c++;
+
+	return c;
+}
+
 static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
 {
 	struct srp_rport_identifiers ids;
 	struct srp_rport *rport;
 
+	target->state = SRP_TARGET_SCANNING;
 	sprintf(target->target_name, "SRP.T10:%016llX",
 		(unsigned long long) be64_to_cpu(target->id_ext));
 
@@ -2634,11 +2650,26 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
 	list_add_tail(&target->list, &host->target_list);
 	spin_unlock(&host->target_lock);
 
-	target->state = SRP_TARGET_LIVE;
-
 	scsi_scan_target(&target->scsi_host->shost_gendev,
 			 0, target->scsi_id, SCAN_WILD_CARD, 0);
 
+	if (!target->connected || target->qp_in_error) {
+		shost_printk(KERN_INFO, target->scsi_host,
+			     PFX "SCSI scan failed - removing SCSI host\n");
+		srp_queue_remove_work(target);
+		goto out;
+	}
+
+	pr_debug(PFX "%s: SCSI scan succeeded - detected %d LUNs\n",
+		 dev_name(&target->scsi_host->shost_gendev),
+		 srp_sdev_count(target->scsi_host));
+
+	spin_lock_irq(&target->lock);
+	if (target->state == SRP_TARGET_SCANNING)
+		target->state = SRP_TARGET_LIVE;
+	spin_unlock_irq(&target->lock);
+
+out:
 	return 0;
 }
 
@@ -3044,13 +3075,20 @@ static ssize_t srp_create_target(struct device *dev,
 	if (ret)
 		goto err_disconnect;
 
-	shost_printk(KERN_DEBUG, target->scsi_host, PFX
-		     "new target: id_ext %016llx ioc_guid %016llx pkey %04x service_id %016llx sgid %pI6 dgid %pI6\n",
-		     be64_to_cpu(target->id_ext),
-		     be64_to_cpu(target->ioc_guid),
-		     be16_to_cpu(target->path.pkey),
-		     be64_to_cpu(target->service_id),
-		     target->path.sgid.raw, target->path.dgid.raw);
+	/* Protects against concurrent srp_remove_target() invocation. */
+	scsi_host_get(target->scsi_host);
+
+	if (target->state != SRP_TARGET_REMOVED) {
+		shost_printk(KERN_DEBUG, target->scsi_host, PFX
+			     "new target: id_ext %016llx ioc_guid %016llx pkey %04x service_id %016llx sgid %pI6 dgid %pI6\n",
+			     be64_to_cpu(target->id_ext),
+			     be64_to_cpu(target->ioc_guid),
+			     be16_to_cpu(target->path.pkey),
+			     be64_to_cpu(target->service_id),
+			     target->path.sgid.raw, target->orig_dgid);
+	}
+
+	scsi_host_put(target->scsi_host);
 
 	ret = count;
 
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index e46ecb1..00c7c48 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -73,6 +73,7 @@ enum {
 };
 
 enum srp_target_state {
+	SRP_TARGET_SCANNING,
 	SRP_TARGET_LIVE,
 	SRP_TARGET_REMOVED,
 };
-- 
1.8.4.5

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to 

[PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Bart Van Assche
Improve performance by using multiple RDMA/RC channels per SCSI host
for communicating with an SRP target.

Signed-off-by: Bart Van Assche bvanass...@acm.org
---
 Documentation/ABI/stable/sysfs-driver-ib_srp |  25 +-
 drivers/infiniband/ulp/srp/ib_srp.c  | 337 ---
 drivers/infiniband/ulp/srp/ib_srp.h  |  20 +-
 3 files changed, 287 insertions(+), 95 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp 
b/Documentation/ABI/stable/sysfs-driver-ib_srp
index b9688de..d5a459e 100644
--- a/Documentation/ABI/stable/sysfs-driver-ib_srp
+++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
@@ -55,12 +55,12 @@ Description:Interface for making ib_srp connect to 
a new target.
  only safe with partial memory descriptor list support enabled
  (allow_ext_sg=1).
* comp_vector, a number in the range 0..n-1 specifying the
- MSI-X completion vector. Some HCA's allocate multiple (n)
- MSI-X vectors per HCA port. If the IRQ affinity masks of
- these interrupts have been configured such that each MSI-X
- interrupt is handled by a different CPU then the comp_vector
- parameter can be used to spread the SRP completion workload
- over multiple CPU's.
+ MSI-X completion vector of the first RDMA channel. Some
+ HCA's allocate multiple (n) MSI-X vectors per HCA port. If
+ the IRQ affinity masks of these interrupts have been
+ configured such that each MSI-X interrupt is handled by a
+ different CPU then the comp_vector parameter can be used to
+ spread the SRP completion workload over multiple CPU's.
* tl_retry_count, a number in the range 2..7 specifying the
  IB RC retry count.
* queue_size, the maximum number of commands that the
@@ -88,6 +88,13 @@ Description: Whether ib_srp is allowed to include a partial 
memory
descriptor list in an SRP_CMD when communicating with an SRP
target.
 
+What:  /sys/class/scsi_host/hostn/ch_count
+Date:  November 1, 2014
+KernelVersion: 3.18
+Contact:   linux-rdma@vger.kernel.org
+Description:   Number of RDMA channels used for communication with the SRP
+   target.
+
 What:  /sys/class/scsi_host/hostn/cmd_sg_entries
 Date:  May 19, 2011
 KernelVersion: 2.6.39
@@ -95,6 +102,12 @@ Contact:linux-rdma@vger.kernel.org
 Description:   Maximum number of data buffer descriptors that may be sent to
the target in a single SRP_CMD request.
 
+What:  /sys/class/scsi_host/hostn/comp_vector
+Date:  September 2, 2013
+KernelVersion: 3.11
+Contact:   linux-rdma@vger.kernel.org
+Description:   Completion vector used for the first RDMA channel.
+
 What:  /sys/class/scsi_host/hostn/dgid
 Date:  June 17, 2006
 KernelVersion: 2.6.17
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 9feeea1..58ca618 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -123,6 +123,16 @@ MODULE_PARM_DESC(dev_loss_tmo,
 		 " if fast_io_fail_tmo has not been set. \"off\" means that"
 		 " this functionality is disabled.");
 
+static unsigned ch_count;
+module_param(ch_count, uint, 0444);
+MODULE_PARM_DESC(ch_count,
+		 "Number of RDMA channels to use for communication with an SRP"
+		 " target. Using more than one channel improves performance"
+		 " if the HCA supports multiple completion vectors. The"
+		 " default value is the minimum of four times the number of"
+		 " online CPU sockets and the number of completion vectors"
+		 " supported by the HCA.");
+
 static void srp_add_one(struct ib_device *device);
 static void srp_remove_one(struct ib_device *device);
 static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr);
@@ -556,17 +566,32 @@ err:
  * Note: this function may be called without srp_alloc_iu_bufs() having been
  * invoked. Hence the ch->[rt]x_ring checks.
  */
-static void srp_free_ch_ib(struct srp_rdma_ch *ch)
+static void srp_free_ch_ib(struct srp_target_port *target,
+			   struct srp_rdma_ch *ch)
 {
-	struct srp_target_port *target = ch->target;
 	struct srp_device *dev = target->srp_host->srp_dev;
 	int i;
 
+	if (!ch->target)
+		return;
+
+	/*
+	 * Avoid that the SCSI error handler tries to use this channel after
+	 * it has been freed. The SCSI error handler can namely continue
+	 * trying to perform recovery actions after scsi_remove_host()
+	 * returned.
+	 */
+	ch->target = NULL;
+
 	if (ch->cm_id) {
 		ib_destroy_cm_id(ch->cm_id);

[PATCH 7/8] IB/srp: Separate target and channel variables

2014-09-19 Thread Bart Van Assche
Changes in this patch:
- Move channel variables into a new structure (struct srp_rdma_ch).
- cm_id and completion handler context pointer are now of type
  srp_rdma_ch * instead of srp_target_port *.

No functionality is changed.

Signed-off-by: Bart Van Assche bvanass...@acm.org
---
 drivers/infiniband/ulp/srp/ib_srp.c | 707 +++-
 drivers/infiniband/ulp/srp/ib_srp.h |  59 +--
 2 files changed, 417 insertions(+), 349 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index fd88fb8..9feeea1 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -125,8 +125,8 @@ MODULE_PARM_DESC(dev_loss_tmo,
 
 static void srp_add_one(struct ib_device *device);
 static void srp_remove_one(struct ib_device *device);
-static void srp_recv_completion(struct ib_cq *cq, void *target_ptr);
-static void srp_send_completion(struct ib_cq *cq, void *target_ptr);
+static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr);
+static void srp_send_completion(struct ib_cq *cq, void *ch_ptr);
 static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event);
 
 static struct scsi_transport_template *ib_srp_transport_template;
@@ -262,7 +262,7 @@ static int srp_init_qp(struct srp_target_port *target,
 
 	ret = ib_find_pkey(target->srp_host->srp_dev->dev,
 			   target->srp_host->port,
-			   be16_to_cpu(target->path.pkey),
+			   be16_to_cpu(target->pkey),
 			   &attr->pkey_index);
 	if (ret)
 		goto out;
@@ -283,18 +283,23 @@ out:
 	return ret;
 }
 
-static int srp_new_cm_id(struct srp_target_port *target)
+static int srp_new_cm_id(struct srp_rdma_ch *ch)
 {
+	struct srp_target_port *target = ch->target;
 	struct ib_cm_id *new_cm_id;
 
 	new_cm_id = ib_create_cm_id(target->srp_host->srp_dev->dev,
-				    srp_cm_handler, target);
+				    srp_cm_handler, ch);
 	if (IS_ERR(new_cm_id))
 		return PTR_ERR(new_cm_id);
 
-	if (target->cm_id)
-		ib_destroy_cm_id(target->cm_id);
-	target->cm_id = new_cm_id;
+	if (ch->cm_id)
+		ib_destroy_cm_id(ch->cm_id);
+	ch->cm_id = new_cm_id;
+	ch->path.sgid = target->sgid;
+	ch->path.dgid = target->orig_dgid;
+	ch->path.pkey = target->pkey;
+	ch->path.service_id = target->service_id;
 
 	return 0;
 }
@@ -443,8 +448,9 @@ static struct srp_fr_pool *srp_alloc_fr_pool(struct srp_target_port *target)
 			       dev->max_pages_per_mr);
 }
 
-static int srp_create_target_ib(struct srp_target_port *target)
+static int srp_create_ch_ib(struct srp_rdma_ch *ch)
 {
+	struct srp_target_port *target = ch->target;
 	struct srp_device *dev = target->srp_host->srp_dev;
 	struct ib_qp_init_attr *init_attr;
 	struct ib_cq *recv_cq, *send_cq;
@@ -458,15 +464,15 @@ static int srp_create_target_ib(struct srp_target_port *target)
 	if (!init_attr)
 		return -ENOMEM;
 
-	recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, target,
-			       target->queue_size, target->comp_vector);
+	recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, ch,
+			       target->queue_size, ch->comp_vector);
 	if (IS_ERR(recv_cq)) {
 		ret = PTR_ERR(recv_cq);
 		goto err;
 	}
 
-	send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, target,
-			       m * target->queue_size, target->comp_vector);
+	send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, ch,
+			       m * target->queue_size, ch->comp_vector);
 	if (IS_ERR(send_cq)) {
 		ret = PTR_ERR(send_cq);
 		goto err_recv_cq;
@@ -502,9 +508,9 @@ static int srp_create_target_ib(struct srp_target_port *target)
 				     "FR pool allocation failed (%d)\n", ret);
 			goto err_qp;
 		}
-		if (target->fr_pool)
-			srp_destroy_fr_pool(target->fr_pool);
-		target->fr_pool = fr_pool;
+		if (ch->fr_pool)
+			srp_destroy_fr_pool(ch->fr_pool);
+		ch->fr_pool = fr_pool;
 	} else if (!dev->use_fast_reg && dev->has_fmr) {
 		fmr_pool = srp_alloc_fmr_pool(target);
 		if (IS_ERR(fmr_pool)) {
@@ -513,21 +519,21 @@ static int srp_create_target_ib(struct srp_target_port *target)
 				     "FMR pool allocation failed (%d)\n", ret);
 			goto err_qp;
 		}
-		if (target->fmr_pool)
-			ib_destroy_fmr_pool(target->fmr_pool);
-		target->fmr_pool = fmr_pool;
+		if (ch->fmr_pool)
+			

RFC: FMR support in SRP

2014-09-19 Thread Jinpu Wang
Hi Bart and all,

While going through the SRP FMR support I noticed that ib_srp pre-allocates
fmr_list/map_page in each request, and that fmr_list is allocated with as
many entries as target->cmd_sg_cnt.

I added some debug messages and ran fio tests with different settings.
The results show that state.ndesc and state.nmdesc are 1 and state.npages
is 0 after srp_map_sg. My question is: do we really need as many as
cmd_sg_cnt fmr_list entries, or am I missing something and could ndesc
and nmdesc be bigger than 1?



-- 

Mit freundlichen Grüßen,
Best Regards,

Jack Wang

Linux Kernel Developer Storage
ProfitBricks GmbH  The IaaS-Company.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: FMR support in SRP

2014-09-19 Thread Bart Van Assche

On 09/19/14 15:11, Jinpu Wang wrote:

While going through the SRP FMR support I noticed that ib_srp pre-allocates
fmr_list/map_page in each request, and that fmr_list is allocated with as
many entries as target->cmd_sg_cnt.

I added some debug messages and ran fio tests with different settings.
The results show that state.ndesc and state.nmdesc are 1 and state.npages
is 0 after srp_map_sg. My question is: do we really need as many as
cmd_sg_cnt fmr_list entries, or am I missing something and could ndesc
and nmdesc be bigger than 1?


Hello Jack,

The limitations for FMR / FR memory registration are more restrictive 
than those imposed by the SCSI core on S/G-list layout. A few examples:

* Memory registered via FMR must be aligned on an FMR page boundary.
* Memory registered via a single FMR / FR registration must be a 
contiguous virtual memory region.
* The maximum size for a memory region registered via FMR or FR 
(dev->mr_max_size) can be below the maximum size of an S/G-list that can 
be passed by the SCSI core.


Hence the need for multiple memory descriptors. In case you are 
wondering how I tested memory registration involving multiple memory 
descriptors: I wrote a test program that causes the SCSI core to send an 
S/G-list to the SRP initiator that consists of 128 elements with four 
bytes of data. None of these elements are aligned on a page boundary and 
no two S/G-list elements are contiguous in virtual memory. This test 
program causes the SRP initiator to allocate 128 memory descriptors. 
Please note that I/O requests submitted by the attached test program 
will only be accepted by the SRP initiator if the cmd_sg_entries kernel 
module parameter has been set to a value >= 128.
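
As a side note, the splitting rule can be modelled in a few lines of
user-space code. The sketch below is not the driver code - the helper name,
the constants and the segment layout are mine - but it shows why the 128
scattered four-byte elements mentioned above end up needing 128 descriptors:

#include <stdint.h>
#include <stdio.h>

struct sg_seg { uint64_t addr; uint64_t len; };

/*
 * Toy model of the descriptor-splitting rule: a segment can only be merged
 * into the current FMR/FR descriptor if it is virtually contiguous with it
 * and the result stays below the maximum region size; otherwise a new
 * descriptor is started.
 */
static unsigned int count_descriptors(const struct sg_seg *sg, unsigned int n,
                                      uint64_t mr_max_size)
{
        unsigned int ndesc = 0;
        uint64_t end = 0, len = 0;

        for (unsigned int i = 0; i < n; i++) {
                if (ndesc == 0 || sg[i].addr != end ||
                    len + sg[i].len > mr_max_size) {
                        ndesc++;                /* start a new descriptor */
                        len = 0;
                }
                end = sg[i].addr + sg[i].len;
                len += sg[i].len;
        }
        return ndesc;
}

int main(void)
{
        struct sg_seg sg[128];

        /* 128 four-byte elements, none contiguous, none page aligned. */
        for (unsigned int i = 0; i < 128; i++) {
                sg[i].addr = 0x100000 + (uint64_t)i * 4096 + 8;
                sg[i].len = 4;
        }
        printf("descriptors needed: %u\n",
               count_descriptors(sg, 128, 1024 * 1024));
        return 0;
}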


Bart.

#include <cassert>
#include <cstring>	// memset()
#include <fcntl.h>	// O_RDONLY
#include <iomanip>
#include <iostream>
#include <linux/fs.h>	// BLKSSZGET
#include <scsi/sg.h>	// sg_io_hdr_t
#include <sys/ioctl.h>
#include <unistd.h>	// open()
#include <vector>

class file_descriptor {
public:
    file_descriptor(int fd = -1)
	: m_fd(fd)
    { }
    ~file_descriptor()
    { if (m_fd >= 0) close(m_fd); }
    operator int() const
    { return m_fd; }

private:
    file_descriptor(const file_descriptor &);
    file_descriptor &operator=(const file_descriptor &);

    int m_fd;
};

class iovec_t {
public:
    iovec_t()
    { }
    ~iovec_t()
    { }
    size_t size() const
    { return m_v.size(); }
    const sg_iovec_t &operator[](const int i) const
    { return m_v[i]; }
    sg_iovec_t &operator[](const int i)
    { return m_v[i]; }
    void append(void *addr, size_t len) {
	m_v.resize(m_v.size() + 1);
	decltype(m_v)::iterator p = m_v.end() - 1;
	p->iov_base = addr;
	p->iov_len = len;
    }
    const void *address() const {
	return &*m_v.begin();
    }
    size_t data_len() const {
	size_t len = 0;
	for (decltype(m_v)::const_iterator p = m_v.begin(); p != m_v.end(); ++p)
	    len += p->iov_len;
	return len;
    }
    void trunc(size_t len) {
	size_t s = 0;
	for (decltype(m_v)::iterator p = m_v.begin(); p != m_v.end(); ++p) {
	    s += p->iov_len;
	    if (s >= len) {
		p->iov_len -= s - len;
		assert(p->iov_len > 0 || (p->iov_len == 0 && len == 0));
		m_v.resize(p - m_v.begin() + 1);
		break;
	    }
	}
    }
    std::ostream &write(std::ostream &os) const {
	for (decltype(m_v)::const_iterator p = m_v.begin(); p != m_v.end(); ++p)
	    os.write((const char *)p->iov_base, p->iov_len);
	return os;
    }

private:
    iovec_t(const iovec_t &);
    iovec_t &operator=(const iovec_t &);

    std::vector<sg_iovec_t> m_v;
};

static unsigned block_size;

static void dumphex(std::ostream &os, const void *a, size_t len)
{
    for (int i = 0; i < len; i += 16) {
	os << std::hex << std::setfill('0') << std::setw(16)
	   << (uintptr_t)a + i << ':';
	for (int j = i; j < i + 16 && j < len; j++) {
	    if (j % 4 == 0)
		os << ' ';
	    os << std::hex << std::setfill('0') << std::setw(2)
	       << (unsigned)((uint8_t*)a)[j];
	}
	os << ' ';
	for (int j = i; j < i + 16 && j < len; j++) {
	    unsigned char c = ((uint8_t*)a)[j];
	    os << (c >= ' ' && c < 128 ? (char)c : '.');
	}
	os << '\n';
    }
}

enum {
MAX_READ_WRITE_6_LBA = 0x1f,
MAX_READ_WRITE_6_LENGTH = 0xff,
};

static ssize_t sg_read(const file_descriptor &fd, uint32_t lba, const iovec_t &v)
{
    if (lba > MAX_READ_WRITE_6_LBA)
	return -1;

    if (v.data_len() == 0 || (v.data_len() % block_size) != 0)
	return -1;

    if (v.data_len() / block_size > MAX_READ_WRITE_6_LENGTH)
	return -1;

    int sg_version;
    if (ioctl(fd, SG_GET_VERSION_NUM, &sg_version) < 0 || sg_version < 3)
	return -1;

    uint8_t read6[6] = { 0x08, (uint8_t)(lba >> 16), (uint8_t)(lba >> 8),
			 (uint8_t)(lba), (uint8_t)(v.data_len() / block_size),
			 0 };
    unsigned char sense_buffer[32];
    sg_io_hdr_t h = { .interface_id = 'S' };

    h.cmdp = read6;
    h.cmd_len = sizeof(read6);
    h.dxfer_direction = SG_DXFER_FROM_DEV;
    h.iovec_count = v.size();
    h.dxfer_len = v.data_len();
    h.dxferp = const_cast<void*>(v.address());
    h.sbp = sense_buffer;
    h.mx_sb_len = sizeof(sense_buffer);
    h.timeout = 1000; /* 1000 

Re: [PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Ming Lei
On Fri, Sep 19, 2014 at 9:00 PM, Bart Van Assche bvanass...@acm.org wrote:
 Improve performance by using multiple RDMA/RC channels per SCSI host
 for communicating with an SRP target.

 Signed-off-by: Bart Van Assche bvanass...@acm.org
 ---
  Documentation/ABI/stable/sysfs-driver-ib_srp |  25 +-
  drivers/infiniband/ulp/srp/ib_srp.c  | 337 
 ---
  drivers/infiniband/ulp/srp/ib_srp.h  |  20 +-
  3 files changed, 287 insertions(+), 95 deletions(-)

 diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp 
 b/Documentation/ABI/stable/sysfs-driver-ib_srp
 index b9688de..d5a459e 100644
 --- a/Documentation/ABI/stable/sysfs-driver-ib_srp
 +++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
 @@ -55,12 +55,12 @@ Description:Interface for making ib_srp connect 
 to a new target.
   only safe with partial memory descriptor list support 
 enabled
   (allow_ext_sg=1).
 * comp_vector, a number in the range 0..n-1 specifying the
 - MSI-X completion vector. Some HCA's allocate multiple (n)
 - MSI-X vectors per HCA port. If the IRQ affinity masks of
 - these interrupts have been configured such that each MSI-X
 - interrupt is handled by a different CPU then the comp_vector
 - parameter can be used to spread the SRP completion workload
 - over multiple CPU's.
 + MSI-X completion vector of the first RDMA channel. Some
 + HCA's allocate multiple (n) MSI-X vectors per HCA port. If
 + the IRQ affinity masks of these interrupts have been
 + configured such that each MSI-X interrupt is handled by a
 + different CPU then the comp_vector parameter can be used to
 + spread the SRP completion workload over multiple CPU's.
 * tl_retry_count, a number in the range 2..7 specifying the
   IB RC retry count.
 * queue_size, the maximum number of commands that the
 @@ -88,6 +88,13 @@ Description: Whether ib_srp is allowed to include a 
 partial memory
 descriptor list in an SRP_CMD when communicating with an SRP
 target.

 +What:  /sys/class/scsi_host/hostn/ch_count
 +Date:  November 1, 2014
 +KernelVersion: 3.18
 +Contact:   linux-rdma@vger.kernel.org
 +Description:   Number of RDMA channels used for communication with the SRP
 +   target.
 +
  What:  /sys/class/scsi_host/hostn/cmd_sg_entries
  Date:  May 19, 2011
  KernelVersion: 2.6.39
 @@ -95,6 +102,12 @@ Contact:linux-rdma@vger.kernel.org
  Description:   Maximum number of data buffer descriptors that may be sent to
 the target in a single SRP_CMD request.

 +What:  /sys/class/scsi_host/hostn/comp_vector
 +Date:  September 2, 2013
 +KernelVersion: 3.11
 +Contact:   linux-rdma@vger.kernel.org
 +Description:   Completion vector used for the first RDMA channel.
 +
  What:  /sys/class/scsi_host/hostn/dgid
  Date:  June 17, 2006
  KernelVersion: 2.6.17
 diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
 b/drivers/infiniband/ulp/srp/ib_srp.c
 index 9feeea1..58ca618 100644
 --- a/drivers/infiniband/ulp/srp/ib_srp.c
 +++ b/drivers/infiniband/ulp/srp/ib_srp.c
 @@ -123,6 +123,16 @@ MODULE_PARM_DESC(dev_loss_tmo,
   if fast_io_fail_tmo has not been set. \off\ means that
   this functionality is disabled.);

 +static unsigned ch_count;
 +module_param(ch_count, uint, 0444);
 +MODULE_PARM_DESC(ch_count,
 +Number of RDMA channels to use for communication with an 
 SRP
 + target. Using more than one channel improves performance
 + if the HCA supports multiple completion vectors. The
 + default value is the minimum of four times the number of
 + online CPU sockets and the number of completion vectors
 + supported by the HCA.);
 +
  static void srp_add_one(struct ib_device *device);
  static void srp_remove_one(struct ib_device *device);
  static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr);
 @@ -556,17 +566,32 @@ err:
   * Note: this function may be called without srp_alloc_iu_bufs() having been
   * invoked. Hence the ch-[rt]x_ring checks.
   */
 -static void srp_free_ch_ib(struct srp_rdma_ch *ch)
 +static void srp_free_ch_ib(struct srp_target_port *target,
 +  struct srp_rdma_ch *ch)
  {
 -   struct srp_target_port *target = ch-target;
 struct srp_device *dev = target-srp_host-srp_dev;
 int i;

 +   if (!ch-target)
 +   return;
 +
 +   /*
 +* Avoid that the SCSI error handler tries to use this channel after
 +* it has been freed. The SCSI error handler can namely continue
 +* trying to perform recovery actions 

Re: RFC: FMR support in SRP

2014-09-19 Thread Jinpu Wang


 Hello Jack,

 Did you know that file descriptor 0 corresponds to stdin ? With command-line
 option -w the test program reads data from stdin and sends that data to a
 SCSI device. I think the test program is waiting for you to provide input
 data :-)

 Bart.

Thanks Bart,

Now I know it:)

This command works, generated 128 descriptors.
./discontiguous-io -l 512  -s /dev/sdb



Thanks a lot.


-- 
Mit freundlichen Grüßen,
Best Regards,

Jack Wang
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Bart Van Assche
On 09/19/14 17:27, Ming Lei wrote:
 On Fri, Sep 19, 2014 at 11:21 PM, Bart Van Assche bvanass...@acm.org wrote:
 On 09/19/14 16:28, Ming Lei wrote:

 On Fri, Sep 19, 2014 at 9:00 PM, Bart Van Assche bvanass...@acm.org
 wrote:

 @@ -2643,7 +2754,8 @@ static struct scsi_host_template srp_template = {
   .proc_name  = DRV_NAME,
   .slave_configure= srp_slave_configure,
   .info   = srp_target_info,
 -   .queuecommand   = srp_queuecommand,
 +   .queuecommand   = srp_sq_queuecommand,
 +   .mq_queuecommand= srp_mq_queuecommand,


 Another choice is to obtain hctx from request directly, then mq can
 reuse the .queuecommand interface too.


 Hello Ming,

 Is the hctx information already available in the request data structure ? I
 have found a mq_ctx member but no hctx member. Did I perhaps overlook
 something ?
 
 You are right, but the mq_ctx can be mapped to hctx like below way:
 
 ctx = rq->mq_ctx;
 hctx = q->mq_ops->map_queue(q, ctx->cpu);

Hello Ming,

Had you already noticed that the blk_mq_ctx data structure is a private
data structure (declared in block/blk-mq.h) and hence not available to
SCSI LLDs ? However, what might be possible is to cache the hctx pointer
in the request structure, e.g. like in the (completely untested) patch
below.

[PATCH] blk-mq: Cache hardware context pointer in struct request

Cache the hardware context pointer in struct request such that block
drivers can determine which hardware queue has been selected by
reading rq-mq_hctx-queue_num. This information is needed to select
the appropriate hardware queue in e.g. SCSI LLDs.
---
 block/blk-flush.c  |  6 +-
 block/blk-mq.c | 16 +---
 include/linux/blkdev.h |  1 +
 3 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 3cb5e9e..698812d 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -327,13 +327,9 @@ static void flush_data_end_io(struct request *rq, int error)
 static void mq_flush_data_end_io(struct request *rq, int error)
 {
 	struct request_queue *q = rq->q;
-	struct blk_mq_hw_ctx *hctx;
-	struct blk_mq_ctx *ctx;
+	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 	unsigned long flags;
 
-	ctx = rq->mq_ctx;
-	hctx = q->mq_ops->map_queue(q, ctx->cpu);
-
 	/*
 	 * After populating an empty queue, kick it to avoid stall.  Read
 	 * the comment in flush_end_io().
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 383ea0c..8e428fe 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -210,6 +210,7 @@ __blk_mq_alloc_request(struct blk_mq_alloc_data *data, int rw)
 	}
 
 	rq->tag = tag;
+	rq->mq_hctx = data->hctx;
 	blk_mq_rq_ctx_init(data->q, data->ctx, rq, rw);
 	return rq;
 }
@@ -267,12 +268,10 @@ static void __blk_mq_free_request(struct blk_mq_hw_ctx *hctx,
 void blk_mq_free_request(struct request *rq)
 {
 	struct blk_mq_ctx *ctx = rq->mq_ctx;
-	struct blk_mq_hw_ctx *hctx;
-	struct request_queue *q = rq->q;
+	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
 	ctx->rq_completed[rq_is_sync(rq)]++;
 
-	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 	__blk_mq_free_request(hctx, ctx, rq);
 }
 
@@ -287,10 +286,10 @@ void blk_mq_free_request(struct request *rq)
 void blk_mq_clone_flush_request(struct request *flush_rq,
 		struct request *orig_rq)
 {
-	struct blk_mq_hw_ctx *hctx =
-		orig_rq->q->mq_ops->map_queue(orig_rq->q, orig_rq->mq_ctx->cpu);
+	struct blk_mq_hw_ctx *hctx = orig_rq->mq_hctx;
 
 	flush_rq->mq_ctx = orig_rq->mq_ctx;
+	flush_rq->mq_hctx = hctx;
 	flush_rq->tag = orig_rq->tag;
 	memcpy(blk_mq_rq_to_pdu(flush_rq), blk_mq_rq_to_pdu(orig_rq),
 		hctx->cmd_size);
@@ -956,6 +955,7 @@ void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
 	rq->mq_ctx = ctx = current_ctx;
 
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
+	rq->mq_hctx = hctx;
 
 	if (rq->cmd_flags & (REQ_FLUSH | REQ_FUA) &&
 	    !(rq->cmd_flags & (REQ_FLUSH_SEQ))) {
@@ -1001,6 +1001,7 @@ static void blk_mq_insert_requests(struct request_queue *q,
 		rq = list_first_entry(list, struct request, queuelist);
 		list_del_init(&rq->queuelist);
 		rq->mq_ctx = ctx;
+		rq->mq_hctx = hctx;
 		__blk_mq_insert_request(hctx, rq, false);
 	}
 	spin_unlock(&ctx->lock);
@@ -1475,6 +1476,8 @@ static int blk_mq_hctx_cpu_offline(struct blk_mq_hw_ctx *hctx, int cpu)
 		return NOTIFY_OK;
 
 	ctx = blk_mq_get_ctx(q);
+	hctx = q->mq_ops->map_queue(q, ctx->cpu);
+
 	spin_lock(&ctx->lock);
 
 	while (!list_empty(&tmp)) {
@@ -1482,10 +1485,9 @@ static int blk_mq_hctx_cpu_offline(struct blk_mq_hw_ctx *hctx,

Re: [PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Jens Axboe
On 09/19/2014 09:35 AM, Bart Van Assche wrote:
 On 09/19/14 17:27, Ming Lei wrote:
 On Fri, Sep 19, 2014 at 11:21 PM, Bart Van Assche bvanass...@acm.org wrote:
 On 09/19/14 16:28, Ming Lei wrote:

 On Fri, Sep 19, 2014 at 9:00 PM, Bart Van Assche bvanass...@acm.org
 wrote:

 @@ -2643,7 +2754,8 @@ static struct scsi_host_template srp_template = {
   .proc_name  = DRV_NAME,
   .slave_configure= srp_slave_configure,
   .info   = srp_target_info,
 -   .queuecommand   = srp_queuecommand,
 +   .queuecommand   = srp_sq_queuecommand,
 +   .mq_queuecommand= srp_mq_queuecommand,


 Another choice is to obtain hctx from request directly, then mq can
 reuse the .queuecommand interface too.


 Hello Ming,

 Is the hctx information already available in the request data structure ? I
 have found a mq_ctx member but no hctx member. Did I perhaps overlook
 something ?

 You are right, but the mq_ctx can be mapped to hctx like below way:

 ctx = rq->mq_ctx;
 hctx = q->mq_ops->map_queue(q, ctx->cpu);
 
 Hello Ming,
 
 Had you already noticed that the blk_mq_ctx data structure is a private
 data structure (declared in block/blk-mq.h) and hence not available to
 SCSI LLDs ? However, what might be possible is to cache the hctx pointer
 in the request structure, e.g. like in the (completely untested) patch
 below.

ctx was meant to be private, unfortunately it's leaked a bit into other
parts of block/. But it's still private within that, at least.

Lets not add more stuff to struct request, it's already way too large.
We could add an exported

struct blk_mq_hw_ctx *blk_mq_request_to_hctx(struct request *rq)
{
	struct request_queue *q = rq->q;

	return q->mq_ops->map_queue(q, rq->mq_ctx->cpu);
}

for this.
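
For what it's worth, a SCSI LLD's queuecommand path could then use such a
helper roughly as follows. This is a hypothetical sketch only - the my_lld_*
names are invented and none of this is in any posted patch - and it assumes
the host is actually using scsi-mq, since otherwise rq->mq_ctx is not set:

#include <linux/blk-mq.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_host.h>

struct my_lld_channel;		/* per-hardware-queue resource (invented) */
struct my_lld_channel *my_lld_select_channel(struct Scsi_Host *, unsigned int);
int my_lld_submit(struct my_lld_channel *, struct scsi_cmnd *);

static int my_lld_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
{
	/* Hardware queue chosen by blk-mq for this request. */
	struct blk_mq_hw_ctx *hctx = blk_mq_request_to_hctx(scmnd->request);
	unsigned int hwq = hctx->queue_num;

	return my_lld_submit(my_lld_select_channel(shost, hwq), scmnd);
}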

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: FMR support in SRP

2014-09-19 Thread Jinpu Wang
On Fri, Sep 19, 2014 at 5:35 PM, Jinpu Wang jinpu.w...@profitbricks.com wrote:


 Hello Jack,

 Did you know that file descriptor 0 corresponds to stdin ? With command-line
 option -w the test program reads data from stdin and sends that data to a
 SCSI device. I think the test program is waiting for you to provide input
 data :-)

 Bart.

 Thanks Bart,

 Now I know it:)

 This command works, generated 128 descriptors.
 ./discontiguous-io -l 512  -s /dev/sdb



 Thanks a lot.


 --
 Mit freundlichen Grüßen,
 Best Regards,

 Jack Wang

Another question: in srp_map_finish_fmr() the descriptor VA is set to 0.
Could you point me to how SRP protects against multiple RDMA write
operations to the same address? Or does the different rkey of each FMR
protect against this?

-- 

Best Regards,

Jack Wang
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Sagi Grimberg
On 9/19/2014 6:38 PM, Jens Axboe wrote:
 On 09/19/2014 09:35 AM, Bart Van Assche wrote:
 On 09/19/14 17:27, Ming Lei wrote:
 On Fri, Sep 19, 2014 at 11:21 PM, Bart Van Assche bvanass...@acm.org 
 wrote:
 On 09/19/14 16:28, Ming Lei wrote:

 On Fri, Sep 19, 2014 at 9:00 PM, Bart Van Assche bvanass...@acm.org
 wrote:

 @@ -2643,7 +2754,8 @@ static struct scsi_host_template srp_template = {
.proc_name  = DRV_NAME,
.slave_configure= srp_slave_configure,
.info   = srp_target_info,
 -   .queuecommand   = srp_queuecommand,
 +   .queuecommand   = srp_sq_queuecommand,
 +   .mq_queuecommand= srp_mq_queuecommand,


 Another choice is to obtain hctx from request directly, then mq can
 reuse the .queuecommand interface too.


 Hello Ming,

 Is the hctx information already available in the request data structure ? I
 have found a mq_ctx member but no hctx member. Did I perhaps overlook
 something ?

 You are right, but the mq_ctx can be mapped to hctx like below way:

 ctx = rq->mq_ctx;
 hctx = q->mq_ops->map_queue(q, ctx->cpu);

 Hello Ming,

 Had you already noticed that the blk_mq_ctx data structure is a private
 data structure (declared in block/blk-mq.h) and hence not available to
 SCSI LLDs ? However, what might be possible is to cache the hctx pointer
 in the request structure, e.g. like in the (completely untested) patch
 below.

 ctx was meant to be private, unfortunately it's leaked a bit into other
 parts of block/. But it's still private within that, at least.

 Lets not add more stuff to struct request, it's already way too large.
 We could add an exported

 struct blk_mq_hw_ctx *blk_mq_request_to_hctx(struct request *rq)
 {
	struct request_queue *q = rq->q;

	return q->mq_ops->map_queue(q, rq->mq_ctx->cpu);
 }

 for this.


Hey,

So I agree that we shouldn't overload struct request with another
pointer, but it also seems a bit unnecessary to map again just to get
the hctx. Since in the future we would like LLDs to use scsi-mq why not
modify existing .queuecommand to pass hctx (or even better
hctx-driver_data) and for now LLDs won't use it. Once they choose to,
it will be available to them.

Sagi.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Jens Axboe
On 09/19/2014 11:30 AM, Sagi Grimberg wrote:
 On 9/19/2014 6:38 PM, Jens Axboe wrote:
 On 09/19/2014 09:35 AM, Bart Van Assche wrote:
 On 09/19/14 17:27, Ming Lei wrote:
 On Fri, Sep 19, 2014 at 11:21 PM, Bart Van Assche bvanass...@acm.org 
 wrote:
 On 09/19/14 16:28, Ming Lei wrote:

 On Fri, Sep 19, 2014 at 9:00 PM, Bart Van Assche bvanass...@acm.org
 wrote:

 @@ -2643,7 +2754,8 @@ static struct scsi_host_template srp_template = {
.proc_name  = DRV_NAME,
.slave_configure= srp_slave_configure,
.info   = srp_target_info,
 -   .queuecommand   = srp_queuecommand,
 +   .queuecommand   = srp_sq_queuecommand,
 +   .mq_queuecommand= srp_mq_queuecommand,


 Another choice is to obtain hctx from request directly, then mq can
 reuse the .queuecommand interface too.


 Hello Ming,

 Is the hctx information already available in the request data structure ? 
 I
 have found a mq_ctx member but no hctx member. Did I perhaps overlook
 something ?

 You are right, but the mq_ctx can be mapped to hctx like below way:

 ctx = rq->mq_ctx;
 hctx = q->mq_ops->map_queue(q, ctx->cpu);

 Hello Ming,

 Had you already noticed that the blk_mq_ctx data structure is a private
 data structure (declared in block/blk-mq.h) and hence not available to
 SCSI LLDs ? However, what might be possible is to cache the hctx pointer
 in the request structure, e.g. like in the (completely untested) patch
 below.

 ctx was meant to be private, unfortunately it's leaked a bit into other
 parts of block/. But it's still private within that, at least.

 Lets not add more stuff to struct request, it's already way too large.
 We could add an exported

  struct blk_mq_hw_ctx *blk_mq_request_to_hctx(struct request *rq)
  {
	struct request_queue *q = rq->q;

	return q->mq_ops->map_queue(q, rq->mq_ctx->cpu);
  }

 for this.

 
 Hey,
 
 So I agree that we shouldn't overload struct request with another
 pointer, but it also seems a bit unnecessary to map again just to get
 the hctx. Since in the future we would like LLDs to use scsi-mq why not
 modify existing .queuecommand to pass hctx (or even better
 hctx-driver_data) and for now LLDs won't use it. Once they choose to,
 it will be available to them.

That'd be fine as well. The mapping is cheap, though, but it would make
sense to have an appropriate way to just pass it in like it happens for
-queue_rq() for native blk-mq drivers.


-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/8] IB/srp: Add multichannel support

2014-09-19 Thread Sagi Grimberg
On 9/19/2014 3:56 PM, Bart Van Assche wrote:
 [PATCH 1/8] blk-mq: Use all available hardware queues

 Suppose that a system has two CPU sockets, three cores per socket,
 that it does not support hyperthreading and that four hardware
 queues are provided by a block driver. With the current algorithm
 this will lead to the following assignment of CPU cores to hardware
 queues:

HWQ 0: 0 1
HWQ 1: 2 3
HWQ 2: 4 5
HWQ 3: (none)

 This patch changes the queue assignment into:

HWQ 0: 0 1
HWQ 1: 2
HWQ 2: 3 4
HWQ 3: 5

 In other words, this patch has the following three effects:
 - All four hardware queues are used instead of only three.
 - CPU cores are spread more evenly over hardware queues. For the
above example the range of the number of CPU cores associated
with a single HWQ is reduced from [0..2] to [1..2].
 - If the number of HWQ's is a multiple of the number of CPU sockets
it is now guaranteed that all CPU cores associated with a single
HWQ reside on the same CPU socket.

 Signed-off-by: Bart Van Assche bvanass...@acm.org
 Cc: Jens Axboe ax...@fb.com
 Cc: Christoph Hellwig h...@lst.de
 Cc: Ming Lei ming@canonical.com
 ---
   block/blk-mq-cpumap.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
 index 1065d7c..8e56455 100644
 --- a/block/blk-mq-cpumap.c
 +++ b/block/blk-mq-cpumap.c
 @@ -17,7 +17,7 @@
   static int cpu_to_queue_index(unsigned int nr_cpus, unsigned int nr_queues,
const int cpu)
   {
 - return cpu / ((nr_cpus + nr_queues - 1) / nr_queues);
 + return cpu * nr_queues / nr_cpus;
   }

   static int get_first_sibling(unsigned int cpu)


Seems reasonable enough.

Reviewed-by: Sagi Grimberg sa...@mellanox.com
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/8] scsi-mq: Add support for multiple hardware queues

2014-09-19 Thread Sagi Grimberg
On 9/19/2014 3:57 PM, Bart Van Assche wrote:
 Allow a SCSI LLD to declare how many hardware queues it supports
 by setting Scsi_Host.nr_hw_queues before calling scsi_add_host().

 Note: it is assumed that each hardware queue has a queue depth of
 shost->can_queue. In other words, the total queue depth per host
 is (number of hardware queues) * (shost->can_queue).

 Signed-off-by: Bart Van Assche bvanass...@acm.org
 Cc: Christoph Hellwig h...@lst.de
 ---
   drivers/scsi/scsi_lib.c  | 2 +-
   include/scsi/scsi_host.h | 4 
   2 files changed, 5 insertions(+), 1 deletion(-)

 diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
 index d837dc1..b0b6117 100644
 --- a/drivers/scsi/scsi_lib.c
 +++ b/drivers/scsi/scsi_lib.c
 @@ -2071,7 +2071,7 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)

  	memset(&shost->tag_set, 0, sizeof(shost->tag_set));
  	shost->tag_set.ops = &scsi_mq_ops;
  -	shost->tag_set.nr_hw_queues = 1;
  +	shost->tag_set.nr_hw_queues = shost->nr_hw_queues ? : 1;
  	shost->tag_set.queue_depth = shost->can_queue;
  	shost->tag_set.cmd_size = cmd_size;
  	shost->tag_set.numa_node = NUMA_NO_NODE;
 diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
 index ba20347..0a867d9 100644
 --- a/include/scsi/scsi_host.h
 +++ b/include/scsi/scsi_host.h
 @@ -638,6 +638,10 @@ struct Scsi_Host {
   short unsigned int sg_prot_tablesize;
   unsigned int max_sectors;
   unsigned long dma_boundary;
 + /*
 + * In scsi-mq mode, the number of hardware queues supported by the LLD.
 + */
 + unsigned nr_hw_queues;
   /*
   * Used to assign serial numbers to the cmds.
   * Protected by the host lock.


I think this patch should be squashed with passing LLD hctx patch (in
whatever form it ends up).
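
As a usage illustration (hypothetical driver code - only the nr_hw_queues
assignment relates to this patch, everything else is made up), an LLD that
wants one hardware queue per completion vector would simply set the new
field before registering the host:

#include <scsi/scsi_host.h>

extern struct scsi_host_template my_lld_template;	/* invented LLD bits */
struct my_lld_device { struct device *dev; unsigned int num_comp_vectors; };

static int my_lld_host_setup(struct my_lld_device *mdev)
{
	struct Scsi_Host *shost;

	shost = scsi_host_alloc(&my_lld_template, 0);
	if (!shost)
		return -ENOMEM;

	shost->can_queue = 62;				/* depth per hardware queue */
	shost->nr_hw_queues = mdev->num_comp_vectors;	/* new field from this patch */

	return scsi_add_host(shost, mdev->dev);
}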
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Christoph Hellwig
On Fri, Sep 19, 2014 at 11:33:15AM -0600, Jens Axboe wrote:
 That'd be fine as well. The mapping is cheap, though, but it would make
 sense to have an appropriate way to just pass it in like it happens for
 -queue_rq() for native blk-mq drivers.

I think just passing the hw_ctx is fine.  But I don't want drivers to
have to implement both methods, so we should make sure the new method
works for both the blk-mq and !blk-mq case.  Note that one thing the
new method should get is a bool last parameter similar to the one I
added to the queue_rq method in the block tree.

It might be worth it to simply do a global search and replace and pass
a hw_ctx method to all instances, too.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/8] scsi-mq: Add support for multiple hardware queues

2014-09-19 Thread Christoph Hellwig
On Fri, Sep 19, 2014 at 09:05:53PM +0300, Sagi Grimberg wrote:
 I think this patch should be squashed with passing LLD hctx patch (in
 whatever form it ends up).

Agreed.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] IB/srp: Remove stale connection retry mechanism

2014-09-19 Thread Sagi Grimberg
On 9/19/2014 3:58 PM, Bart Van Assche wrote:
 Attempting to connect three times may be insufficient after an
 initiator system that was using multiple RDMA channels tries to
 relogin. Additionally, this login retry mechanism is a workaround
 for particular behavior of the IB/CM. Since the srp_daemon retries
 a failed login attempt anyway, remove the stale connection retry
 mechanism.


I agree, this work-around doesn't even always work for some targets. In
case the RDMA stack restarted while IB ports were down and the srp
initiator didn't manage to tear down the connections, some targets keep
the cm_ids online and return stale connection rejects for longer than
expected.

Let user-space retry from scratch...

Reviewed-by: Sagi Grimberg sa...@mellanox.com

 Signed-off-by: Bart Van Assche bvanass...@acm.org
 ---
   drivers/infiniband/ulp/srp/ib_srp.c | 16 +++-
   1 file changed, 3 insertions(+), 13 deletions(-)

 diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
 b/drivers/infiniband/ulp/srp/ib_srp.c
 index d3c712f..9608e7a 100644
 --- a/drivers/infiniband/ulp/srp/ib_srp.c
 +++ b/drivers/infiniband/ulp/srp/ib_srp.c
 @@ -904,7 +904,6 @@ static void srp_rport_delete(struct srp_rport *rport)

   static int srp_connect_target(struct srp_target_port *target)
   {
 - int retries = 3;
   int ret;

   	WARN_ON_ONCE(target->connected);
  @@ -945,19 +944,10 @@ static int srp_connect_target(struct srp_target_port *target)
   		break;
 
   	case SRP_STALE_CONN:
  -		/* Our current CM id was stale, and is now in timewait.
  -		 * Try to reconnect with a new one.
  -		 */
  -		if (!retries-- || srp_new_cm_id(target)) {
  -			shost_printk(KERN_ERR, target->scsi_host, PFX
  -				     "giving up on stale connection\n");
  -			target->status = -ECONNRESET;
  -			return target->status;
  -		}
  -
   		shost_printk(KERN_ERR, target->scsi_host, PFX
  -			     "retrying stale connection\n");
  -		break;
  +			     "giving up on stale connection\n");
  +		target->status = -ECONNRESET;
  +		return target->status;
 
   	default:
   		return target->status;

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/8] IB/srp: Add multichannel support

2014-09-19 Thread Jens Axboe
On 09/19/2014 06:55 AM, Bart Van Assche wrote:
 Hello,
 
 Although the SRP protocol supports multichannel operation, although
 since considerable time RDMA HCA's are available that support multiple
 completion vectors and although multichannel operation yields better
 performance than using a single channel, the Linux SRP initiator does
 not yet support multichannel operation. While adding multichannel
 support in the SRP initiator I encountered a few challenges of which I
 think these need wider discussion. The topics I would like invite wider
 discussion about are as follows:
 - How to avoid unneeded inter-socket cache traffic. Should the blk-mq
   layer e.g. assign CPU cores to hardware queues such that all CPU cores
   associated with a single hardware queue reside on the same CPU socket?
   (see also patch 1/8)

Right now these are deliberately symmetric, and hence can result in
hardware queues being left unused. Whether it makes sense to make this
change or not, I think will greatly depend on how many queues are
available. There's a crossover point where having more queues doesn't
help, and it can even make things worse (interrupt load, etc). For your
specific case of 4 queues, it probably DOES make sense to use them all,
however.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html