Re: [PATCH 3/3] ib/iser: enhance disconnection logic for multi-pathing

2010-05-12 Thread Or Gerlitz
Roland Dreier  wrote:

> I have these 3 + Dan Carpenter's fix applied now.

cool

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] ib/iser: enhance disconnection logic for multi-pathing

2010-05-12 Thread Roland Dreier
 > Hi Roland, as we're @ -rc7 now, I wanted to check with you if there's
 > any issue merging this patch series for 2.6.35. If you have any
 > question or anything needs to be addressed/fixed, I'd like to do that
 > sooner rather than later.

No, just needed to get to it.  I have these 3 + Dan Carpenter's fix
applied now.
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [PATCH 3/3] ib/iser: enhance disconnection logic for multi-pathing

2010-05-11 Thread Or Gerlitz
Or Gerlitz  wrote:
> [...] with this patch, multipath fail-over time is about 30 seconds, which
> is seen here, when a DD over the multi-path device is done
> before/during/after the fail-over [...] without this patch, multipath
> fail-over time is about 130 seconds

Hi Roland, as we're @ -rc7 now, I wanted to check with you if there's
any issue merging this patch series for 2.6.35. If you have any
question or anything needs to be addressed/fixed, I'd like to do that
sooner rather than later.

Or


[PATCH 3/3] ib/iser: enhance disconnection logic for multi-pathing

2010-05-05 Thread Or Gerlitz
The iser connection teardown flow isn't over till the underlying
Connection Manager (e.g. the IB CM) delivers a disconnected or timeout
event through the RDMA-CM. When the remote (target) side isn't reachable,
e.g. when some HW (port/HCA/switch) isn't functioning or is taken down
administratively, the CM timeout flow is used and the event may be
generated only after a relatively long time, on the order of tens of seconds.

The current iser code exposes this possibly long delay to higher layers,
specifically to the iscsid daemon and iscsi kernel stack. As a result,
the iscsi stack doesn't respond well, to the extent of this low-level CM
delay being added to the fail-over time under HA schemes such as the one
provided by DM multipath through the multipathd(8) service.

This patch enhances the reference counting scheme on iser's IB
connections such that the disconnect flow initiated by iscsid from
user space (ep_disconnect) doesn't wait for the CM to deliver the
disconnect/timeout event. On the other hand, from iser's viewpoint the
connection teardown isn't done till the event is delivered.

The iser ib (rdma) connection object is destroyed when its reference
count reaches zero. When this happens on the RDMA-CM callback context,
extra care is taken such that the RDMA-CM does the actual destroying
of the associated ID as doing it in the callback is prohibited.

The reference count of iser ib connection would normally reach
three, where the  relations are
1. conn 
2. conn 
3. cma id 
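
To make the scheme described above concrete, here is a minimal user-space
sketch of the put-side logic. It is not the patch code: struct conn_sketch,
conn_get(), conn_put() and demo_teardown() are hypothetical names, and the
three references are left unnamed. The only things taken from the description
are that the object is released when the count reaches zero and that, in CM
callback context, the ID is left for the RDMA-CM to destroy (modelled by a
can_destroy_id flag, the name used in the diff further down).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the iser IB connection object. */
struct conn_sketch {
	int  refcount;      /* number of outstanding references */
	bool released;      /* connection resources were freed */
	bool id_destroyed;  /* we destroyed the CM ID ourselves */
};

/* Take one reference. */
static void conn_get(struct conn_sketch *c)
{
	c->refcount++;
}

/*
 * Drop one reference. On the last put the resources are released.
 * can_destroy_id is false when called from the CM event handler,
 * where destroying the ID directly is not allowed.
 * Returns true only if this put released the object.
 */
static bool conn_put(struct conn_sketch *c, bool can_destroy_id)
{
	assert(c->refcount > 0);
	if (--c->refcount > 0)
		return false;

	c->released = true;
	if (can_destroy_id)
		c->id_destroyed = true;
	return true;
}

/*
 * Walk through a teardown where the CM event arrives last.
 * Returns what a (hypothetical) CM event handler would return:
 * 1 asks the CM to destroy the ID, 0 means nothing is left to do.
 */
static int demo_teardown(void)
{
	struct conn_sketch c = { 0, false, false };

	conn_get(&c);  /* three references, as in the description */
	conn_get(&c);
	conn_get(&c);

	/* User-space initiated disconnect: returns without waiting for the CM. */
	conn_put(&c, true);
	conn_put(&c, true);
	assert(!c.released);  /* still held until the CM event shows up */

	/* Later, in CM callback context: last put, but leave the ID alone. */
	bool last = conn_put(&c, false);
	return (last && !c.id_destroyed) ? 1 : 0;
}
```

In this sketch demo_teardown() returns 1: the last reference is dropped in
callback context, so the ID is handed back to the CM instead of being
destroyed in place. Had the last put come from a non-callback context,
conn_put(..., true) would have destroyed the ID directly.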

Signed-off-by: Or Gerlitz 

---

with this patch, multipath fail-over time is about 30 seconds,
which is seen here, when a DD over the multi-path device is done
before/during/after the fail-over

regularly, before taking a port down
# dd if=/dev/zero of=/dev/dm-0 bs=128k count=128k
17179869184 bytes (17 GB) copied, 16.926 s, 1.0 GB/s

taking a port down, causing fail-over during IO
# dd if=/dev/zero of=/dev/dm-0 bs=128k count=128k
17179869184 bytes (17 GB) copied, 46.6117 s, 369 MB/s

after path-failure, back to speed
# dd if=/dev/zero of=/dev/dm-0 bs=128k count=128k
17179869184 bytes (17 GB) copied, 16.6474 s, 1.0 GB/s

13:00:09 iser: iser_event_handler:async event 10 on device mlx4_0 port 1
13:00:24 connection8:0: ping timeout of 10 secs expired, recv timeout 5, last rx [...]
13:00:24 connection8:0: detected conn error (1011)
13:00:24 iscsid: Kernel reported iSCSI connection 8:0 error (1011) state (3)
13:00:39 cto-1 kernel: device-mapper: multipath: Failing path 8:48.
13:00:39 cto-1 multipathd: 8:48: mark as failed
13:00:39 cto-1 multipathd: mpathd: remaining active paths: 1
--> the disconnected event is delivered after the IB CM timeout expires
--> but fail-over doesn't pend on this
13:01:56 iser: iser_cma_handler:event 10 status 0 conn 88022dcb39b0 id 88022cf09400

without this patch, multipath fail-over time is about 130 seconds

before taking a port down
# dd if=/dev/zero of=/dev/dm-0 bs=128k count=128k
17179869184 bytes (17 GB) copied, 16.6812 s, 1.0 GB/s

taking a port down during IO
# dd if=/dev/zero of=/dev/dm-0 bs=128k count=128k
17179869184 bytes (17 GB) copied, 145.094 s, 118 MB/s

after fail-over, back to speed
# dd if=/dev/zero of=/dev/dm-0 bs=128k count=128k
17179869184 bytes (17 GB) copied, 16.8935 s, 1.0 GB/s

14:24:05 iser: iser_event_handler:async event 10 on device mlx4_0 port 1
14:24:20 connection4:0: ping timeout of 10 secs expired, recv timeout 5, last rx [...]
14:24:20 kernel: connection4:0: detected conn error (1011)
14:24:21 iscsid: Kernel reported iSCSI connection 4:0 error (1011) state (3)
--> the disconnected event is delivered after the IB CM timeout expires
--> fail-over pending on this
14:25:59 iser: iser_cma_handler:event 10 conn 88022625a1b0 id 880222537c00
14:26:14 session4: session recovery timed out after 15 secs
14:26:14 device-mapper: multipath: Failing path 8:64.
14:26:14 multipathd: mpathd: remaining active paths: 1
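
The ~30 s and ~130 s figures can be checked against the log timestamps in
the two runs above. A small sketch, assuming all timestamps fall on the same
day; ts_to_seconds() and elapsed() are illustrative helpers, not part of any
existing tool:

```c
#include <assert.h>
#include <stdio.h>

/* Convert an "HH:MM:SS" log timestamp to seconds since midnight; -1 on parse error. */
static int ts_to_seconds(const char *ts)
{
	int h, m, s;

	if (sscanf(ts, "%d:%d:%d", &h, &m, &s) != 3)
		return -1;
	return h * 3600 + m * 60 + s;
}

/* Seconds elapsed between two timestamps taken on the same day. */
static int elapsed(const char *from, const char *to)
{
	int a = ts_to_seconds(from);
	int b = ts_to_seconds(to);

	assert(a >= 0 && b >= 0);  /* both timestamps must parse */
	return b - a;
}
```

Measured from the port-down async event to the "Failing path" message,
elapsed("13:00:09", "13:00:39") gives 30 with the patch and
elapsed("14:24:05", "14:26:14") gives 129 without it. The CM event itself
arrives 107 s and 114 s after the port-down event in the two runs, so the
difference comes from whether fail-over waits for that event.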

 drivers/infiniband/ulp/iser/iscsi_iser.c |9 ++-
 drivers/infiniband/ulp/iser/iscsi_iser.h |3 -
 drivers/infiniband/ulp/iser/iser_verbs.c |   72 +--
 3 files changed, 46 insertions(+), 38 deletions(-)

Index: linux-2.6.34-rc6/drivers/infiniband/ulp/iser/iser_verbs.c
===
--- linux-2.6.34-rc6.orig/drivers/infiniband/ulp/iser/iser_verbs.c
+++ linux-2.6.34-rc6/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -238,7 +238,7 @@ alloc_err:
  * releases the FMR pool, QP and CMA ID objects, returns 0 on success,
  * -1 on failure
  */
-static int iser_free_ib_conn_res(struct iser_conn *ib_conn)
+static int iser_free_ib_conn_res(struct iser_conn *ib_conn, int can_destroy_id)
 {
BUG_ON(ib_conn == NULL);

@@ -253,7 +253,8 @@ static int iser_free_ib_conn_res(struct
if (ib_conn->qp != NULL)
rdma_destroy_qp(ib_conn->cma_id);

-   if (ib_conn->cma_id != NULL)
+   /* if cma handler context, the caller acts s.t the cma destroy the id */
+   if