Re: [ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock
Vu,

Do you know?

Tziporet

-----Original Message-----
From: Bart Van Assche [mailto:bvanass...@acm.org]
Sent: Wednesday, October 17, 2012 9:09 AM
To: Rupert Dance
Cc: Tziporet Koren; Vladimir Sokolovsky; 'ewg'
Subject: Re: [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

On 10/17/12 05:12, Rupert Dance wrote:
> However the Module took a long time (~1-2 minutes) to unload 2.
> Message saying something to the effect of 'stale
> connection...retrying' was observed

That behavior is consistent with the behavior of the ib_srp driver in the latest upstream kernel (3.7-rc1), isn't it?

Bart.
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock
On 10/17/12 05:12, Rupert Dance wrote:
> However the Module took a long time (~1-2 minutes) to unload 2.
> Message saying something to the effect of 'stale
> connection...retrying' was observed

That behavior is consistent with the behavior of the ib_srp driver in the latest upstream kernel (3.7-rc1), isn't it?

Bart.
Re: [ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock
Tziporet,

UNH-IOL completed the testing of the new daily build and here is what we found.

+++
SL 6.3 2.6.32-279.el6.x86_64
OFED-3.5-20121016-0341.tgz 16-Oct-2012 03:42 18M

The new build now allows you to load and unload successfully with no system crashes. We were also able to run the OFA Interop SRP tests successfully.

However:
1. The module took a long time (~1-2 minutes) to unload.
2. A message saying something to the effect of 'stale connection...retrying' was observed.

The attached file is a capture from the dmesg output. This is bug 2374.

Thanks

Rupert

-----Original Message-----
From: Rupert Dance [mailto:rsda...@soft-forge.com]
Sent: Tuesday, October 16, 2012 6:59 AM
To: 'Tziporet Koren'
Cc: 'Vladimir Sokolovsky'
Subject: RE: [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

Tziporet,

I have asked them to get this done today. I will let you know as soon as I can confirm.

Thanks

Rupert

-----Original Message-----
From: Tziporet Koren [mailto:tzipo...@mellanox.com]
Sent: Tuesday, October 16, 2012 6:36 AM
To: Vladimir Sokolovsky; Rupert Dance
Cc: 'Bart Van Assche'; 'ewg'
Subject: RE: [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

Rupert

I must get your input for SRP stability to know when we can build a new RC.

Thanks
Tziporet

-----Original Message-----
From: Vladimir Sokolovsky [mailto:v...@dev.mellanox.co.il]
Sent: Tuesday, October 16, 2012 12:34 PM
To: Rupert Dance
Cc: 'Bart Van Assche'; 'ewg'; Tziporet Koren
Subject: Re: [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

On 10/15/2012 04:46 PM, Rupert Dance wrote:
> Vlad,
>
> Thanks for getting this done. Is this in today's daily build or if not
> when will I have access?
>
> Thanks
>
> Rupert

Hi Rupert,

Yes, today's daily build includes this fix.

Regards,
Vladimir

>
> -----Original Message-----
> From: Vladimir Sokolovsky [mailto:v...@dev.mellanox.co.il]
> Sent: Monday, October 15, 2012 9:28 AM
> To: Bart Van Assche
> Cc: Rupert Dance; ewg; Tziporet Koren
> Subject: Re: [PATCH] ib_srp: Avoid that module removal can trigger a
> deadlock
>
> On 10/12/2012 02:03 PM, Bart Van Assche wrote:
>> Avoid that scsi_remove_host() is invoked from the context of a work
>> queue thread on which work has been queued that scsi_remove_host()
>> might be waiting for. That avoids that module removal of ib_srp
>> triggers a deadlock on a pre-2.6.36 kernel. This patch has been
>> tested on RHEL 6.1, RHEL 6.2, RHEL 6.3 and SLES 11 SP2.
>>
>> Reported-by: Rupert Dance
>> Signed-off-by: Bart Van Assche
>> ---
>
> Applied,
>
> Regards,
> Vladimir

scsi host4: ib_srp: new target: id_ext c19d350003c90200 ioc_guid 0002c90300359de0 pkey service_id c19d350003c90200 dgid fe80::::0002:c903:0035:9de1
scsi host4: ib_srp: REJ received
scsi host4: REJ reason: stale connection
scsi host4: ib_srp: retrying stale connection
scsi host4: ib_srp: REJ received
scsi host4: REJ reason: stale connection
scsi host4: ib_srp: retrying stale connection
scsi host4: ib_srp: REJ received
scsi host4: REJ reason: stale connection
scsi host4: ib_srp: retrying stale connection
scsi host4: ib_srp: REJ received
scsi host4: REJ reason: stale connection
scsi host4: ib_srp: giving up on stale connection
scsi host4: ib_srp: Connection failed
scsi host5: ib_srp: new target: id_ext c19d350003c90200 ioc_guid 0002c90300359de0 pkey service_id c19d350003c90200 dgid fe80::::0002:c903:0035:9de2
scsi host5: ib_srp: REJ received
scsi host5: REJ reason: stale connection
scsi host5: ib_srp: retrying stale connection
scsi host5: ib_srp: REJ received
scsi host5: REJ reason: stale connection
scsi host5: ib_srp: retrying stale connection
scsi host5: ib_srp: REJ received
scsi host5: REJ reason: stale connection
scsi host5: ib_srp: retrying stale connection
scsi host5: ib_srp: REJ received
scsi host5: REJ reason: stale connection
scsi host5: ib_srp: giving up on stale connection
scsi host5: ib_srp: Connection failed
scsi host6: ib_srp: new target: id_ext c09e350003c90200 ioc_guid 0002c90300359e10 pkey service_id c09e350003c90200 dgid fe80::::0002:c903:0035:9e11
scsi6 : SRP.T10:C09E350003C90200
scsi 6:0:0:0: Direct-Access DDN SFA 120001.50 PQ: 0 ANSI: 5
sd 6:0:0:0: Attached scsi generic sg2 type 0
sd 6:0:0:0: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.
sd 6:0:0:0: [sdb] Unit Not Ready
sd 6:0:0:0: [sdb] Sense Key : Unit Attention [current]
sd 6:0:0:0: [sdb] Add. Sense: Reported luns data has changed
sd 6:0:0:0: [sdb] 4412407808 512-byte logical blocks: (2.25 TB/2.05 TiB)
sd 6:0:0:0: [sdb] Write Protect is off
sd 6:0:0:0: [sdb] Mode Sense: 6f 00 10 08
scsi 6:0:0:4: Direct-Access DDN SFA 120001.50 PQ: 0 ANSI: 5
sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled
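In the capture, host4 and host5 each log four "REJ received" / "REJ reason: stale connection" pairs and three "retrying stale connection" messages, followed by "giving up on stale connection" and "Connection failed". That is the shape of a bounded retry loop. The C sketch below models it under one stated assumption: a budget of three retries, read off the log rather than taken from the ib_srp source. connect_with_retries(), always_stale() and stale_once() are hypothetical names, not driver functions.

```c
#include <assert.h>
#include <stdio.h>

/* Outcome of one (hypothetical) login attempt. */
enum attempt_result { ATTEMPT_OK, ATTEMPT_REJ_STALE };

struct retry_stats {
    int rejects;   /* "REJ received" */
    int retries;   /* "retrying stale connection" */
    int gave_up;   /* 1 once "giving up on stale connection" is reached */
};

/* Hypothetical bounded retry loop: a stale-connection REJ is retried at most
 * max_retries times; the next stale REJ ends the attempt. Returns 0 on
 * success, -1 on failure. try_connect() gets the attempt index from 0. */
static int connect_with_retries(enum attempt_result (*try_connect)(int),
                                int max_retries, struct retry_stats *st)
{
    int retries = max_retries;

    assert(max_retries >= 0);
    st->rejects = st->retries = st->gave_up = 0;
    for (int attempt = 0;; attempt++) {
        if (try_connect(attempt) == ATTEMPT_OK)
            return 0;
        st->rejects++;
        printf("REJ received: stale connection\n");
        if (retries-- == 0) {
            printf("giving up on stale connection\n");
            st->gave_up = 1;
            return -1;
        }
        printf("retrying stale connection\n");
        st->retries++;
    }
}

/* A target that rejects every login as stale, as host4 and host5 did. */
static enum attempt_result always_stale(int attempt)
{
    (void)attempt;
    return ATTEMPT_REJ_STALE;
}

/* A target that rejects the first login as stale and then accepts. */
static enum attempt_result stale_once(int attempt)
{
    return attempt == 0 ? ATTEMPT_REJ_STALE : ATTEMPT_OK;
}
```

With max_retries set to 3, always_stale produces four rejects, three retries and a give-up, the same sequence as host4 and host5; stale_once succeeds after a single retry.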
Re: [ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock
Rupert

I must get your input for SRP stability to know when we can build a new RC.

Thanks
Tziporet

-----Original Message-----
From: Vladimir Sokolovsky [mailto:v...@dev.mellanox.co.il]
Sent: Tuesday, October 16, 2012 12:34 PM
To: Rupert Dance
Cc: 'Bart Van Assche'; 'ewg'; Tziporet Koren
Subject: Re: [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

On 10/15/2012 04:46 PM, Rupert Dance wrote:
> Vlad,
>
> Thanks for getting this done. Is this in today's daily build or if not
> when will I have access?
>
> Thanks
>
> Rupert

Hi Rupert,

Yes, today's daily build includes this fix.

Regards,
Vladimir

>
> -----Original Message-----
> From: Vladimir Sokolovsky [mailto:v...@dev.mellanox.co.il]
> Sent: Monday, October 15, 2012 9:28 AM
> To: Bart Van Assche
> Cc: Rupert Dance; ewg; Tziporet Koren
> Subject: Re: [PATCH] ib_srp: Avoid that module removal can trigger a
> deadlock
>
> On 10/12/2012 02:03 PM, Bart Van Assche wrote:
>> Avoid that scsi_remove_host() is invoked from the context of a work
>> queue thread on which work has been queued that scsi_remove_host()
>> might be waiting for. That avoids that module removal of ib_srp
>> triggers a deadlock on a pre-2.6.36 kernel. This patch has been
>> tested on RHEL 6.1, RHEL 6.2, RHEL 6.3 and SLES 11 SP2.
>>
>> Reported-by: Rupert Dance
>> Signed-off-by: Bart Van Assche
>> ---
>
> Applied,
>
> Regards,
> Vladimir
Re: [ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock
On 10/15/2012 04:46 PM, Rupert Dance wrote:
> Vlad,
>
> Thanks for getting this done. Is this in today's daily build or if not
> when will I have access?
>
> Thanks
>
> Rupert

Hi Rupert,

Yes, today's daily build includes this fix.

Regards,
Vladimir

>
> -----Original Message-----
> From: Vladimir Sokolovsky [mailto:v...@dev.mellanox.co.il]
> Sent: Monday, October 15, 2012 9:28 AM
> To: Bart Van Assche
> Cc: Rupert Dance; ewg; Tziporet Koren
> Subject: Re: [PATCH] ib_srp: Avoid that module removal can trigger a
> deadlock
>
> On 10/12/2012 02:03 PM, Bart Van Assche wrote:
>> Avoid that scsi_remove_host() is invoked from the context of a work
>> queue thread on which work has been queued that scsi_remove_host()
>> might be waiting for. That avoids that module removal of ib_srp
>> triggers a deadlock on a pre-2.6.36 kernel. This patch has been
>> tested on RHEL 6.1, RHEL 6.2, RHEL 6.3 and SLES 11 SP2.
>>
>> Reported-by: Rupert Dance
>> Signed-off-by: Bart Van Assche
>> ---
>
> Applied,
>
> Regards,
> Vladimir
Re: [ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock
Vlad,

Thanks for getting this done. Is this in today's daily build or if not when will I have access?

Thanks

Rupert

-----Original Message-----
From: Vladimir Sokolovsky [mailto:v...@dev.mellanox.co.il]
Sent: Monday, October 15, 2012 9:28 AM
To: Bart Van Assche
Cc: Rupert Dance; ewg; Tziporet Koren
Subject: Re: [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

On 10/12/2012 02:03 PM, Bart Van Assche wrote:
> Avoid that scsi_remove_host() is invoked from the context of a work
> queue thread on which work has been queued that scsi_remove_host()
> might be waiting for. That avoids that module removal of ib_srp
> triggers a deadlock on a pre-2.6.36 kernel. This patch has been tested
> on RHEL 6.1, RHEL 6.2, RHEL 6.3 and SLES 11 SP2.
>
> Reported-by: Rupert Dance
> Signed-off-by: Bart Van Assche
> ---

Applied,

Regards,
Vladimir
Re: [ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock
On 10/12/2012 02:03 PM, Bart Van Assche wrote:
> Avoid that scsi_remove_host() is invoked from the context of a work
> queue thread on which work has been queued that scsi_remove_host()
> might be waiting for. That avoids that module removal of ib_srp
> triggers a deadlock on a pre-2.6.36 kernel. This patch has been tested
> on RHEL 6.1, RHEL 6.2, RHEL 6.3 and SLES 11 SP2.
>
> Reported-by: Rupert Dance
> Signed-off-by: Bart Van Assche
> ---

Applied,

Regards,
Vladimir
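The deadlock the patch description refers to can be modelled in user space. The sketch below is a toy model, not kernel code: each queue is a single thread that runs its items strictly in order (an assumption standing in for one workqueue thread), and toy_wq, remove_work() and pending_work() are hypothetical names. remove_work() waits for pending_work() in the way the description says scsi_remove_host() may wait for work already queued. When both items sit on one thread (the backport code that this patch replaces queued srp_remove_work with schedule_work(), i.e. on the kernel's shared queue), pending_work() cannot start until remove_work() returns, so the wait can never be satisfied; a one-second timeout turns that into a false result instead of a hang. Giving the removal its own thread, which is what srp_wq provides on pre-2.6.36 kernels, lets the wait complete.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

typedef void (*work_fn)(void);

/* Toy single-threaded queue: one thread runs items[0..n) strictly in order.
 * Items are fixed before the thread starts, so no queue locking is needed. */
struct toy_wq {
    work_fn items[4];
    int n;
    pthread_t thread;
};

static void *toy_wq_thread(void *arg)
{
    struct toy_wq *wq = arg;

    for (int i = 0; i < wq->n; i++)
        wq->items[i]();
    return NULL;
}

static void toy_wq_add(struct toy_wq *wq, work_fn fn)
{
    assert(wq->n < 4);
    wq->items[wq->n++] = fn;
}

static pthread_mutex_t ev_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ev_cond = PTHREAD_COND_INITIALIZER;
static bool pending_done;      /* set once pending_work() has run */
static bool removal_finished;  /* did remove_work() see pending_work() finish? */

/* Hypothetical work item that the removal path has to wait for. */
static void pending_work(void)
{
    pthread_mutex_lock(&ev_lock);
    pending_done = true;
    pthread_cond_broadcast(&ev_cond);
    pthread_mutex_unlock(&ev_lock);
}

/* Stands in for the removal work that calls scsi_remove_host() and waits
 * for pending_work(). The 1 s timeout only keeps the demo from hanging. */
static void remove_work(void)
{
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1;
    pthread_mutex_lock(&ev_lock);
    while (!pending_done && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&ev_cond, &ev_lock, &deadline);
    removal_finished = pending_done;
    pthread_mutex_unlock(&ev_lock);
}

/* Queue the removal on the shared queue or on a private one; returns true
 * if the removal's wait completed. */
static bool run_removal(bool use_private_queue)
{
    struct toy_wq shared_wq = { .n = 0 }, private_wq = { .n = 0 };

    pending_done = removal_finished = false;
    toy_wq_add(use_private_queue ? &private_wq : &shared_wq, remove_work);
    toy_wq_add(&shared_wq, pending_work);   /* queued behind the removal */
    pthread_create(&shared_wq.thread, NULL, toy_wq_thread, &shared_wq);
    pthread_create(&private_wq.thread, NULL, toy_wq_thread, &private_wq);
    pthread_join(shared_wq.thread, NULL);
    pthread_join(private_wq.thread, NULL);
    return removal_finished;
}
```

run_removal(true) returns true almost immediately; run_removal(false) returns false once the one-second timeout expires. Without the timeout, the shared-queue case would block forever, which is the deadlock the patch avoids.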
[ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock
Avoid that scsi_remove_host() is invoked from the context of a work
queue thread on which work has been queued that scsi_remove_host()
might be waiting for. That avoids that module removal of ib_srp
triggers a deadlock on a pre-2.6.36 kernel. This patch has been tested
on RHEL 6.1, RHEL 6.2, RHEL 6.3 and SLES 11 SP2.

Reported-by: Rupert Dance
Signed-off-by: Bart Van Assche
---
 .../0025-ib_srp-Backport-to-older-kernels.patch |   59 +++-
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/patches/0025-ib_srp-Backport-to-older-kernels.patch b/patches/0025-ib_srp-Backport-to-older-kernels.patch
index 20edccf..d070430 100644
--- a/patches/0025-ib_srp-Backport-to-older-kernels.patch
+++ b/patches/0025-ib_srp-Backport-to-older-kernels.patch
@@ -12,7 +12,7 @@ Signed-off-by: Bart Van Assche
  1 file changed, 108 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
-index bcbf22e..fab74e0 100644
+index bcbf22e..d42e9c4 100644
 --- a/drivers/infiniband/ulp/srp/ib_srp.c
 +++ b/drivers/infiniband/ulp/srp/ib_srp.c
 @@ -30,8 +30,13 @@
@@ -29,7 +29,7 @@ index bcbf22e..fab74e0 100644
  #include
  #include
  #include
-@@ -41,21 +46,27 @@
+@@ -41,21 +46,32 @@
  #include
  #include
  
@@ -57,22 +57,15 @@ index bcbf22e..fab74e0 100644
 +#define pr_warn pr_warning
 +#endif
 +
++#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 36)
++static struct workqueue_struct *srp_wq;
++#define ib_wq srp_wq
++#endif
++
  MODULE_AUTHOR("Roland Dreier");
  MODULE_DESCRIPTION("InfiniBand SCSI RDMA Protocol initiator "
  		   "v" DRV_VERSION " (" DRV_RELDATE ")");
-@@ -675,7 +686,11 @@ err:
- 		if (target->state == SRP_TARGET_CONNECTING) {
- 			target->state = SRP_TARGET_DEAD;
- 			INIT_WORK(&target->work, srp_remove_work);
-+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 36)
- 			queue_work(ib_wq, &target->work);
-+#else
-+			schedule_work(&target->work);
-+#endif
- 		}
- 		spin_unlock_irq(&target->lock);
- 
-@@ -1254,7 +1269,50 @@ static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
+@@ -1254,7 +1270,50 @@ static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
  	}
  }
  
@@ -124,7 +117,7 @@ index bcbf22e..fab74e0 100644
  {
  	struct srp_target_port *target = host_to_target(shost);
  	struct srp_request *req;
-@@ -1822,6 +1880,9 @@ static struct scsi_host_template srp_template = {
+@@ -1822,6 +1881,9 @@ static struct scsi_host_template srp_template = {
  	.name				= "InfiniBand SRP initiator",
  	.proc_name			= DRV_NAME,
  	.info				= srp_target_info,
@@ -134,18 +127,32 @@ index bcbf22e..fab74e0 100644
  	.queuecommand			= srp_queuecommand,
  	.eh_abort_handler		= srp_abort,
  	.eh_device_reset_handler= srp_reset_device,
-@@ -2412,7 +2473,11 @@ static void srp_remove_one(struct ib_device *device)
- 	 * started before we marked our target ports as
- 	 * removed, and any target port removal tasks.
- 	 */
-+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 36)
- 	flush_workqueue(ib_wq);
-+#else
-+	flush_scheduled_work();
+@@ -2491,11 +2553,25 @@ static int __init srp_init_module(void)
+ 		return ret;
+ 	}
+ 
++#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 36)
++	srp_wq = create_workqueue("srp");
++	if (IS_ERR(srp_wq)) {
++		ib_unregister_client(&srp_client);
++		ib_sa_unregister_client(&srp_sa_client);
++		class_unregister(&srp_class);
++		srp_release_transport(ib_srp_transport_template);
++		return PTR_ERR(srp_wq);
++	}
 +#endif
++
+ 	return 0;
+ }
  
- 	list_for_each_entry_safe(target, tmp_target,
- 				 &host->target_list, list) {
+ static void __exit srp_cleanup_module(void)
+ {
++#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 36)
++	destroy_workqueue(srp_wq);
++#endif
+ 	ib_unregister_client(&srp_client);
+ 	ib_sa_unregister_client(&srp_sa_client);
+ 	class_unregister(&srp_class);
 -- 
 1.7.9.5
 
-- 
1.7.10.4
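For pre-2.6.36 kernels the patch gives ib_srp its own queue: srp_wq is created in srp_init_module(), destroyed in srp_cleanup_module(), and ib_wq is #defined to it. The user-space sketch below models only that lifecycle; fake_create_workqueue(), fake_unregister_client() and the other fake_* and *_sketch names are hypothetical stand-ins, and the module's registrations are collapsed into a single flag. It illustrates two points that seem worth checking when adapting this pattern. First, create_workqueue() reports failure by returning NULL rather than an ERR_PTR() value, so a NULL test catches it while an IS_ERR() test does not (is_err() below mirrors the kernel's IS_ERR() comparison against MAX_ERRNO). Second, ib_unregister_client() invokes the client's remove callback, srp_remove_one(), which calls flush_workqueue(ib_wq), so the queue should still exist at that point. The sketch therefore creates the queue before the registration and destroys it after unregistering; that strict reverse-order release is the sketch's choice, not the patch's.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space copy of the kernel's IS_ERR() test (MAX_ERRNO is 4095). */
#define MAX_ERRNO 4095
static bool is_err(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in for a workqueue; alive becomes false on destroy. */
struct fake_wq {
    bool alive;
};

static struct fake_wq wq_storage;
static struct fake_wq *srp_wq;     /* plays the role of the patch's srp_wq */
static bool client_registered;     /* collapses the module's registrations */
static bool fail_alloc;            /* test hook: make queue creation fail */
static int dead_queue_flushes;     /* flushes issued after the queue died */

/* Like create_workqueue(): failure is reported as NULL, not as ERR_PTR(). */
static struct fake_wq *fake_create_workqueue(void)
{
    if (fail_alloc)
        return NULL;
    wq_storage.alive = true;
    return &wq_storage;
}

static void fake_destroy_workqueue(struct fake_wq *wq)
{
    wq->alive = false;
}

/* Models ib_unregister_client(): the client's remove callback flushes the
 * queue, as srp_remove_one() does via flush_workqueue(ib_wq). */
static void fake_unregister_client(void)
{
    if (client_registered && !srp_wq->alive)
        dead_queue_flushes++;
    client_registered = false;
}

/* Acquire the queue first, then the registration that relies on it. */
static int init_sketch(void)
{
    srp_wq = fake_create_workqueue();
    if (!srp_wq)                   /* a NULL test; is_err(NULL) is false */
        return -ENOMEM;
    client_registered = true;
    return 0;
}

/* destroy_queue_first = true models "destroy the queue, then unregister". */
static void cleanup_sketch(bool destroy_queue_first)
{
    assert(srp_wq != NULL);        /* only valid after a successful init */
    if (destroy_queue_first) {
        fake_destroy_workqueue(srp_wq);
        fake_unregister_client();
    } else {
        fake_unregister_client();
        fake_destroy_workqueue(srp_wq);
    }
    srp_wq = NULL;
}
```

With destroy_queue_first set to true the model counts one flush of an already-destroyed queue, which in the kernel would mean flushing a freed workqueue; unregistering first keeps the count at zero.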