Re: [ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

2012-10-17 Thread Tziporet Koren
Vu

Do you know?

Tziporet

-----Original Message-----
From: Bart Van Assche [mailto:bvanass...@acm.org] 
Sent: Wednesday, October 17, 2012 9:09 AM
To: Rupert Dance
Cc: Tziporet Koren; Vladimir Sokolovsky; 'ewg'
Subject: Re: [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

On 10/17/12 05:12, Rupert Dance wrote:
> However:
> 1. The Module took a long time (~1-2 minutes) to unload
> 2. Message saying something to the effect of 'stale
> connection...retrying' was observed

That behavior is consistent with the behavior of the ib_srp driver in the 
latest upstream kernel (3.7-rc1), isn't it?

Bart.

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

2012-10-17 Thread Bart Van Assche

On 10/17/12 05:12, Rupert Dance wrote:

However:
1. The Module took a long time (~1-2 minutes) to unload
2. Message saying something to the effect of 'stale connection...retrying'
was observed


That behavior is consistent with the behavior of the ib_srp driver in 
the latest upstream kernel (3.7-rc1), isn't it?


Bart.



Re: [ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

2012-10-16 Thread Rupert Dance
Tziporet,

UNH-IOL completed the testing of the new daily build and here is what we
found.

+++

SL 6.3 2.6.32-279.el6.x86_64
OFED-3.5-20121016-0341.tgz 16-Oct-2012 03:42  18M   

The new build now allows you to load and unload successfully with no system
crashes. We were also able to run the OFA Interop SRP tests successfully.

However:
1. The Module took a long time (~1-2 minutes) to unload
2. Message saying something to the effect of 'stale connection...retrying'
was observed

The attached file is a capture from the dmesg output.



This is bug 2374.

Thanks

Rupert

-----Original Message-----
From: Rupert Dance [mailto:rsda...@soft-forge.com] 
Sent: Tuesday, October 16, 2012 6:59 AM
To: 'Tziporet Koren'
Cc: 'Vladimir Sokolovsky'
Subject: RE: [PATCH] ib_srp: Avoid that module removal can trigger a
deadlock

Tziporet,

I have asked them to get this done today. I will let you know as soon as I
can confirm.

Thanks

Rupert

-----Original Message-----
From: Tziporet Koren [mailto:tzipo...@mellanox.com]
Sent: Tuesday, October 16, 2012 6:36 AM
To: Vladimir Sokolovsky; Rupert Dance
Cc: 'Bart Van Assche'; 'ewg'
Subject: RE: [PATCH] ib_srp: Avoid that module removal can trigger a
deadlock

Rupert
I must get your input for SRP stability to know when we can build a new RC 

Thanks
Tziporet

-----Original Message-----
From: Vladimir Sokolovsky [mailto:v...@dev.mellanox.co.il]
Sent: Tuesday, October 16, 2012 12:34 PM
To: Rupert Dance
Cc: 'Bart Van Assche'; 'ewg'; Tziporet Koren
Subject: Re: [PATCH] ib_srp: Avoid that module removal can trigger a
deadlock

On 10/15/2012 04:46 PM, Rupert Dance wrote:
> Vlad,
>
> Thanks for getting this done. Is this in today's daily build or if not 
> when will I have access?
>
> Thanks
>
> Rupert

Hi Rupert,
Yes, today's daily build includes this fix.

Regards,
Vladimir

>
> -----Original Message-----
> From: Vladimir Sokolovsky [mailto:v...@dev.mellanox.co.il]
> Sent: Monday, October 15, 2012 9:28 AM
> To: Bart Van Assche
> Cc: Rupert Dance; ewg; Tziporet Koren
> Subject: Re: [PATCH] ib_srp: Avoid that module removal can trigger a 
> deadlock
>
> On 10/12/2012 02:03 PM, Bart Van Assche wrote:
>> Avoid that scsi_remove_host() is invoked from the context of a work 
>> queue thread on which work has been queued that scsi_remove_host() 
>> might be waiting for. That avoids that module removal of ib_srp 
>> triggers a deadlock on a pre-2.6.36 kernel. This patch has been 
>> tested on RHEL 6.1, RHEL 6.2, RHEL 6.3 and SLES 11 SP2.
>>
>> Reported-by: Rupert Dance 
>> Signed-off-by: Bart Van Assche 
>> ---
>
> Applied,
>
> Regards,
> Vladimir
>
>
>


scsi host4: ib_srp: new target: id_ext c19d350003c90200 ioc_guid 
0002c90300359de0 pkey  service_id c19d350003c90200 dgid 
fe80::::0002:c903:0035:9de1
scsi host4: ib_srp: REJ received
scsi host4:   REJ reason: stale connection
scsi host4: ib_srp: retrying stale connection
scsi host4: ib_srp: REJ received
scsi host4:   REJ reason: stale connection
scsi host4: ib_srp: retrying stale connection
scsi host4: ib_srp: REJ received
scsi host4:   REJ reason: stale connection
scsi host4: ib_srp: retrying stale connection
scsi host4: ib_srp: REJ received
scsi host4:   REJ reason: stale connection
scsi host4: ib_srp: giving up on stale connection
scsi host4: ib_srp: Connection failed
scsi host5: ib_srp: new target: id_ext c19d350003c90200 ioc_guid 
0002c90300359de0 pkey  service_id c19d350003c90200 dgid 
fe80::::0002:c903:0035:9de2
scsi host5: ib_srp: REJ received
scsi host5:   REJ reason: stale connection
scsi host5: ib_srp: retrying stale connection
scsi host5: ib_srp: REJ received
scsi host5:   REJ reason: stale connection
scsi host5: ib_srp: retrying stale connection
scsi host5: ib_srp: REJ received
scsi host5:   REJ reason: stale connection
scsi host5: ib_srp: retrying stale connection
scsi host5: ib_srp: REJ received
scsi host5:   REJ reason: stale connection
scsi host5: ib_srp: giving up on stale connection
scsi host5: ib_srp: Connection failed
scsi host6: ib_srp: new target: id_ext c09e350003c90200 ioc_guid 
0002c90300359e10 pkey  service_id c09e350003c90200 dgid 
fe80::::0002:c903:0035:9e11
scsi6 : SRP.T10:C09E350003C90200
scsi 6:0:0:0: Direct-Access DDN  SFA 120001.50 PQ: 0 ANSI: 5
sd 6:0:0:0: Attached scsi generic sg2 type 0
sd 6:0:0:0: Warning! Received an indication that the LUN assignments on this 
target have changed. The Linux SCSI layer does not automatically remap LUN 
assignments.
sd 6:0:0:0: [sdb] Unit Not Ready
sd 6:0:0:0: [sdb] Sense Key : Unit Attention [current] 
sd 6:0:0:0: [sdb] Add. Sense: Reported luns data has changed
sd 6:0:0:0: [sdb] 4412407808 512-byte logical blocks: (2.25 TB/2.05 TiB)
sd 6:0:0:0: [sdb] Write Protect is off
sd 6:0:0:0: [sdb] Mode Sense: 6f 00 10 08
scsi 6:0:0:4: Direct-Access DDN  SFA 120001.50 PQ: 0 ANSI: 5
sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled

Re: [ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

2012-10-16 Thread Tziporet Koren
Rupert
I must get your input for SRP stability to know when we can build a new RC 

Thanks
Tziporet

-----Original Message-----
From: Vladimir Sokolovsky [mailto:v...@dev.mellanox.co.il] 
Sent: Tuesday, October 16, 2012 12:34 PM
To: Rupert Dance
Cc: 'Bart Van Assche'; 'ewg'; Tziporet Koren
Subject: Re: [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

On 10/15/2012 04:46 PM, Rupert Dance wrote:
> Vlad,
>
> Thanks for getting this done. Is this in today's daily build or if not 
> when will I have access?
>
> Thanks
>
> Rupert

Hi Rupert,
Yes, today's daily build includes this fix.

Regards,
Vladimir

>
> -----Original Message-----
> From: Vladimir Sokolovsky [mailto:v...@dev.mellanox.co.il]
> Sent: Monday, October 15, 2012 9:28 AM
> To: Bart Van Assche
> Cc: Rupert Dance; ewg; Tziporet Koren
> Subject: Re: [PATCH] ib_srp: Avoid that module removal can trigger a 
> deadlock
>
> On 10/12/2012 02:03 PM, Bart Van Assche wrote:
>> Avoid that scsi_remove_host() is invoked from the context of a work 
>> queue thread on which work has been queued that scsi_remove_host() 
>> might be waiting for. That avoids that module removal of ib_srp 
>> triggers a deadlock on a pre-2.6.36 kernel. This patch has been 
>> tested on RHEL 6.1, RHEL 6.2, RHEL 6.3 and SLES 11 SP2.
>>
>> Reported-by: Rupert Dance 
>> Signed-off-by: Bart Van Assche 
>> ---
>
> Applied,
>
> Regards,
> Vladimir
>
>
>



Re: [ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

2012-10-16 Thread Vladimir Sokolovsky

On 10/15/2012 04:46 PM, Rupert Dance wrote:

Vlad,

Thanks for getting this done. Is this in today's daily build or if not when
will I have access?

Thanks

Rupert


Hi Rupert,
Yes, today's daily build includes this fix.

Regards,
Vladimir



-----Original Message-----
From: Vladimir Sokolovsky [mailto:v...@dev.mellanox.co.il]
Sent: Monday, October 15, 2012 9:28 AM
To: Bart Van Assche
Cc: Rupert Dance; ewg; Tziporet Koren
Subject: Re: [PATCH] ib_srp: Avoid that module removal can trigger a
deadlock

On 10/12/2012 02:03 PM, Bart Van Assche wrote:

Avoid that scsi_remove_host() is invoked from the context of a work
queue thread on which work has been queued that scsi_remove_host()
might be waiting for. That avoids that module removal of ib_srp
triggers a deadlock on a pre-2.6.36 kernel. This patch has been tested
on RHEL 6.1, RHEL 6.2, RHEL 6.3 and SLES 11 SP2.

Reported-by: Rupert Dance 
Signed-off-by: Bart Van Assche 
---


Applied,

Regards,
Vladimir







Re: [ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

2012-10-15 Thread Rupert Dance
Vlad,

Thanks for getting this done. Is this in today's daily build or if not when
will I have access?

Thanks

Rupert

-----Original Message-----
From: Vladimir Sokolovsky [mailto:v...@dev.mellanox.co.il] 
Sent: Monday, October 15, 2012 9:28 AM
To: Bart Van Assche
Cc: Rupert Dance; ewg; Tziporet Koren
Subject: Re: [PATCH] ib_srp: Avoid that module removal can trigger a
deadlock

On 10/12/2012 02:03 PM, Bart Van Assche wrote:
> Avoid that scsi_remove_host() is invoked from the context of a work 
> queue thread on which work has been queued that scsi_remove_host() 
> might be waiting for. That avoids that module removal of ib_srp 
> triggers a deadlock on a pre-2.6.36 kernel. This patch has been tested 
> on RHEL 6.1, RHEL 6.2, RHEL 6.3 and SLES 11 SP2.
>
> Reported-by: Rupert Dance 
> Signed-off-by: Bart Van Assche 
> ---

Applied,

Regards,
Vladimir




Re: [ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

2012-10-15 Thread Vladimir Sokolovsky

On 10/12/2012 02:03 PM, Bart Van Assche wrote:

Avoid that scsi_remove_host() is invoked from the context of a work
queue thread on which work has been queued that scsi_remove_host()
might be waiting for. That avoids that module removal of ib_srp
triggers a deadlock on a pre-2.6.36 kernel. This patch has been
tested on RHEL 6.1, RHEL 6.2, RHEL 6.3 and SLES 11 SP2.

Reported-by: Rupert Dance 
Signed-off-by: Bart Van Assche 
---


Applied,

Regards,
Vladimir



[ewg] [PATCH] ib_srp: Avoid that module removal can trigger a deadlock

2012-10-12 Thread Bart Van Assche
Avoid that scsi_remove_host() is invoked from the context of a work
queue thread on which work has been queued that scsi_remove_host()
might be waiting for. That avoids that module removal of ib_srp
triggers a deadlock on a pre-2.6.36 kernel. This patch has been
tested on RHEL 6.1, RHEL 6.2, RHEL 6.3 and SLES 11 SP2.

Reported-by: Rupert Dance 
Signed-off-by: Bart Van Assche 
---
 .../0025-ib_srp-Backport-to-older-kernels.patch|   59 +++-
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/patches/0025-ib_srp-Backport-to-older-kernels.patch b/patches/0025-ib_srp-Backport-to-older-kernels.patch
index 20edccf..d070430 100644
--- a/patches/0025-ib_srp-Backport-to-older-kernels.patch
+++ b/patches/0025-ib_srp-Backport-to-older-kernels.patch
@@ -12,7 +12,7 @@ Signed-off-by: Bart Van Assche 
  1 file changed, 108 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
-index bcbf22e..fab74e0 100644
+index bcbf22e..d42e9c4 100644
 --- a/drivers/infiniband/ulp/srp/ib_srp.c
 +++ b/drivers/infiniband/ulp/srp/ib_srp.c
 @@ -30,8 +30,13 @@
@@ -29,7 +29,7 @@ index bcbf22e..fab74e0 100644
  #include 
  #include 
  #include 
-@@ -41,21 +46,27 @@
+@@ -41,21 +46,32 @@
  #include 
  #include 
  
@@ -57,22 +57,15 @@ index bcbf22e..fab74e0 100644
 +#define pr_warn pr_warning
 +#endif
 +
++#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 36)
++static struct workqueue_struct *srp_wq;
++#define ib_wq srp_wq
++#endif
++
  MODULE_AUTHOR("Roland Dreier");
  MODULE_DESCRIPTION("InfiniBand SCSI RDMA Protocol initiator "
   "v" DRV_VERSION " (" DRV_RELDATE ")");
-@@ -675,7 +686,11 @@ err:
-   if (target->state == SRP_TARGET_CONNECTING) {
-   target->state = SRP_TARGET_DEAD;
-   INIT_WORK(&target->work, srp_remove_work);
-+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 36)
-   queue_work(ib_wq, &target->work);
-+#else
-+  schedule_work(&target->work);
-+#endif
-   }
-   spin_unlock_irq(&target->lock);
- 
-@@ -1254,7 +1269,50 @@ static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
+@@ -1254,7 +1270,50 @@ static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
}
  }
  
@@ -124,7 +117,7 @@ index bcbf22e..fab74e0 100644
  {
struct srp_target_port *target = host_to_target(shost);
struct srp_request *req;
-@@ -1822,6 +1880,9 @@ static struct scsi_host_template srp_template = {
+@@ -1822,6 +1881,9 @@ static struct scsi_host_template srp_template = {
.name   = "InfiniBand SRP initiator",
.proc_name  = DRV_NAME,
.info   = srp_target_info,
@@ -134,18 +127,32 @@ index bcbf22e..fab74e0 100644
.queuecommand   = srp_queuecommand,
.eh_abort_handler   = srp_abort,
.eh_device_reset_handler= srp_reset_device,
-@@ -2412,7 +2473,11 @@ static void srp_remove_one(struct ib_device *device)
-* started before we marked our target ports as
-* removed, and any target port removal tasks.
-*/
-+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 36)
-   flush_workqueue(ib_wq);
-+#else
-+  flush_scheduled_work();
+@@ -2491,11 +2553,25 @@ static int __init srp_init_module(void)
+   return ret;
+   }
+ 
++#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 36)
++  srp_wq = create_workqueue("srp");
++  if (IS_ERR(srp_wq)) {
++  ib_unregister_client(&srp_client);
++  ib_sa_unregister_client(&srp_sa_client);
++  class_unregister(&srp_class);
++  srp_release_transport(ib_srp_transport_template);
++  return PTR_ERR(srp_wq);
++  }
 +#endif
++
+   return 0;
+ }
  
-   list_for_each_entry_safe(target, tmp_target,
-&host->target_list, list) {
+ static void __exit srp_cleanup_module(void)
+ {
++#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 36)
++  destroy_workqueue(srp_wq);
++#endif
+   ib_unregister_client(&srp_client);
+   ib_sa_unregister_client(&srp_sa_client);
+   class_unregister(&srp_class);
 -- 
 1.7.9.5
 
-- 
1.7.10.4
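
The deadlock described in the commit message can be illustrated with a toy
userspace model. This is only a sketch, not the ib_srp code: it assumes a
work queue served by a single worker that runs items strictly in order, so
an item that waits for another item queued behind it on the same queue can
never make progress. All demo_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: each queue has one worker that runs items strictly in order. */
struct demo_work {
    int queue_id;     /* queue the item was queued on */
    size_t position;  /* 0 = currently running, higher = still pending */
};

/*
 * Returns true if "running" would block forever by waiting for "awaited":
 * both sit on the same single-worker queue, and "awaited" cannot start
 * until "running" has returned.
 */
static bool demo_wait_deadlocks(const struct demo_work *running,
                                const struct demo_work *awaited)
{
    return running->queue_id == awaited->queue_id &&
           awaited->position > running->position;
}
```

In this model, moving the removal work to its own queue (as the patch does
with srp_wq on pre-2.6.36 kernels) changes queue_id for the running item, so
the wait no longer depends on work stuck behind it.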


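A note on the error check in the srp_init_module() hunk: the sketch below
assumes that create_workqueue() signals failure by returning NULL rather
than an ERR_PTR() value, which is how the workqueue API is generally
described. It re-creates the kernel's IS_ERR() convention in userspace to
show that a NULL pointer is not classified as an error pointer. The demo_*
names and DEMO_MAX_ERRNO are stand-ins made up for illustration, not kernel
symbols.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel's ERR_PTR helpers. */
#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer, in the style of ERR_PTR(). */
static inline void *demo_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True only for pointers in the top DEMO_MAX_ERRNO addresses, like IS_ERR(). */
static inline bool demo_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Check suited to allocators that return NULL on failure. */
static inline bool demo_alloc_failed(const void *ptr)
{
    return ptr == NULL;
}
```

Under that assumption, demo_is_err(NULL) is false, so a failed allocation
would pass an IS_ERR()-style test unnoticed; a plain NULL test with an
-ENOMEM return would be the more conventional form for this kind of call.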
