Re: [PATCH] IB/cm: Fix a recently introduced deadlock

2016-01-06 Thread Erez Shitrit
On Fri, Jan 1, 2016 at 2:17 PM, Bart Van Assche wrote:
> ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock
> that can be locked from inside an interrupt handler. Hence do not
> enable interrupts inside cm_enter_timewait() if called with interrupts
> disabled.
>
> This patch fixes e.g. the following deadlock:
>
> =
> [ INFO: inconsistent lock state ]
> 4.4.0-rc7+ #1 Tainted: GE
> -
> inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> (&(&cm_id_priv->lock)->rlock){?.+...}, at: [] cm_establish+0x74/0x1b0 [ib_cm]
> {HARDIRQ-ON-W} state was registered at:
>   [] mark_held_locks+0x71/0x90
>   [] trace_hardirqs_on_caller+0xa7/0x1c0
>   [] trace_hardirqs_on+0xd/0x10
>   [] _raw_spin_unlock_irq+0x2b/0x40
>   [] cm_enter_timewait+0xae/0x100 [ib_cm]
>   [] ib_send_cm_drep+0xb6/0x190 [ib_cm]
>   [] srp_cm_handler+0x128/0x1a0 [ib_srp]
>   [] cm_process_work+0x20/0xf0 [ib_cm]
>   [] cm_dreq_handler+0x135/0x2c0 [ib_cm]
>   [] cm_work_handler+0x75/0xd0 [ib_cm]
>   [] process_one_work+0x1bd/0x460
>   [] worker_thread+0x118/0x420
>   [] kthread+0xe4/0x100
>   [] ret_from_fork+0x3f/0x70
> irq event stamp: 1672286
> hardirqs last  enabled at (1672283): [] poll_idle+0x10/0x80
> hardirqs last disabled at (1672284): [] common_interrupt+0x84/0x89
> softirqs last  enabled at (1672286): [] _local_bh_enable+0x1c/0x50
> softirqs last disabled at (1672285): [] irq_enter+0x47/0x70
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(&(&cm_id_priv->lock)->rlock);
>   <Interrupt>
>     lock(&(&cm_id_priv->lock)->rlock);
>
>  *** DEADLOCK ***
>
> no locks held by swapper/8/0.
>
> stack backtrace:
> CPU: 8 PID: 0 Comm: swapper/8 Tainted: GE   4.4.0-rc7+ #1
> Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
>  88045af5e950 88046e503a88 81251c1b 0007
>  0006 0003 88045af5ddc0 88046e503ad8
>  810a32f4   0001
> Call Trace:
>  <IRQ>  [] dump_stack+0x4f/0x74
>  [] print_usage_bug+0x184/0x190
>  [] mark_lock_irq+0xf2/0x290
>  [] mark_lock+0x115/0x1b0
>  [] mark_irqflags+0x15c/0x170
>  [] __lock_acquire+0x1ef/0x560
>  [] lock_acquire+0x62/0x80
>  [] _raw_spin_lock_irqsave+0x43/0x60
>  [] cm_establish+0x74/0x1b0 [ib_cm]
>  [] ib_cm_notify+0x31/0x100 [ib_cm]
>  [] srpt_qp_event+0x54/0xd0 [ib_srpt]
>  [] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib]
>  [] mlx4_qp_event+0x69/0xd0 [mlx4_core]
>  [] mlx4_eq_int+0x51e/0xd50 [mlx4_core]
>  [] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core]
>  [] handle_irq_event_percpu+0x40/0x110
>  [] handle_irq_event+0x3f/0x70
>  [] handle_edge_irq+0x79/0x120
>  [] handle_irq+0x5d/0x130
>  [] do_IRQ+0x6d/0x130
>  [] common_interrupt+0x89/0x89
>  <EOI>  [] cpuidle_enter_state+0xcf/0x200
>  [] cpuidle_enter+0x12/0x20
>  [] call_cpuidle+0x36/0x60
>  [] cpuidle_idle_call+0x63/0x110
>  [] cpu_idle_loop+0xfa/0x130
>  [] cpu_startup_entry+0xe/0x10
>  [] start_secondary+0x83/0x90
>
> Fixes: commit be4b499323bf ("IB/cm: Do not queue work to a device that's going away")
> Signed-off-by: Bart Van Assche 

Acked-by: Erez Shitrit 

> Cc: Erez Shitrit 
> Cc: stable 
> ---
>  drivers/infiniband/core/cm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 0a26dd6..d6d2b35 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -782,11 +782,11 @@ static void cm_enter_timewait(struct cm_id_private *cm_id_priv)
>  	wait_time = cm_convert_to_ms(cm_id_priv->av.timeout);
>
>  	/* Check if the device started its remove_one */
> -	spin_lock_irq(&cm.lock);
> +	spin_lock_irqsave(&cm.lock, flags);
>  	if (!cm_dev->going_down)
>  		queue_delayed_work(cm.wq, &cm_id_priv->timewait_info->work.work,
>  				   msecs_to_jiffies(wait_time));
> -	spin_unlock_irq(&cm.lock);
> +	spin_unlock_irqrestore(&cm.lock, flags);
>
>  	cm_id_priv->timewait_info = NULL;
>  }
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] IB/cm: Fix a recently introduced deadlock

2016-01-01 Thread Bart Van Assche
ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock
that can be locked from inside an interrupt handler. Hence do not
enable interrupts inside cm_enter_timewait() if called with interrupts
disabled.

This patch fixes e.g. the following deadlock:

=
[ INFO: inconsistent lock state ]
4.4.0-rc7+ #1 Tainted: GE
-
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
(&(&cm_id_priv->lock)->rlock){?.+...}, at: [] cm_establish+0x74/0x1b0 [ib_cm]
{HARDIRQ-ON-W} state was registered at:
  [] mark_held_locks+0x71/0x90
  [] trace_hardirqs_on_caller+0xa7/0x1c0
  [] trace_hardirqs_on+0xd/0x10
  [] _raw_spin_unlock_irq+0x2b/0x40
  [] cm_enter_timewait+0xae/0x100 [ib_cm]
  [] ib_send_cm_drep+0xb6/0x190 [ib_cm]
  [] srp_cm_handler+0x128/0x1a0 [ib_srp]
  [] cm_process_work+0x20/0xf0 [ib_cm]
  [] cm_dreq_handler+0x135/0x2c0 [ib_cm]
  [] cm_work_handler+0x75/0xd0 [ib_cm]
  [] process_one_work+0x1bd/0x460
  [] worker_thread+0x118/0x420
  [] kthread+0xe4/0x100
  [] ret_from_fork+0x3f/0x70
irq event stamp: 1672286
hardirqs last  enabled at (1672283): [] poll_idle+0x10/0x80
hardirqs last disabled at (1672284): [] common_interrupt+0x84/0x89
softirqs last  enabled at (1672286): [] _local_bh_enable+0x1c/0x50
softirqs last disabled at (1672285): [] irq_enter+0x47/0x70

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&cm_id_priv->lock)->rlock);
  <Interrupt>
    lock(&(&cm_id_priv->lock)->rlock);

 *** DEADLOCK ***

no locks held by swapper/8/0.

stack backtrace:
CPU: 8 PID: 0 Comm: swapper/8 Tainted: GE   4.4.0-rc7+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
 88045af5e950 88046e503a88 81251c1b 0007
 0006 0003 88045af5ddc0 88046e503ad8
 810a32f4   0001
Call Trace:
 <IRQ>  [] dump_stack+0x4f/0x74
 [] print_usage_bug+0x184/0x190
 [] mark_lock_irq+0xf2/0x290
 [] mark_lock+0x115/0x1b0
 [] mark_irqflags+0x15c/0x170
 [] __lock_acquire+0x1ef/0x560
 [] lock_acquire+0x62/0x80
 [] _raw_spin_lock_irqsave+0x43/0x60
 [] cm_establish+0x74/0x1b0 [ib_cm]
 [] ib_cm_notify+0x31/0x100 [ib_cm]
 [] srpt_qp_event+0x54/0xd0 [ib_srpt]
 [] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib]
 [] mlx4_qp_event+0x69/0xd0 [mlx4_core]
 [] mlx4_eq_int+0x51e/0xd50 [mlx4_core]
 [] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core]
 [] handle_irq_event_percpu+0x40/0x110
 [] handle_irq_event+0x3f/0x70
 [] handle_edge_irq+0x79/0x120
 [] handle_irq+0x5d/0x130
 [] do_IRQ+0x6d/0x130
 [] common_interrupt+0x89/0x89
 <EOI>  [] cpuidle_enter_state+0xcf/0x200
 [] cpuidle_enter+0x12/0x20
 [] call_cpuidle+0x36/0x60
 [] cpuidle_idle_call+0x63/0x110
 [] cpu_idle_loop+0xfa/0x130
 [] cpu_startup_entry+0xe/0x10
 [] start_secondary+0x83/0x90

Fixes: commit be4b499323bf ("IB/cm: Do not queue work to a device that's going away")
Signed-off-by: Bart Van Assche 
Cc: Erez Shitrit 
Cc: stable 
---
 drivers/infiniband/core/cm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 0a26dd6..d6d2b35 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -782,11 +782,11 @@ static void cm_enter_timewait(struct cm_id_private *cm_id_priv)
 	wait_time = cm_convert_to_ms(cm_id_priv->av.timeout);
 
 	/* Check if the device started its remove_one */
-	spin_lock_irq(&cm.lock);
+	spin_lock_irqsave(&cm.lock, flags);
 	if (!cm_dev->going_down)
 		queue_delayed_work(cm.wq, &cm_id_priv->timewait_info->work.work,
 				   msecs_to_jiffies(wait_time));
-	spin_unlock_irq(&cm.lock);
+	spin_unlock_irqrestore(&cm.lock, flags);
 
 	cm_id_priv->timewait_info = NULL;
 }
-- 
2.1.4
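
Illustration only, not part of the patch: the difference between the
two lock variants can be modelled in userspace. The sketch below is not
kernel code. Every sim_* name is hypothetical, and a single boolean
stands in for the CPU interrupt flag. It assumes a caller that already
holds an outer lock with interrupts disabled, as ib_send_cm_drep() does
when it calls cm_enter_timewait().

```c
/*
 * Userspace model (not kernel code) of why spin_unlock_irq() is unsafe
 * in a function that may be entered with interrupts already disabled.
 * All sim_* names are hypothetical and exist only for this illustration.
 */
#include <stdbool.h>

static bool sim_irqs_enabled = true;	/* models the CPU interrupt flag */

/* Models local_irq_save(): remember the current state, then disable. */
static unsigned long sim_irq_save(void)
{
	unsigned long flags = sim_irqs_enabled;

	sim_irqs_enabled = false;
	return flags;
}

/* Models local_irq_restore(): put back whatever state was saved. */
static void sim_irq_restore(unsigned long flags)
{
	sim_irqs_enabled = flags != 0;
}

/* Models the spin_lock_irq()/spin_unlock_irq() pair for an inner lock. */
static void sim_inner_section_irq(void)
{
	sim_irqs_enabled = false;	/* spin_lock_irq() */
	/* ... critical section ... */
	sim_irqs_enabled = true;	/* spin_unlock_irq(): unconditional enable */
}

/* Models the spin_lock_irqsave()/spin_unlock_irqrestore() pair. */
static void sim_inner_section_irqsave(void)
{
	unsigned long flags = sim_irq_save();	/* spin_lock_irqsave() */
	/* ... critical section ... */
	sim_irq_restore(flags);			/* spin_unlock_irqrestore() */
}

/*
 * Models a caller that holds an outer lock taken with interrupts disabled
 * and then runs an inner section. Returns the interrupt state observed
 * while the outer lock is still held.
 */
static bool sim_irqs_on_under_outer_lock(void (*inner)(void))
{
	unsigned long outer_flags = sim_irq_save();	/* outer lock taken */
	bool leaked;

	inner();
	leaked = sim_irqs_enabled;	/* true means an IRQ could fire here */
	sim_irq_restore(outer_flags);	/* outer lock released */
	return leaked;
}
```

Calling sim_irqs_on_under_outer_lock(sim_inner_section_irq) reports
that interrupts are back on while the outer lock is still held, which
is the window in which the interrupt handler can try to take the same
lock. With sim_inner_section_irqsave the caller's disabled state is
preserved until the outer lock is released.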
