On Fri, Jan 1, 2016 at 2:17 PM, Bart Van Assche wrote:
> ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock
> that can be locked from inside an interrupt handler. Hence do not
> enable interrupts inside cm_enter_timewait() if called with interrupts
> disabled.
>
> This patch fixes e.g. the following deadlock:
>
> =================================
> [ INFO: inconsistent lock state ]
> 4.4.0-rc7+ #1 Tainted: GE
> ---------------------------------
> inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> (&(&cm_id_priv->lock)->rlock){?.+...}, at: [] cm_establish+0x74/0x1b0 [ib_cm]
> {HARDIRQ-ON-W} state was registered at:
> [] mark_held_locks+0x71/0x90
> [] trace_hardirqs_on_caller+0xa7/0x1c0
> [] trace_hardirqs_on+0xd/0x10
> [] _raw_spin_unlock_irq+0x2b/0x40
> [] cm_enter_timewait+0xae/0x100 [ib_cm]
> [] ib_send_cm_drep+0xb6/0x190 [ib_cm]
> [] srp_cm_handler+0x128/0x1a0 [ib_srp]
> [] cm_process_work+0x20/0xf0 [ib_cm]
> [] cm_dreq_handler+0x135/0x2c0 [ib_cm]
> [] cm_work_handler+0x75/0xd0 [ib_cm]
> [] process_one_work+0x1bd/0x460
> [] worker_thread+0x118/0x420
> [] kthread+0xe4/0x100
> [] ret_from_fork+0x3f/0x70
> irq event stamp: 1672286
> hardirqs last enabled at (1672283): [] poll_idle+0x10/0x80
> hardirqs last disabled at (1672284): [] common_interrupt+0x84/0x89
> softirqs last enabled at (1672286): [] _local_bh_enable+0x1c/0x50
> softirqs last disabled at (1672285): [] irq_enter+0x47/0x70
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(&(&cm_id_priv->lock)->rlock);
>   <Interrupt>
>     lock(&(&cm_id_priv->lock)->rlock);
>
> *** DEADLOCK ***
>
> no locks held by swapper/8/0.
>
> stack backtrace:
> CPU: 8 PID: 0 Comm: swapper/8 Tainted: GE 4.4.0-rc7+ #1
> Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
> 88045af5e950 88046e503a88 81251c1b 0007
> 0006 0003 88045af5ddc0 88046e503ad8
> 810a32f4 0001
> Call Trace:
> [] dump_stack+0x4f/0x74
> [] print_usage_bug+0x184/0x190
> [] mark_lock_irq+0xf2/0x290
> [] mark_lock+0x115/0x1b0
> [] mark_irqflags+0x15c/0x170
> [] __lock_acquire+0x1ef/0x560
> [] lock_acquire+0x62/0x80
> [] _raw_spin_lock_irqsave+0x43/0x60
> [] cm_establish+0x74/0x1b0 [ib_cm]
> [] ib_cm_notify+0x31/0x100 [ib_cm]
> [] srpt_qp_event+0x54/0xd0 [ib_srpt]
> [] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib]
> [] mlx4_qp_event+0x69/0xd0 [mlx4_core]
> [] mlx4_eq_int+0x51e/0xd50 [mlx4_core]
> [] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core]
> [] handle_irq_event_percpu+0x40/0x110
> [] handle_irq_event+0x3f/0x70
> [] handle_edge_irq+0x79/0x120
> [] handle_irq+0x5d/0x130
> [] do_IRQ+0x6d/0x130
> [] common_interrupt+0x89/0x89
> [] cpuidle_enter_state+0xcf/0x200
> [] cpuidle_enter+0x12/0x20
> [] call_cpuidle+0x36/0x60
> [] cpuidle_idle_call+0x63/0x110
> [] cpu_idle_loop+0xfa/0x130
> [] cpu_startup_entry+0xe/0x10
> [] start_secondary+0x83/0x90
>
> Fixes: commit be4b499323bf ("IB/cm: Do not queue work to a device that's
> going away")
> Signed-off-by: Bart Van Assche
Acked-by: Erez Shitrit
> Cc: Erez Shitrit
> Cc: stable
> ---
> drivers/infiniband/core/cm.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 0a26dd6..d6d2b35 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -782,11 +782,11 @@ static void cm_enter_timewait(struct cm_id_private *cm_id_priv)
>  	wait_time = cm_convert_to_ms(cm_id_priv->av.timeout);
>
>  	/* Check if the device started its remove_one */
> -	spin_lock_irq(&cm.lock);
> +	spin_lock_irqsave(&cm.lock, flags);
>  	if (!cm_dev->going_down)
>  		queue_delayed_work(cm.wq, &cm_id_priv->timewait_info->work.work,
>  				   msecs_to_jiffies(wait_time));
> -	spin_unlock_irq(&cm.lock);
> +	spin_unlock_irqrestore(&cm.lock, flags);
>
>  	cm_id_priv->timewait_info = NULL;
>  }
> --
> 2.1.4
>
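
For anyone reading the archive later, the difference between the two
lock variants can be sketched in user space. This is only a toy model
under stated assumptions, not kernel code: irqs_enabled and the sim_*()
helpers are hypothetical stand-ins for the local interrupt state and
for the spin_lock_irq()/spin_lock_irqsave() families, and no real
locking takes place. It only tracks whether interrupts would be enabled
while the caller still holds its own lock.

```c
#include <stdbool.h>

/* Hypothetical model of the local CPU interrupt-enable state. */
static bool irqs_enabled = true;

/* Stand-in for spin_lock_irq(): disables interrupts unconditionally. */
void sim_lock_irq(void) { irqs_enabled = false; }

/* Stand-in for spin_unlock_irq(): enables interrupts unconditionally. */
void sim_unlock_irq(void) { irqs_enabled = true; }

/* Stand-in for spin_lock_irqsave(): remembers the previous state. */
void sim_lock_irqsave(unsigned long *flags)
{
	*flags = irqs_enabled;
	irqs_enabled = false;
}

/* Stand-in for spin_unlock_irqrestore(): puts the saved state back. */
void sim_unlock_irqrestore(unsigned long flags)
{
	irqs_enabled = (flags != 0);
}

/* Models the inner critical section before the patch. */
void inner_irq(void)
{
	sim_lock_irq();
	sim_unlock_irq();
}

/* Models the inner critical section after the patch. */
void inner_irqsave(void)
{
	unsigned long flags;

	sim_lock_irqsave(&flags);
	sim_unlock_irqrestore(flags);
}

/*
 * Models a caller that holds an outer lock with interrupts disabled and
 * then calls the inner function. Returns true if interrupts were still
 * disabled when the inner function returned, i.e. while the outer lock
 * was still held.
 */
bool irqs_stay_off_under_outer_lock(void (*inner)(void))
{
	unsigned long outer_flags;
	bool still_off;

	sim_lock_irqsave(&outer_flags);
	inner();
	still_off = !irqs_enabled;
	sim_unlock_irqrestore(outer_flags);
	return still_off;
}
```

In this model irqs_stay_off_under_outer_lock(inner_irq) returns false,
which corresponds to the window in which the interrupt handler can try
to take the lock the caller already holds, while
irqs_stay_off_under_outer_lock(inner_irqsave) returns true.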
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html