Re: Possible locking bug in pm8xxx/pm8001

2013-11-26 Thread Suresh Thiagarajan
Hi Jason

On Sat, Oct 12, 2013 at 2:02 AM, Jason Seba  wrote:
> The pm8xxx driver uses a per-adapter spinlock (pm8001_ha->lock) which
> is usually acquired and released with the irqsave routines. However,
> some functions which are called with the lock held
> (mpi_sata_completion, mpi_sata_event, pm8001_chip_sata_req) will
> temporarily release the lock to complete a task. However, when releasing
> and reacquiring the lock in this case, the irqsave routines are not
> used; instead spin_unlock_irq/spin_lock_irq are used. As far as I can
> tell, this is wrong and dangerous, and appears to result in the hard
> lockup shown below.
>
> It isn't obvious to me what the best way to fix this is. Suggestions?

This can be fixed by keeping the saved IRQ flags in the pm8001_hba_info
structure instead of declaring a local flags variable in each function.
I will send out a patch soon to fix this.
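
To make the hazard concrete, here is a rough user-space sketch. It is
not the patch and not kernel code: it models a single CPU with one
"interrupts enabled" bit, and every toy_* name, struct toy_hba, and
run_with are made up for illustration. It only shows why dropping the
lock with the _irq variants differs from restoring saved flags when the
caller entered with interrupts already off. It does not examine whether
a flags field shared in the HBA structure is safe with multiple CPUs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model, not kernel code: one "interrupts enabled" bit. */
static bool irqs_enabled = true;

struct toy_lock { bool held; };

/* Hypothetical stand-in for pm8001_hba_info: a lock plus saved flags. */
struct toy_hba {
    struct toy_lock lock;
    unsigned long flags;
};

/* Models spin_lock_irqsave(): remember the IRQ state, then disable. */
static void toy_lock_irqsave(struct toy_lock *l, unsigned long *flags)
{
    *flags = irqs_enabled ? 1UL : 0UL;
    irqs_enabled = false;
    assert(!l->held);
    l->held = true;
}

/* Models spin_unlock_irqrestore(): put the IRQ state back as it was. */
static void toy_unlock_irqrestore(struct toy_lock *l, unsigned long flags)
{
    assert(l->held);
    l->held = false;
    irqs_enabled = (flags != 0);
}

/* Models spin_lock_irq(): disable IRQs without saving anything. */
static void toy_lock_irq(struct toy_lock *l)
{
    irqs_enabled = false;
    assert(!l->held);
    l->held = true;
}

/* Models spin_unlock_irq(): enable IRQs unconditionally. */
static void toy_unlock_irq(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
    irqs_enabled = true;
}

/* Pattern from the report: drop the lock with the _irq variants.
 * Returns the IRQ state the task completion would run with. */
static bool complete_task_irq(struct toy_hba *ha)
{
    toy_unlock_irq(&ha->lock);
    bool seen = irqs_enabled;
    toy_lock_irq(&ha->lock);
    return seen;
}

/* Idea from this reply: reuse the flags kept in the HBA structure. */
static bool complete_task_saved(struct toy_hba *ha)
{
    toy_unlock_irqrestore(&ha->lock, ha->flags);
    bool seen = irqs_enabled;
    toy_lock_irqsave(&ha->lock, &ha->flags);
    return seen;
}

/* Caller that takes the lock with irqsave before the completion path. */
static bool run_with(struct toy_hba *ha, bool (*complete)(struct toy_hba *))
{
    toy_lock_irqsave(&ha->lock, &ha->flags);
    bool seen = complete(ha);
    toy_unlock_irqrestore(&ha->lock, ha->flags);
    return seen;
}
```

If irqs_enabled is set to false before the call, run_with(&ha,
complete_task_irq) reports that the completion ran with interrupts
switched on, while run_with(&ha, complete_task_saved) reports that they
stayed off. In both cases the caller gets its original state back at
the end, so the problem is confined to the window where the lock is
dropped.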

Regards,
Suresh
>
>
>
> [ 2048.017802] ------------[ cut here ]------------
> [ 2048.022621] WARNING: CPU: 0 PID: 1606 at kernel/watchdog.c:245
> watchdog_overflow_callback+0xac/0xd0()
> [ 2048.031827] Watchdog detected hard LOCKUP on cpu 0
> [ 2048.036439] Modules linked in: ses enclosure xt_CHECKSUM
> iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat bridge
> sunrpc fcoe 8021q mrp garp libfcoe libfc scsi_transport_fc stp llc
> scsi_tgt xt_physdev nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT
> nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
> ip6_tables binfmt_misc uinput iTCO_wdt iTCO_vendor_support mgag200 ttm
> drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect syscopyarea
> pm80xx libsas scsi_transport_sas joydev dcdbas pcspkr i2c_i801
> i2c_core lpc_ich mfd_core tg3 ptp pps_core [last unloaded:
> speedstep_lib]
> [ 2048.090159] CPU: 0 PID: 1606 Comm: libvirtd Not tainted 3.11.0-rc5+ #2
> [ 2048.096682] Hardware name: Dell Inc. PowerEdge T110 II/015TH9, BIOS
> 2.0.5 03/13/2012
> [ 2048.104410] 00f5 f3401828 c1580fe3 c1710301 f3401858 c10418e4
> c17061bc f3401884
> [ 2048.112277] 0646 c1710301 00f5 c10c415c c10c415c f5822800
> c10c40b0 
> [ 2048.120153] f3401870 c10419a3 0009 f3401868 c17061bc f3401884
> f3401888 c10c415c
> [ 2048.128022] Call Trace:
> [ 2048.130476] [] dump_stack+0x41/0x56
> [ 2048.134921] [] warn_slowpath_common+0x84/0xa0
> [ 2048.140231] [] ? watchdog_overflow_callback+0xac/0xd0
> [ 2048.146236] [] ? watchdog_overflow_callback+0xac/0xd0
> [ 2048.152239] [] ? watchdog_cleanup+0x10/0x10
> [ 2048.157369] [] warn_slowpath_fmt+0x33/0x40
> [ 2048.162420] [] watchdog_overflow_callback+0xac/0xd0
> [ 2048.168243] [] __perf_event_overflow+0xaf/0x280
> [ 2048.173729] [] ? x86_perf_event_set_period+0x12a/0x1e0
> [ 2048.179819] [] perf_event_overflow+0x15/0x20
> [ 2048.185043] [] intel_pmu_handle_irq+0x1bb/0x390
> [ 2048.190519] [] ? sched_clock_cpu+0x11d/0x1a0
> [ 2048.195744] [] perf_event_nmi_handler+0x31/0x50
> [ 2048.201229] [] nmi_handle+0x52/0x190
> [ 2048.205762] [] ? serial8250_modem_status+0xb0/0xb0
> [ 2048.211504] [] do_nmi+0xe2/0x3d0
> [ 2048.215681] [] nmi_stack_correct+0x2f/0x34
> [ 2048.220732] [] ? __pci_bus_size_bridges+0x868/0x890
> [ 2048.226563] [] ? _raw_spin_lock_irqsave+0x22/0x30
> [ 2048.232215] [] process_oq+0x6ae/0x1820 [pm80xx]
> [ 2048.237698] [] pm8001_chip_isr+0x23/0x40 [pm80xx]
> [ 2048.243356] [] pm8001_tasklet+0x1f/0x30 [pm80xx]
> [ 2048.248925] [] tasklet_action+0x8e/0xa0
> [ 2048.253709] [] __do_softirq+0xaf/0x200
> [ 2048.258406] [] irq_exit+0xa5/0xb0
> [ 2048.262676] [] do_IRQ+0x4b/0xc0
> [ 2048.266768] [] ? add_wait_queue+0x3b/0x50
> [ 2048.271730] [] common_interrupt+0x33/0x38
> [ 2048.276687] [] ? qi_flush_dev_iotlb+0x98/0xf0
> [ 2048.282001] [] ? poll_schedule_timeout+0x1/0xb0
> [ 2048.287483] [] ? do_sys_poll+0x4ad/0x530
> [ 2048.292352] [] ? __pollwait+0xe0/0xe0
> [ 2048.296962] [] ? __pollwait+0xe0/0xe0
> [ 2048.301581] [] ? __pollwait+0xe0/0xe0
> [ 2048.306197] [] ? __pollwait+0xe0/0xe0
> [ 2048.310808] [] ? __pollwait+0xe0/0xe0
> [ 2048.315416] [] ? __pollwait+0xe0/0xe0
> [ 2048.320026] [] ? netlink_recvmsg+0x294/0x340
> [ 2048.325244] [] ? selinux_socket_recvmsg+0x1d/0x20
> [ 2048.330901] [] ? sock_recvmsg+0xc0/0xf0
> [ 2048.335692] [] ? update_curr+0x1e7/0x290
> [ 2048.340572] [] ? move_addr_to_user+0x7d/0xb0
> [ 2048.345796] [] ? ___sys_recvmsg+0x142/0x1e0
> [ 2048.350933] [] ? kernel_sendmsg+0x50/0x50
> [ 2048.355898] [] ? __sys_recvmsg+0x5f/0x70
> [ 2048.360775] [] ? SyS_recvmsg+0x16/0x20
> [ 2048.365473] [] ? SyS_socketcall+0x107/0x2e0
> [ 2048.370609] [] SyS_poll+0x5a/0xd0
> [ 2048.374873] [] sysenter_do_call+0x12/0x22
> [ 2048.379828] ---[ end trace 723e25b4ff5b3a4f ]---
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: Possible locking bug in pm8xxx/pm8001

2013-12-16 Thread Suresh Thiagarajan
Jason,

I have sent you the patch for testing. Could you please help test it?
I will submit it to the list once you have tested it.

Regards,
Suresh

-----Original Message-----
From: Suresh Thiagarajan [mailto:sureshka...@gmail.com] 
Sent: Wednesday, November 27, 2013 11:11 AM
To: Jason Seba
Cc: linux-scsi@vger.kernel.org; Suresh Thiagarajan
Subject: Re: Possible locking bug in pm8xxx/pm8001

Hi Jason

On Sat, Oct 12, 2013 at 2:02 AM, Jason Seba  wrote:
> The pm8xxx driver uses a per-adapter spinlock (pm8001_ha->lock) which 
> is usually acquired and released with the irqsave routines. However, 
> some functions which are called with the lock held 
> (mpi_sata_completion, mpi_sata_event, pm8001_chip_sata_req) will 
> temporarily release the lock to complete a task. However, when releasing
> and reacquiring the lock in this case, the irqsave routines are not
> used; instead spin_unlock_irq/spin_lock_irq are used. As far as I can 
> tell, this is wrong and dangerous, and appears to result in the hard 
> lockup shown below.
>
> It isn't obvious to me what the best way to fix this is. Suggestions?

This can be fixed by keeping the saved IRQ flags in the pm8001_hba_info
structure instead of declaring a local flags variable in each function.
I will send out a patch soon to fix this.

Regards,
Suresh