Re: page fault deadlock
On Fri, Nov 29, 2013 at 3:17 AM, Greg KH wrote: > On Thu, Nov 28, 2013 at 03:28:39PM +0800, Xiaotian Feng wrote: >> On Thu, Nov 28, 2013 at 12:11 PM, Greg KH wrote: >> > On Thu, Nov 28, 2013 at 11:25:32AM +0800, Xiaotian Feng wrote: >> >> Hi, >> >> >> >> When I upgrade to latest kernel, I found my system hang there. It >> >> is reproducible on my virtualbox, and I found each time I mounted my >> >> RAID6 partition and tried to vi or build kernel, my whole system >> >> lockup very soon. >> >> >> >> After turning on lockdep, I found following lockdep warning: >> >> >> >> [ 27.848462] >> >> [ 27.848471] == >> >> [ 27.848477] [ INFO: possible circular locking dependency detected ] >> >> [ 27.848484] 3.13.0-rc1+ #1 Tainted: GF W >> >> [ 27.848490] --- >> >> [ 27.848496] Xorg/1268 is trying to acquire lock: >> >> [ 27.848501] (>mutex){+.+.+.}, at: [] >> >> sysfs_bin_mmap+0x4f/0x120 >> >> [ 27.848516] >> >> [ 27.848516] but task is already holding lock: >> >> [ 27.848521] (>mmap_sem){++}, at: [] >> >> vm_mmap_pgoff+0x6f/0xc0 >> >> [ 27.848534] >> >> [ 27.848534] which lock already depends on the new lock. >> >> [ 27.848534] >> >> [ 27.848541] >> >> [ 27.848541] the existing dependency chain (in reverse order) is: >> >> [ 27.848547] >> >> [ 27.848547] -> #2 (>mmap_sem){++}: >> >> [ 27.848556][] lock_acquire+0xb0/0x160 >> >> [ 27.848564][] might_fault+0x8c/0xb0 >> >> [ 27.848572][] md_ioctl+0xa78/0x19b0 >> >> [ 27.848580][] blkdev_ioctl+0x234/0x840 >> >> [ 27.848588][] block_ioctl+0x41/0x50 >> >> [ 27.848597][] do_vfs_ioctl+0x300/0x520 >> >> [ 27.848605][] SyS_ioctl+0x81/0xa0 >> >> [ 27.848613][] tracesys+0xe1/0xe6 >> >> [ 27.848622] >> >> [ 27.848622] -> #1 (>reconfig_mutex){+.+.+.}: >> >> [ 27.848630][] lock_acquire+0xb0/0x160 >> >> [ 27.848637][] >> >> mutex_lock_interruptible_nested+0x78/0x610 >> >> [ 27.848646][] rdev_attr_show+0x40/0x90 >> >> [ 27.848654][] sysfs_seq_show+0xda/0x170 >> >> [ 27.848662][] seq_read+0x164/0x3e0 >> >> [ 27.848671][] vfs_read+0x95/0x160 >> >> [ 27.848680][] SyS_read+0x49/0xa0 >> >> [ 27.848687][] tracesys+0xe1/0xe6 >> >> [ 27.848695] >> >> [ 27.848695] -> #0 (>mutex){+.+.+.}: >> >> [ 27.848703][] __lock_acquire+0x1587/0x1ca0 >> >> [ 27.848711][] lock_acquire+0xb0/0x160 >> >> [ 27.848718][] mutex_lock_nested+0x68/0x510 >> >> [ 27.848725][] sysfs_bin_mmap+0x4f/0x120 >> >> [ 27.848732][] mmap_region+0x3ed/0x5d0 >> >> [ 27.848741][] do_mmap_pgoff+0x34e/0x3d0 >> >> [ 27.848748][] vm_mmap_pgoff+0x90/0xc0 >> >> [ 27.848755][] SyS_mmap_pgoff+0x1d5/0x270 >> >> [ 27.848763][] SyS_mmap+0x22/0x30 >> >> [ 27.848771][] tracesys+0xe1/0xe6 >> >> [ 27.848778] >> >> [ 27.848778] other info that might help us debug this: >> >> [ 27.848778] >> >> [ 27.848785] Chain exists of: >> >> [ 27.848785] >mutex --> >reconfig_mutex --> >mmap_sem >> >> [ 27.848785] >> >> [ 27.848795] Possible unsafe locking scenario: >> >> [ 27.848795] >> >> [ 27.848800]CPU0CPU1 >> >> [ 27.848805] >> >> [ 27.848810] lock(>mmap_sem); >> >> [ 27.848817] >> >> lock(>reconfig_mutex); >> >> [ 27.848824]lock(>mmap_sem); >> >> [ 27.848830] lock(>mutex); >> >> [ 27.848837] >> >> [ 27.848837] *** DEADLOCK *** >> >> [ 27.848837] >> >> [ 27.848844] 1 lock held by Xorg/1268: >> >> [ 27.848849] #0: (>mmap_sem){++}, at: [] >> >> vm_mmap_pgoff+0x6f/0xc0 >> >> [ 27.848861] >> >> [ 27.848861] stack ba
Re: page fault deadlock
On Fri, Nov 29, 2013 at 3:17 AM, Greg KH gre...@linuxfoundation.org wrote: On Thu, Nov 28, 2013 at 03:28:39PM +0800, Xiaotian Feng wrote: On Thu, Nov 28, 2013 at 12:11 PM, Greg KH gre...@linuxfoundation.org wrote: On Thu, Nov 28, 2013 at 11:25:32AM +0800, Xiaotian Feng wrote: Hi, When I upgrade to latest kernel, I found my system hang there. It is reproducible on my virtualbox, and I found each time I mounted my RAID6 partition and tried to vi or build kernel, my whole system lockup very soon. After turning on lockdep, I found following lockdep warning: [ 27.848462] [ 27.848471] == [ 27.848477] [ INFO: possible circular locking dependency detected ] [ 27.848484] 3.13.0-rc1+ #1 Tainted: GF W [ 27.848490] --- [ 27.848496] Xorg/1268 is trying to acquire lock: [ 27.848501] (of-mutex){+.+.+.}, at: [8125d58f] sysfs_bin_mmap+0x4f/0x120 [ 27.848516] [ 27.848516] but task is already holding lock: [ 27.848521] (mm-mmap_sem){++}, at: [811875bf] vm_mmap_pgoff+0x6f/0xc0 [ 27.848534] [ 27.848534] which lock already depends on the new lock. [ 27.848534] [ 27.848541] [ 27.848541] the existing dependency chain (in reverse order) is: [ 27.848547] [ 27.848547] - #2 (mm-mmap_sem){++}: [ 27.848556][810c0510] lock_acquire+0xb0/0x160 [ 27.848564][8119177c] might_fault+0x8c/0xb0 [ 27.848572][815f4c08] md_ioctl+0xa78/0x19b0 [ 27.848580][813915a4] blkdev_ioctl+0x234/0x840 [ 27.848588][8121db61] block_ioctl+0x41/0x50 [ 27.848597][811f5330] do_vfs_ioctl+0x300/0x520 [ 27.848605][811f55d1] SyS_ioctl+0x81/0xa0 [ 27.848613][81784e98] tracesys+0xe1/0xe6 [ 27.848622] [ 27.848622] - #1 (mddev-reconfig_mutex){+.+.+.}: [ 27.848630][810c0510] lock_acquire+0xb0/0x160 [ 27.848637][81778568] mutex_lock_interruptible_nested+0x78/0x610 [ 27.848646][815e9750] rdev_attr_show+0x40/0x90 [ 27.848654][8125db2a] sysfs_seq_show+0xda/0x170 [ 27.848662][812076f4] seq_read+0x164/0x3e0 [ 27.848671][811e1005] vfs_read+0x95/0x160 [ 27.848680][811e1b19] SyS_read+0x49/0xa0 [ 27.848687][81784e98] tracesys+0xe1/0xe6 [ 27.848695] [ 27.848695] - #0 (of-mutex){+.+.+.}: [ 27.848703][810bfd47] __lock_acquire+0x1587/0x1ca0 [ 27.848711][810c0510] lock_acquire+0xb0/0x160 [ 27.848718][81778048] mutex_lock_nested+0x68/0x510 [ 27.848725][8125d58f] sysfs_bin_mmap+0x4f/0x120 [ 27.848732][8119d82d] mmap_region+0x3ed/0x5d0 [ 27.848741][8119dd5e] do_mmap_pgoff+0x34e/0x3d0 [ 27.848748][811875e0] vm_mmap_pgoff+0x90/0xc0 [ 27.848755][8119c2b5] SyS_mmap_pgoff+0x1d5/0x270 [ 27.848763][8101ae52] SyS_mmap+0x22/0x30 [ 27.848771][81784e98] tracesys+0xe1/0xe6 [ 27.848778] [ 27.848778] other info that might help us debug this: [ 27.848778] [ 27.848785] Chain exists of: [ 27.848785] of-mutex -- mddev-reconfig_mutex -- mm-mmap_sem [ 27.848785] [ 27.848795] Possible unsafe locking scenario: [ 27.848795] [ 27.848800]CPU0CPU1 [ 27.848805] [ 27.848810] lock(mm-mmap_sem); [ 27.848817] lock(mddev-reconfig_mutex); [ 27.848824]lock(mm-mmap_sem); [ 27.848830] lock(of-mutex); [ 27.848837] [ 27.848837] *** DEADLOCK *** [ 27.848837] [ 27.848844] 1 lock held by Xorg/1268: [ 27.848849] #0: (mm-mmap_sem){++}, at: [811875bf] vm_mmap_pgoff+0x6f/0xc0 [ 27.848861] [ 27.848861] stack backtrace: [ 27.848868] CPU: 1 PID: 1268 Comm: Xorg Tainted: GF W 3.13.0-rc1+ #1 [ 27.848873] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 27.848879] 822daa00 8800d0371bc8 817725f7 822cbdc0 [ 27.848901] 8800d0371c08 8176d9eb 8800d0371c60 880115b42a78 [ 27.848909] 880115b42a78 880115b422a0 0001 [ 27.848918] Call Trace: [ 27.848930] [817725f7] dump_stack+0x4e/0x7a [ 27.848942] [8176d9eb] print_circular_bug+0x1f9/0x208 [ 27.848952] [810bfd47] __lock_acquire+0x1587/0x1ca0 [ 27.848964] [8101955f] ? print_context_stack+0x8f/0x100 [ 27.848975] [810c0510] lock_acquire+0xb0/0x160 [ 27.848986] [8125d58f
Re: page fault deadlock
On Thu, Nov 28, 2013 at 12:11 PM, Greg KH wrote: > On Thu, Nov 28, 2013 at 11:25:32AM +0800, Xiaotian Feng wrote: >> Hi, >> >> When I upgrade to latest kernel, I found my system hang there. It >> is reproducible on my virtualbox, and I found each time I mounted my >> RAID6 partition and tried to vi or build kernel, my whole system >> lockup very soon. >> >> After turning on lockdep, I found following lockdep warning: >> >> [ 27.848462] >> [ 27.848471] == >> [ 27.848477] [ INFO: possible circular locking dependency detected ] >> [ 27.848484] 3.13.0-rc1+ #1 Tainted: GF W >> [ 27.848490] --- >> [ 27.848496] Xorg/1268 is trying to acquire lock: >> [ 27.848501] (>mutex){+.+.+.}, at: [] >> sysfs_bin_mmap+0x4f/0x120 >> [ 27.848516] >> [ 27.848516] but task is already holding lock: >> [ 27.848521] (>mmap_sem){++}, at: [] >> vm_mmap_pgoff+0x6f/0xc0 >> [ 27.848534] >> [ 27.848534] which lock already depends on the new lock. >> [ 27.848534] >> [ 27.848541] >> [ 27.848541] the existing dependency chain (in reverse order) is: >> [ 27.848547] >> [ 27.848547] -> #2 (>mmap_sem){++}: >> [ 27.848556][] lock_acquire+0xb0/0x160 >> [ 27.848564][] might_fault+0x8c/0xb0 >> [ 27.848572][] md_ioctl+0xa78/0x19b0 >> [ 27.848580][] blkdev_ioctl+0x234/0x840 >> [ 27.848588][] block_ioctl+0x41/0x50 >> [ 27.848597][] do_vfs_ioctl+0x300/0x520 >> [ 27.848605][] SyS_ioctl+0x81/0xa0 >> [ 27.848613][] tracesys+0xe1/0xe6 >> [ 27.848622] >> [ 27.848622] -> #1 (>reconfig_mutex){+.+.+.}: >> [ 27.848630][] lock_acquire+0xb0/0x160 >> [ 27.848637][] >> mutex_lock_interruptible_nested+0x78/0x610 >> [ 27.848646][] rdev_attr_show+0x40/0x90 >> [ 27.848654][] sysfs_seq_show+0xda/0x170 >> [ 27.848662][] seq_read+0x164/0x3e0 >> [ 27.848671][] vfs_read+0x95/0x160 >> [ 27.848680][] SyS_read+0x49/0xa0 >> [ 27.848687][] tracesys+0xe1/0xe6 >> [ 27.848695] >> [ 27.848695] -> #0 (>mutex){+.+.+.}: >> [ 27.848703][] __lock_acquire+0x1587/0x1ca0 >> [ 27.848711][] lock_acquire+0xb0/0x160 >> [ 27.848718][] mutex_lock_nested+0x68/0x510 >> [ 27.848725][] sysfs_bin_mmap+0x4f/0x120 >> [ 27.848732][] mmap_region+0x3ed/0x5d0 >> [ 27.848741][] do_mmap_pgoff+0x34e/0x3d0 >> [ 27.848748][] vm_mmap_pgoff+0x90/0xc0 >> [ 27.848755][] SyS_mmap_pgoff+0x1d5/0x270 >> [ 27.848763][] SyS_mmap+0x22/0x30 >> [ 27.848771][] tracesys+0xe1/0xe6 >> [ 27.848778] >> [ 27.848778] other info that might help us debug this: >> [ 27.848778] >> [ 27.848785] Chain exists of: >> [ 27.848785] >mutex --> >reconfig_mutex --> >mmap_sem >> [ 27.848785] >> [ 27.848795] Possible unsafe locking scenario: >> [ 27.848795] >> [ 27.848800]CPU0CPU1 >> [ 27.848805] >> [ 27.848810] lock(>mmap_sem); >> [ 27.848817]lock(>reconfig_mutex); >> [ 27.848824]lock(>mmap_sem); >> [ 27.848830] lock(>mutex); >> [ 27.848837] >> [ 27.848837] *** DEADLOCK *** >> [ 27.848837] >> [ 27.848844] 1 lock held by Xorg/1268: >> [ 27.848849] #0: (>mmap_sem){++}, at: [] >> vm_mmap_pgoff+0x6f/0xc0 >> [ 27.848861] >> [ 27.848861] stack backtrace: >> [ 27.848868] CPU: 1 PID: 1268 Comm: Xorg Tainted: GF W >> 3.13.0-rc1+ #1 >> [ 27.848873] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS >> VirtualBox 12/01/2006 >> [ 27.848879] 822daa00 8800d0371bc8 817725f7 >> 822cbdc0 >> [ 27.848901] 8800d0371c08 8176d9eb 8800d0371c60 >> 880115b42a78 >> [ 27.848909] 880115b42a78 880115b422a0 >> 0001 >> [ 27.848918] Call Trace: >> [ 27.848930] [] dump_stack+0x4e/0x7a >> [ 27.848942] [] print_circular_bug+0x1f9/0x208 >> [ 27.848952] [] __lock_acquire+0x1587/0x1ca0 >> [ 27.848964] [] ? print_context_stack+0x8f/0x100 >> [ 27.848975] [] lock_acquire+0xb0/0x160 >> [ 27.848986] [] ? sysfs_bin_mmap+0
Re: page fault deadlock
On Thu, Nov 28, 2013 at 12:11 PM, Greg KH wrote: > On Thu, Nov 28, 2013 at 11:25:32AM +0800, Xiaotian Feng wrote: >> Hi, >> >> When I upgrade to latest kernel, I found my system hang there. It >> is reproducible on my virtualbox, and I found each time I mounted my >> RAID6 partition and tried to vi or build kernel, my whole system >> lockup very soon. >> >> After turning on lockdep, I found following lockdep warning: >> >> [ 27.848462] >> [ 27.848471] == >> [ 27.848477] [ INFO: possible circular locking dependency detected ] >> [ 27.848484] 3.13.0-rc1+ #1 Tainted: GF W >> [ 27.848490] --- >> [ 27.848496] Xorg/1268 is trying to acquire lock: >> [ 27.848501] (>mutex){+.+.+.}, at: [] >> sysfs_bin_mmap+0x4f/0x120 >> [ 27.848516] >> [ 27.848516] but task is already holding lock: >> [ 27.848521] (>mmap_sem){++}, at: [] >> vm_mmap_pgoff+0x6f/0xc0 >> [ 27.848534] >> [ 27.848534] which lock already depends on the new lock. >> [ 27.848534] >> [ 27.848541] >> [ 27.848541] the existing dependency chain (in reverse order) is: >> [ 27.848547] >> [ 27.848547] -> #2 (>mmap_sem){++}: >> [ 27.848556][] lock_acquire+0xb0/0x160 >> [ 27.848564][] might_fault+0x8c/0xb0 >> [ 27.848572][] md_ioctl+0xa78/0x19b0 >> [ 27.848580][] blkdev_ioctl+0x234/0x840 >> [ 27.848588][] block_ioctl+0x41/0x50 >> [ 27.848597][] do_vfs_ioctl+0x300/0x520 >> [ 27.848605][] SyS_ioctl+0x81/0xa0 >> [ 27.848613][] tracesys+0xe1/0xe6 >> [ 27.848622] >> [ 27.848622] -> #1 (>reconfig_mutex){+.+.+.}: >> [ 27.848630][] lock_acquire+0xb0/0x160 >> [ 27.848637][] >> mutex_lock_interruptible_nested+0x78/0x610 >> [ 27.848646][] rdev_attr_show+0x40/0x90 >> [ 27.848654][] sysfs_seq_show+0xda/0x170 >> [ 27.848662][] seq_read+0x164/0x3e0 >> [ 27.848671][] vfs_read+0x95/0x160 >> [ 27.848680][] SyS_read+0x49/0xa0 >> [ 27.848687][] tracesys+0xe1/0xe6 >> [ 27.848695] >> [ 27.848695] -> #0 (>mutex){+.+.+.}: >> [ 27.848703][] __lock_acquire+0x1587/0x1ca0 >> [ 27.848711][] lock_acquire+0xb0/0x160 >> [ 27.848718][] mutex_lock_nested+0x68/0x510 >> [ 27.848725][] sysfs_bin_mmap+0x4f/0x120 >> [ 27.848732][] mmap_region+0x3ed/0x5d0 >> [ 27.848741][] do_mmap_pgoff+0x34e/0x3d0 >> [ 27.848748][] vm_mmap_pgoff+0x90/0xc0 >> [ 27.848755][] SyS_mmap_pgoff+0x1d5/0x270 >> [ 27.848763][] SyS_mmap+0x22/0x30 >> [ 27.848771][] tracesys+0xe1/0xe6 >> [ 27.848778] >> [ 27.848778] other info that might help us debug this: >> [ 27.848778] >> [ 27.848785] Chain exists of: >> [ 27.848785] >mutex --> >reconfig_mutex --> >mmap_sem >> [ 27.848785] >> [ 27.848795] Possible unsafe locking scenario: >> [ 27.848795] >> [ 27.848800]CPU0CPU1 >> [ 27.848805] >> [ 27.848810] lock(>mmap_sem); >> [ 27.848817]lock(>reconfig_mutex); >> [ 27.848824]lock(>mmap_sem); >> [ 27.848830] lock(>mutex); >> [ 27.848837] >> [ 27.848837] *** DEADLOCK *** >> [ 27.848837] >> [ 27.848844] 1 lock held by Xorg/1268: >> [ 27.848849] #0: (>mmap_sem){++}, at: [] >> vm_mmap_pgoff+0x6f/0xc0 >> [ 27.848861] >> [ 27.848861] stack backtrace: >> [ 27.848868] CPU: 1 PID: 1268 Comm: Xorg Tainted: GF W >> 3.13.0-rc1+ #1 >> [ 27.848873] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS >> VirtualBox 12/01/2006 >> [ 27.848879] 822daa00 8800d0371bc8 817725f7 >> 822cbdc0 >> [ 27.848901] 8800d0371c08 8176d9eb 8800d0371c60 >> 880115b42a78 >> [ 27.848909] 880115b42a78 880115b422a0 >> 0001 >> [ 27.848918] Call Trace: >> [ 27.848930] [] dump_stack+0x4e/0x7a >> [ 27.848942] [] print_circular_bug+0x1f9/0x208 >> [ 27.848952] [] __lock_acquire+0x1587/0x1ca0 >> [ 27.848964] [] ? print_context_stack+0x8f/0x100 >> [ 27.848975] [] lock_acquire+0xb0/0x160 >> [ 27.848986] [] ? sysfs_bin_mmap+0
page fault deadlock
Hi, When I upgrade to latest kernel, I found my system hang there. It is reproducible on my virtualbox, and I found each time I mounted my RAID6 partition and tried to vi or build kernel, my whole system lockup very soon. After turning on lockdep, I found following lockdep warning: [ 27.848462] [ 27.848471] == [ 27.848477] [ INFO: possible circular locking dependency detected ] [ 27.848484] 3.13.0-rc1+ #1 Tainted: GF W [ 27.848490] --- [ 27.848496] Xorg/1268 is trying to acquire lock: [ 27.848501] (>mutex){+.+.+.}, at: [] sysfs_bin_mmap+0x4f/0x120 [ 27.848516] [ 27.848516] but task is already holding lock: [ 27.848521] (>mmap_sem){++}, at: [] vm_mmap_pgoff+0x6f/0xc0 [ 27.848534] [ 27.848534] which lock already depends on the new lock. [ 27.848534] [ 27.848541] [ 27.848541] the existing dependency chain (in reverse order) is: [ 27.848547] [ 27.848547] -> #2 (>mmap_sem){++}: [ 27.848556][] lock_acquire+0xb0/0x160 [ 27.848564][] might_fault+0x8c/0xb0 [ 27.848572][] md_ioctl+0xa78/0x19b0 [ 27.848580][] blkdev_ioctl+0x234/0x840 [ 27.848588][] block_ioctl+0x41/0x50 [ 27.848597][] do_vfs_ioctl+0x300/0x520 [ 27.848605][] SyS_ioctl+0x81/0xa0 [ 27.848613][] tracesys+0xe1/0xe6 [ 27.848622] [ 27.848622] -> #1 (>reconfig_mutex){+.+.+.}: [ 27.848630][] lock_acquire+0xb0/0x160 [ 27.848637][] mutex_lock_interruptible_nested+0x78/0x610 [ 27.848646][] rdev_attr_show+0x40/0x90 [ 27.848654][] sysfs_seq_show+0xda/0x170 [ 27.848662][] seq_read+0x164/0x3e0 [ 27.848671][] vfs_read+0x95/0x160 [ 27.848680][] SyS_read+0x49/0xa0 [ 27.848687][] tracesys+0xe1/0xe6 [ 27.848695] [ 27.848695] -> #0 (>mutex){+.+.+.}: [ 27.848703][] __lock_acquire+0x1587/0x1ca0 [ 27.848711][] lock_acquire+0xb0/0x160 [ 27.848718][] mutex_lock_nested+0x68/0x510 [ 27.848725][] sysfs_bin_mmap+0x4f/0x120 [ 27.848732][] mmap_region+0x3ed/0x5d0 [ 27.848741][] do_mmap_pgoff+0x34e/0x3d0 [ 27.848748][] vm_mmap_pgoff+0x90/0xc0 [ 27.848755][] SyS_mmap_pgoff+0x1d5/0x270 [ 27.848763][] SyS_mmap+0x22/0x30 [ 27.848771][] tracesys+0xe1/0xe6 [ 27.848778] [ 27.848778] other info that might help us debug this: [ 27.848778] [ 27.848785] Chain exists of: [ 27.848785] >mutex --> >reconfig_mutex --> >mmap_sem [ 27.848785] [ 27.848795] Possible unsafe locking scenario: [ 27.848795] [ 27.848800]CPU0CPU1 [ 27.848805] [ 27.848810] lock(>mmap_sem); [ 27.848817]lock(>reconfig_mutex); [ 27.848824]lock(>mmap_sem); [ 27.848830] lock(>mutex); [ 27.848837] [ 27.848837] *** DEADLOCK *** [ 27.848837] [ 27.848844] 1 lock held by Xorg/1268: [ 27.848849] #0: (>mmap_sem){++}, at: [] vm_mmap_pgoff+0x6f/0xc0 [ 27.848861] [ 27.848861] stack backtrace: [ 27.848868] CPU: 1 PID: 1268 Comm: Xorg Tainted: GF W3.13.0-rc1+ #1 [ 27.848873] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 27.848879] 822daa00 8800d0371bc8 817725f7 822cbdc0 [ 27.848901] 8800d0371c08 8176d9eb 8800d0371c60 880115b42a78 [ 27.848909] 880115b42a78 880115b422a0 0001 [ 27.848918] Call Trace: [ 27.848930] [] dump_stack+0x4e/0x7a [ 27.848942] [] print_circular_bug+0x1f9/0x208 [ 27.848952] [] __lock_acquire+0x1587/0x1ca0 [ 27.848964] [] ? print_context_stack+0x8f/0x100 [ 27.848975] [] lock_acquire+0xb0/0x160 [ 27.848986] [] ? sysfs_bin_mmap+0x4f/0x120 [ 27.848996] [] ? sysfs_bin_mmap+0x4f/0x120 [ 27.849007] [] mutex_lock_nested+0x68/0x510 [ 27.849016] [] ? sysfs_bin_mmap+0x4f/0x120 [ 27.849027] [] ? kmemleak_alloc+0x4e/0xb0 [ 27.849038] [] sysfs_bin_mmap+0x4f/0x120 [ 27.849048] [] mmap_region+0x3ed/0x5d0 [ 27.849058] [] do_mmap_pgoff+0x34e/0x3d0 [ 27.849070] [] vm_mmap_pgoff+0x90/0xc0 [ 27.849080] [] SyS_mmap_pgoff+0x1d5/0x270 [ 27.849092] [] ? syscall_trace_enter+0x145/0x270 [ 27.849102] [] SyS_mmap+0x22/0x30 [ 27.849112] [] tracesys+0xe1/0xe6 I think it is a real deadlock, and it is caused by commit 3124eb1679b28726 "sysfs: merge regular and bin file handling". With that commit, sysfs_bin_mmap will hold of->mutex. So assume cpu0 called sysfs_bin_mmap, acquired mmap_sem and trying to get of->mutex. CPU1 called sysfs_seq_show, acqured of->mutex and trying to get mddev->reconfig_mutex. CPU2 called md_ioctl, acquired mddev->reconfig_mutex, and later call copy_from_user and page fault trying to get mmap_sem.
page fault deadlock
Hi, When I upgrade to latest kernel, I found my system hang there. It is reproducible on my virtualbox, and I found each time I mounted my RAID6 partition and tried to vi or build kernel, my whole system lockup very soon. After turning on lockdep, I found following lockdep warning: [ 27.848462] [ 27.848471] == [ 27.848477] [ INFO: possible circular locking dependency detected ] [ 27.848484] 3.13.0-rc1+ #1 Tainted: GF W [ 27.848490] --- [ 27.848496] Xorg/1268 is trying to acquire lock: [ 27.848501] (of-mutex){+.+.+.}, at: [8125d58f] sysfs_bin_mmap+0x4f/0x120 [ 27.848516] [ 27.848516] but task is already holding lock: [ 27.848521] (mm-mmap_sem){++}, at: [811875bf] vm_mmap_pgoff+0x6f/0xc0 [ 27.848534] [ 27.848534] which lock already depends on the new lock. [ 27.848534] [ 27.848541] [ 27.848541] the existing dependency chain (in reverse order) is: [ 27.848547] [ 27.848547] - #2 (mm-mmap_sem){++}: [ 27.848556][810c0510] lock_acquire+0xb0/0x160 [ 27.848564][8119177c] might_fault+0x8c/0xb0 [ 27.848572][815f4c08] md_ioctl+0xa78/0x19b0 [ 27.848580][813915a4] blkdev_ioctl+0x234/0x840 [ 27.848588][8121db61] block_ioctl+0x41/0x50 [ 27.848597][811f5330] do_vfs_ioctl+0x300/0x520 [ 27.848605][811f55d1] SyS_ioctl+0x81/0xa0 [ 27.848613][81784e98] tracesys+0xe1/0xe6 [ 27.848622] [ 27.848622] - #1 (mddev-reconfig_mutex){+.+.+.}: [ 27.848630][810c0510] lock_acquire+0xb0/0x160 [ 27.848637][81778568] mutex_lock_interruptible_nested+0x78/0x610 [ 27.848646][815e9750] rdev_attr_show+0x40/0x90 [ 27.848654][8125db2a] sysfs_seq_show+0xda/0x170 [ 27.848662][812076f4] seq_read+0x164/0x3e0 [ 27.848671][811e1005] vfs_read+0x95/0x160 [ 27.848680][811e1b19] SyS_read+0x49/0xa0 [ 27.848687][81784e98] tracesys+0xe1/0xe6 [ 27.848695] [ 27.848695] - #0 (of-mutex){+.+.+.}: [ 27.848703][810bfd47] __lock_acquire+0x1587/0x1ca0 [ 27.848711][810c0510] lock_acquire+0xb0/0x160 [ 27.848718][81778048] mutex_lock_nested+0x68/0x510 [ 27.848725][8125d58f] sysfs_bin_mmap+0x4f/0x120 [ 27.848732][8119d82d] mmap_region+0x3ed/0x5d0 [ 27.848741][8119dd5e] do_mmap_pgoff+0x34e/0x3d0 [ 27.848748][811875e0] vm_mmap_pgoff+0x90/0xc0 [ 27.848755][8119c2b5] SyS_mmap_pgoff+0x1d5/0x270 [ 27.848763][8101ae52] SyS_mmap+0x22/0x30 [ 27.848771][81784e98] tracesys+0xe1/0xe6 [ 27.848778] [ 27.848778] other info that might help us debug this: [ 27.848778] [ 27.848785] Chain exists of: [ 27.848785] of-mutex -- mddev-reconfig_mutex -- mm-mmap_sem [ 27.848785] [ 27.848795] Possible unsafe locking scenario: [ 27.848795] [ 27.848800]CPU0CPU1 [ 27.848805] [ 27.848810] lock(mm-mmap_sem); [ 27.848817]lock(mddev-reconfig_mutex); [ 27.848824]lock(mm-mmap_sem); [ 27.848830] lock(of-mutex); [ 27.848837] [ 27.848837] *** DEADLOCK *** [ 27.848837] [ 27.848844] 1 lock held by Xorg/1268: [ 27.848849] #0: (mm-mmap_sem){++}, at: [811875bf] vm_mmap_pgoff+0x6f/0xc0 [ 27.848861] [ 27.848861] stack backtrace: [ 27.848868] CPU: 1 PID: 1268 Comm: Xorg Tainted: GF W3.13.0-rc1+ #1 [ 27.848873] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 27.848879] 822daa00 8800d0371bc8 817725f7 822cbdc0 [ 27.848901] 8800d0371c08 8176d9eb 8800d0371c60 880115b42a78 [ 27.848909] 880115b42a78 880115b422a0 0001 [ 27.848918] Call Trace: [ 27.848930] [817725f7] dump_stack+0x4e/0x7a [ 27.848942] [8176d9eb] print_circular_bug+0x1f9/0x208 [ 27.848952] [810bfd47] __lock_acquire+0x1587/0x1ca0 [ 27.848964] [8101955f] ? print_context_stack+0x8f/0x100 [ 27.848975] [810c0510] lock_acquire+0xb0/0x160 [ 27.848986] [8125d58f] ? sysfs_bin_mmap+0x4f/0x120 [ 27.848996] [8125d58f] ? sysfs_bin_mmap+0x4f/0x120 [ 27.849007] [81778048] mutex_lock_nested+0x68/0x510 [ 27.849016] [8125d58f] ? sysfs_bin_mmap+0x4f/0x120 [ 27.849027] [8176456e] ? kmemleak_alloc+0x4e/0xb0 [ 27.849038] [8125d58f] sysfs_bin_mmap+0x4f/0x120 [ 27.849048] [8119d82d] mmap_region+0x3ed/0x5d0 [ 27.849058] [8119dd5e] do_mmap_pgoff+0x34e/0x3d0 [ 27.849070]
Re: page fault deadlock
On Thu, Nov 28, 2013 at 12:11 PM, Greg KH gre...@linuxfoundation.org wrote: On Thu, Nov 28, 2013 at 11:25:32AM +0800, Xiaotian Feng wrote: Hi, When I upgrade to latest kernel, I found my system hang there. It is reproducible on my virtualbox, and I found each time I mounted my RAID6 partition and tried to vi or build kernel, my whole system lockup very soon. After turning on lockdep, I found following lockdep warning: [ 27.848462] [ 27.848471] == [ 27.848477] [ INFO: possible circular locking dependency detected ] [ 27.848484] 3.13.0-rc1+ #1 Tainted: GF W [ 27.848490] --- [ 27.848496] Xorg/1268 is trying to acquire lock: [ 27.848501] (of-mutex){+.+.+.}, at: [8125d58f] sysfs_bin_mmap+0x4f/0x120 [ 27.848516] [ 27.848516] but task is already holding lock: [ 27.848521] (mm-mmap_sem){++}, at: [811875bf] vm_mmap_pgoff+0x6f/0xc0 [ 27.848534] [ 27.848534] which lock already depends on the new lock. [ 27.848534] [ 27.848541] [ 27.848541] the existing dependency chain (in reverse order) is: [ 27.848547] [ 27.848547] - #2 (mm-mmap_sem){++}: [ 27.848556][810c0510] lock_acquire+0xb0/0x160 [ 27.848564][8119177c] might_fault+0x8c/0xb0 [ 27.848572][815f4c08] md_ioctl+0xa78/0x19b0 [ 27.848580][813915a4] blkdev_ioctl+0x234/0x840 [ 27.848588][8121db61] block_ioctl+0x41/0x50 [ 27.848597][811f5330] do_vfs_ioctl+0x300/0x520 [ 27.848605][811f55d1] SyS_ioctl+0x81/0xa0 [ 27.848613][81784e98] tracesys+0xe1/0xe6 [ 27.848622] [ 27.848622] - #1 (mddev-reconfig_mutex){+.+.+.}: [ 27.848630][810c0510] lock_acquire+0xb0/0x160 [ 27.848637][81778568] mutex_lock_interruptible_nested+0x78/0x610 [ 27.848646][815e9750] rdev_attr_show+0x40/0x90 [ 27.848654][8125db2a] sysfs_seq_show+0xda/0x170 [ 27.848662][812076f4] seq_read+0x164/0x3e0 [ 27.848671][811e1005] vfs_read+0x95/0x160 [ 27.848680][811e1b19] SyS_read+0x49/0xa0 [ 27.848687][81784e98] tracesys+0xe1/0xe6 [ 27.848695] [ 27.848695] - #0 (of-mutex){+.+.+.}: [ 27.848703][810bfd47] __lock_acquire+0x1587/0x1ca0 [ 27.848711][810c0510] lock_acquire+0xb0/0x160 [ 27.848718][81778048] mutex_lock_nested+0x68/0x510 [ 27.848725][8125d58f] sysfs_bin_mmap+0x4f/0x120 [ 27.848732][8119d82d] mmap_region+0x3ed/0x5d0 [ 27.848741][8119dd5e] do_mmap_pgoff+0x34e/0x3d0 [ 27.848748][811875e0] vm_mmap_pgoff+0x90/0xc0 [ 27.848755][8119c2b5] SyS_mmap_pgoff+0x1d5/0x270 [ 27.848763][8101ae52] SyS_mmap+0x22/0x30 [ 27.848771][81784e98] tracesys+0xe1/0xe6 [ 27.848778] [ 27.848778] other info that might help us debug this: [ 27.848778] [ 27.848785] Chain exists of: [ 27.848785] of-mutex -- mddev-reconfig_mutex -- mm-mmap_sem [ 27.848785] [ 27.848795] Possible unsafe locking scenario: [ 27.848795] [ 27.848800]CPU0CPU1 [ 27.848805] [ 27.848810] lock(mm-mmap_sem); [ 27.848817]lock(mddev-reconfig_mutex); [ 27.848824]lock(mm-mmap_sem); [ 27.848830] lock(of-mutex); [ 27.848837] [ 27.848837] *** DEADLOCK *** [ 27.848837] [ 27.848844] 1 lock held by Xorg/1268: [ 27.848849] #0: (mm-mmap_sem){++}, at: [811875bf] vm_mmap_pgoff+0x6f/0xc0 [ 27.848861] [ 27.848861] stack backtrace: [ 27.848868] CPU: 1 PID: 1268 Comm: Xorg Tainted: GF W 3.13.0-rc1+ #1 [ 27.848873] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 27.848879] 822daa00 8800d0371bc8 817725f7 822cbdc0 [ 27.848901] 8800d0371c08 8176d9eb 8800d0371c60 880115b42a78 [ 27.848909] 880115b42a78 880115b422a0 0001 [ 27.848918] Call Trace: [ 27.848930] [817725f7] dump_stack+0x4e/0x7a [ 27.848942] [8176d9eb] print_circular_bug+0x1f9/0x208 [ 27.848952] [810bfd47] __lock_acquire+0x1587/0x1ca0 [ 27.848964] [8101955f] ? print_context_stack+0x8f/0x100 [ 27.848975] [810c0510] lock_acquire+0xb0/0x160 [ 27.848986] [8125d58f] ? sysfs_bin_mmap+0x4f/0x120 [ 27.848996] [8125d58f] ? sysfs_bin_mmap+0x4f/0x120 [ 27.849007] [81778048] mutex_lock_nested+0x68/0x510 [ 27.849016] [8125d58f] ? sysfs_bin_mmap+0x4f/0x120 [ 27.849027
Re: page fault deadlock
On Thu, Nov 28, 2013 at 12:11 PM, Greg KH gre...@linuxfoundation.org wrote: On Thu, Nov 28, 2013 at 11:25:32AM +0800, Xiaotian Feng wrote: Hi, When I upgrade to latest kernel, I found my system hang there. It is reproducible on my virtualbox, and I found each time I mounted my RAID6 partition and tried to vi or build kernel, my whole system lockup very soon. After turning on lockdep, I found following lockdep warning: [ 27.848462] [ 27.848471] == [ 27.848477] [ INFO: possible circular locking dependency detected ] [ 27.848484] 3.13.0-rc1+ #1 Tainted: GF W [ 27.848490] --- [ 27.848496] Xorg/1268 is trying to acquire lock: [ 27.848501] (of-mutex){+.+.+.}, at: [8125d58f] sysfs_bin_mmap+0x4f/0x120 [ 27.848516] [ 27.848516] but task is already holding lock: [ 27.848521] (mm-mmap_sem){++}, at: [811875bf] vm_mmap_pgoff+0x6f/0xc0 [ 27.848534] [ 27.848534] which lock already depends on the new lock. [ 27.848534] [ 27.848541] [ 27.848541] the existing dependency chain (in reverse order) is: [ 27.848547] [ 27.848547] - #2 (mm-mmap_sem){++}: [ 27.848556][810c0510] lock_acquire+0xb0/0x160 [ 27.848564][8119177c] might_fault+0x8c/0xb0 [ 27.848572][815f4c08] md_ioctl+0xa78/0x19b0 [ 27.848580][813915a4] blkdev_ioctl+0x234/0x840 [ 27.848588][8121db61] block_ioctl+0x41/0x50 [ 27.848597][811f5330] do_vfs_ioctl+0x300/0x520 [ 27.848605][811f55d1] SyS_ioctl+0x81/0xa0 [ 27.848613][81784e98] tracesys+0xe1/0xe6 [ 27.848622] [ 27.848622] - #1 (mddev-reconfig_mutex){+.+.+.}: [ 27.848630][810c0510] lock_acquire+0xb0/0x160 [ 27.848637][81778568] mutex_lock_interruptible_nested+0x78/0x610 [ 27.848646][815e9750] rdev_attr_show+0x40/0x90 [ 27.848654][8125db2a] sysfs_seq_show+0xda/0x170 [ 27.848662][812076f4] seq_read+0x164/0x3e0 [ 27.848671][811e1005] vfs_read+0x95/0x160 [ 27.848680][811e1b19] SyS_read+0x49/0xa0 [ 27.848687][81784e98] tracesys+0xe1/0xe6 [ 27.848695] [ 27.848695] - #0 (of-mutex){+.+.+.}: [ 27.848703][810bfd47] __lock_acquire+0x1587/0x1ca0 [ 27.848711][810c0510] lock_acquire+0xb0/0x160 [ 27.848718][81778048] mutex_lock_nested+0x68/0x510 [ 27.848725][8125d58f] sysfs_bin_mmap+0x4f/0x120 [ 27.848732][8119d82d] mmap_region+0x3ed/0x5d0 [ 27.848741][8119dd5e] do_mmap_pgoff+0x34e/0x3d0 [ 27.848748][811875e0] vm_mmap_pgoff+0x90/0xc0 [ 27.848755][8119c2b5] SyS_mmap_pgoff+0x1d5/0x270 [ 27.848763][8101ae52] SyS_mmap+0x22/0x30 [ 27.848771][81784e98] tracesys+0xe1/0xe6 [ 27.848778] [ 27.848778] other info that might help us debug this: [ 27.848778] [ 27.848785] Chain exists of: [ 27.848785] of-mutex -- mddev-reconfig_mutex -- mm-mmap_sem [ 27.848785] [ 27.848795] Possible unsafe locking scenario: [ 27.848795] [ 27.848800]CPU0CPU1 [ 27.848805] [ 27.848810] lock(mm-mmap_sem); [ 27.848817]lock(mddev-reconfig_mutex); [ 27.848824]lock(mm-mmap_sem); [ 27.848830] lock(of-mutex); [ 27.848837] [ 27.848837] *** DEADLOCK *** [ 27.848837] [ 27.848844] 1 lock held by Xorg/1268: [ 27.848849] #0: (mm-mmap_sem){++}, at: [811875bf] vm_mmap_pgoff+0x6f/0xc0 [ 27.848861] [ 27.848861] stack backtrace: [ 27.848868] CPU: 1 PID: 1268 Comm: Xorg Tainted: GF W 3.13.0-rc1+ #1 [ 27.848873] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 27.848879] 822daa00 8800d0371bc8 817725f7 822cbdc0 [ 27.848901] 8800d0371c08 8176d9eb 8800d0371c60 880115b42a78 [ 27.848909] 880115b42a78 880115b422a0 0001 [ 27.848918] Call Trace: [ 27.848930] [817725f7] dump_stack+0x4e/0x7a [ 27.848942] [8176d9eb] print_circular_bug+0x1f9/0x208 [ 27.848952] [810bfd47] __lock_acquire+0x1587/0x1ca0 [ 27.848964] [8101955f] ? print_context_stack+0x8f/0x100 [ 27.848975] [810c0510] lock_acquire+0xb0/0x160 [ 27.848986] [8125d58f] ? sysfs_bin_mmap+0x4f/0x120 [ 27.848996] [8125d58f] ? sysfs_bin_mmap+0x4f/0x120 [ 27.849007] [81778048] mutex_lock_nested+0x68/0x510 [ 27.849016] [8125d58f] ? sysfs_bin_mmap+0x4f/0x120 [ 27.849027
[tip:sched/core] cpumask: Fix cpumask leak in partition_sched_domains()
Commit-ID: c8d2d47a9cbb4222ae4e993aa0e3703430c8193c Gitweb: http://git.kernel.org/tip/c8d2d47a9cbb4222ae4e993aa0e3703430c8193c Author: Xiaotian Feng AuthorDate: Tue, 6 Aug 2013 20:06:42 +0800 Committer: Ingo Molnar CommitDate: Fri, 16 Aug 2013 17:44:27 +0200 cpumask: Fix cpumask leak in partition_sched_domains() If doms_new is NULL, partition_sched_domains() will reset ndoms_cur to 0, and free old sched domains with free_sched_domains(doms_cur, ndoms_cur). As ndoms_cur is 0, the cpumask will not be freed. Signed-off-by: Xiaotian Feng Cc: Rusty Russell Cc: linux-kernel@vger.kernel.org Signed-off-by: Peter Zijlstra Link: http://lkml.kernel.org/r/1375790802-11857-1-git-send-email-xtf...@gmail.com Signed-off-by: Ingo Molnar --- kernel/sched/core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b7415cf..cf8f100 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6184,8 +6184,9 @@ match1: ; } + n = ndoms_cur; if (doms_new == NULL) { - ndoms_cur = 0; + n = 0; doms_new = _doms; cpumask_andnot(doms_new[0], cpu_active_mask, cpu_isolated_map); WARN_ON_ONCE(dattr_new); @@ -6193,7 +6194,7 @@ match1: /* Build new domains */ for (i = 0; i < ndoms_new; i++) { - for (j = 0; j < ndoms_cur && !new_topology; j++) { + for (j = 0; j < n && !new_topology; j++) { if (cpumask_equal(doms_new[i], doms_cur[j]) && dattrs_equal(dattr_new, i, dattr_cur, j)) goto match2; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:sched/core] cpumask: Fix cpumask leak in partition_sched_domains()
Commit-ID: c8d2d47a9cbb4222ae4e993aa0e3703430c8193c Gitweb: http://git.kernel.org/tip/c8d2d47a9cbb4222ae4e993aa0e3703430c8193c Author: Xiaotian Feng xtf...@gmail.com AuthorDate: Tue, 6 Aug 2013 20:06:42 +0800 Committer: Ingo Molnar mi...@kernel.org CommitDate: Fri, 16 Aug 2013 17:44:27 +0200 cpumask: Fix cpumask leak in partition_sched_domains() If doms_new is NULL, partition_sched_domains() will reset ndoms_cur to 0, and free old sched domains with free_sched_domains(doms_cur, ndoms_cur). As ndoms_cur is 0, the cpumask will not be freed. Signed-off-by: Xiaotian Feng xtf...@gmail.com Cc: Rusty Russell ru...@rustcorp.com.au Cc: linux-kernel@vger.kernel.org Signed-off-by: Peter Zijlstra pet...@infradead.org Link: http://lkml.kernel.org/r/1375790802-11857-1-git-send-email-xtf...@gmail.com Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/sched/core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b7415cf..cf8f100 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6184,8 +6184,9 @@ match1: ; } + n = ndoms_cur; if (doms_new == NULL) { - ndoms_cur = 0; + n = 0; doms_new = fallback_doms; cpumask_andnot(doms_new[0], cpu_active_mask, cpu_isolated_map); WARN_ON_ONCE(dattr_new); @@ -6193,7 +6194,7 @@ match1: /* Build new domains */ for (i = 0; i ndoms_new; i++) { - for (j = 0; j ndoms_cur !new_topology; j++) { + for (j = 0; j n !new_topology; j++) { if (cpumask_equal(doms_new[i], doms_cur[j]) dattrs_equal(dattr_new, i, dattr_cur, j)) goto match2; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpumask: fix cpumask leak in partition_sched_domains
On Tue, Aug 6, 2013 at 8:06 PM, Xiaotian Feng wrote: > If doms_new is NULL, partition_sched_domains() will reset ndoms_cur > to 0, and free old sched domains with free_sched_domains(doms_cur, ndoms_cur). > As ndoms_cur is 0, the cpumask will not be freed. > > Signed-off-by: Xiaotian Feng > Cc: Ingo Molnar > Cc: Peter Zijlstra > Cc: Rusty Russell > Cc: Thomas Gleixner > Cc: linux-kernel@vger.kernel.org Any comments on this patch? Without this patch, I can see following with kmemleak. unreferenced object 0x880118d26aa8 (size 512): comm "swapper/0", pid 1, jiffies 4294892366 (age 287.736s) hex dump (first 32 bytes): 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 backtrace: [] kmemleak_alloc+0x26/0x50 [] kmem_cache_alloc_node_trace+0x116/0x2d0 [] alloc_cpumask_var_node+0x1f/0x90 [] alloc_cpumask_var+0xe/0x10 [] alloc_sched_domains+0x5c/0x80 [] sched_init_smp+0x365/0x47d [] kernel_init_freeable+0xe3/0x1ef [] kernel_init+0xe/0xf0 [] ret_from_fork+0x7c/0xb0 [] 0x > --- > kernel/sched/core.c |5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index b7c32cb..3d6c57b 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -6184,8 +6184,9 @@ match1: > ; > } > > + n = ndoms_cur; > if (doms_new == NULL) { > - ndoms_cur = 0; > + n = 0; > doms_new = _doms; > cpumask_andnot(doms_new[0], cpu_active_mask, > cpu_isolated_map); > WARN_ON_ONCE(dattr_new); > @@ -6193,7 +6194,7 @@ match1: > > /* Build new domains */ > for (i = 0; i < ndoms_new; i++) { > - for (j = 0; j < ndoms_cur && !new_topology; j++) { > + for (j = 0; j < n && !new_topology; j++) { > if (cpumask_equal(doms_new[i], doms_cur[j]) > && dattrs_equal(dattr_new, i, dattr_cur, j)) > goto match2; > -- > 1.7.9.6 (Apple Git-31.1) > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpumask: fix cpumask leak in partition_sched_domains
On Tue, Aug 6, 2013 at 8:06 PM, Xiaotian Feng xtf...@gmail.com wrote: If doms_new is NULL, partition_sched_domains() will reset ndoms_cur to 0, and free old sched domains with free_sched_domains(doms_cur, ndoms_cur). As ndoms_cur is 0, the cpumask will not be freed. Signed-off-by: Xiaotian Feng xtf...@gmail.com Cc: Ingo Molnar mi...@redhat.com Cc: Peter Zijlstra pet...@infradead.org Cc: Rusty Russell ru...@rustcorp.com.au Cc: Thomas Gleixner t...@linutronix.de Cc: linux-kernel@vger.kernel.org Any comments on this patch? Without this patch, I can see following with kmemleak. unreferenced object 0x880118d26aa8 (size 512): comm swapper/0, pid 1, jiffies 4294892366 (age 287.736s) hex dump (first 32 bytes): 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 backtrace: [817350e6] kmemleak_alloc+0x26/0x50 [811b92c6] kmem_cache_alloc_node_trace+0x116/0x2d0 [8139e66f] alloc_cpumask_var_node+0x1f/0x90 [8139e6ee] alloc_cpumask_var+0xe/0x10 [810a328c] alloc_sched_domains+0x5c/0x80 [81daf8c6] sched_init_smp+0x365/0x47d [81d8f01e] kernel_init_freeable+0xe3/0x1ef [81731b1e] kernel_init+0xe/0xf0 [817543ac] ret_from_fork+0x7c/0xb0 [] 0x --- kernel/sched/core.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b7c32cb..3d6c57b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6184,8 +6184,9 @@ match1: ; } + n = ndoms_cur; if (doms_new == NULL) { - ndoms_cur = 0; + n = 0; doms_new = fallback_doms; cpumask_andnot(doms_new[0], cpu_active_mask, cpu_isolated_map); WARN_ON_ONCE(dattr_new); @@ -6193,7 +6194,7 @@ match1: /* Build new domains */ for (i = 0; i ndoms_new; i++) { - for (j = 0; j ndoms_cur !new_topology; j++) { + for (j = 0; j n !new_topology; j++) { if (cpumask_equal(doms_new[i], doms_cur[j]) dattrs_equal(dattr_new, i, dattr_cur, j)) goto match2; -- 1.7.9.6 (Apple Git-31.1) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] cpumask: fix cpumask leak in partition_sched_domains
If doms_new is NULL, partition_sched_domains() will reset ndoms_cur to 0, and free old sched domains with free_sched_domains(doms_cur, ndoms_cur). As ndoms_cur is 0, the cpumask will not be freed. Signed-off-by: Xiaotian Feng Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Rusty Russell Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org --- kernel/sched/core.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b7c32cb..3d6c57b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6184,8 +6184,9 @@ match1: ; } + n = ndoms_cur; if (doms_new == NULL) { - ndoms_cur = 0; + n = 0; doms_new = _doms; cpumask_andnot(doms_new[0], cpu_active_mask, cpu_isolated_map); WARN_ON_ONCE(dattr_new); @@ -6193,7 +6194,7 @@ match1: /* Build new domains */ for (i = 0; i < ndoms_new; i++) { - for (j = 0; j < ndoms_cur && !new_topology; j++) { + for (j = 0; j < n && !new_topology; j++) { if (cpumask_equal(doms_new[i], doms_cur[j]) && dattrs_equal(dattr_new, i, dattr_cur, j)) goto match2; -- 1.7.9.6 (Apple Git-31.1) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] cpumask: fix cpumask leak in partition_sched_domains
If doms_new is NULL, partition_sched_domains() will reset ndoms_cur to 0, and free old sched domains with free_sched_domains(doms_cur, ndoms_cur). As ndoms_cur is 0, the cpumask will not be freed. Signed-off-by: Xiaotian Feng xtf...@gmail.com Cc: Ingo Molnar mi...@redhat.com Cc: Peter Zijlstra pet...@infradead.org Cc: Rusty Russell ru...@rustcorp.com.au Cc: Thomas Gleixner t...@linutronix.de Cc: linux-kernel@vger.kernel.org --- kernel/sched/core.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b7c32cb..3d6c57b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6184,8 +6184,9 @@ match1: ; } + n = ndoms_cur; if (doms_new == NULL) { - ndoms_cur = 0; + n = 0; doms_new = fallback_doms; cpumask_andnot(doms_new[0], cpu_active_mask, cpu_isolated_map); WARN_ON_ONCE(dattr_new); @@ -6193,7 +6194,7 @@ match1: /* Build new domains */ for (i = 0; i ndoms_new; i++) { - for (j = 0; j ndoms_cur !new_topology; j++) { + for (j = 0; j n !new_topology; j++) { if (cpumask_equal(doms_new[i], doms_cur[j]) dattrs_equal(dattr_new, i, dattr_cur, j)) goto match2; -- 1.7.9.6 (Apple Git-31.1) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpumask: fix cpumask leak in partition_sched_domains
On Tue, Aug 6, 2013 at 12:37 PM, Rusty Russell wrote: > Xiaotian Feng writes: >> On Sat, Jul 27, 2013 at 3:26 PM, Xiaotian Feng wrote: >>> If doms_new is NULL, partition_sched_domains() will reset ndoms_cur >>> to 0, and free old sched domains with free_sched_domains(doms_cur, >>> ndoms_cur). >>> As ndoms_cur is 0, the cpumask will not be freed. >>> >>> Signed-off-by: Xiaotian Feng >>> Cc: Ingo Molnar >>> Cc: Peter Zijlstra >>> Cc: linux-kernel@vger.kernel.org >> >> Any comments? Cc'ed Rusty. > > The code is a little convoluted, but your fix is logical. > Yes, it's quite convoluted :( >>> --- >>> kernel/sched/core.c |5 +++-- >>> 1 file changed, 3 insertions(+), 2 deletions(-) >>> >>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >>> index b7c32cb..3d6c57b 100644 >>> --- a/kernel/sched/core.c >>> +++ b/kernel/sched/core.c >>> @@ -6184,8 +6184,9 @@ match1: >>> ; >>> } >>> >>> + n= ndoms_cur; > > You're missing a ' ' here: > n = ndoms_cur; > I'll update this, thanks :) >>> if (doms_new == NULL) { >>> - ndoms_cur = 0; >>> + n = 0; >>> doms_new = _doms; >>> cpumask_andnot(doms_new[0], cpu_active_mask, >>> cpu_isolated_map); >>> WARN_ON_ONCE(dattr_new); >>> @@ -6193,7 +6194,7 @@ match1: >>> >>> /* Build new domains */ >>> for (i = 0; i < ndoms_new; i++) { >>> - for (j = 0; j < ndoms_cur && !new_topology; j++) { >>> + for (j = 0; j < n && !new_topology; j++) { >>> if (cpumask_equal(doms_new[i], doms_cur[j]) >>> && dattrs_equal(dattr_new, i, dattr_cur, j)) >>> goto match2; >>> -- >>> 1.7.9.6 (Apple Git-31.1) >>> > > Cheers, > Rusty. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpumask: fix cpumask leak in partition_sched_domains
On Sat, Jul 27, 2013 at 3:26 PM, Xiaotian Feng wrote: > If doms_new is NULL, partition_sched_domains() will reset ndoms_cur > to 0, and free old sched domains with free_sched_domains(doms_cur, ndoms_cur). > As ndoms_cur is 0, the cpumask will not be freed. > > Signed-off-by: Xiaotian Feng > Cc: Ingo Molnar > Cc: Peter Zijlstra > Cc: linux-kernel@vger.kernel.org Any comments? Cc'ed Rusty. > --- > kernel/sched/core.c |5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index b7c32cb..3d6c57b 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -6184,8 +6184,9 @@ match1: > ; > } > > + n= ndoms_cur; > if (doms_new == NULL) { > - ndoms_cur = 0; > + n = 0; > doms_new = _doms; > cpumask_andnot(doms_new[0], cpu_active_mask, > cpu_isolated_map); > WARN_ON_ONCE(dattr_new); > @@ -6193,7 +6194,7 @@ match1: > > /* Build new domains */ > for (i = 0; i < ndoms_new; i++) { > - for (j = 0; j < ndoms_cur && !new_topology; j++) { > + for (j = 0; j < n && !new_topology; j++) { > if (cpumask_equal(doms_new[i], doms_cur[j]) > && dattrs_equal(dattr_new, i, dattr_cur, j)) > goto match2; > -- > 1.7.9.6 (Apple Git-31.1) > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpumask: fix cpumask leak in partition_sched_domains
On Sat, Jul 27, 2013 at 3:26 PM, Xiaotian Feng xtf...@gmail.com wrote: If doms_new is NULL, partition_sched_domains() will reset ndoms_cur to 0, and free old sched domains with free_sched_domains(doms_cur, ndoms_cur). As ndoms_cur is 0, the cpumask will not be freed. Signed-off-by: Xiaotian Feng xtf...@gmail.com Cc: Ingo Molnar mi...@redhat.com Cc: Peter Zijlstra pet...@infradead.org Cc: linux-kernel@vger.kernel.org Any comments? Cc'ed Rusty. --- kernel/sched/core.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b7c32cb..3d6c57b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6184,8 +6184,9 @@ match1: ; } + n= ndoms_cur; if (doms_new == NULL) { - ndoms_cur = 0; + n = 0; doms_new = fallback_doms; cpumask_andnot(doms_new[0], cpu_active_mask, cpu_isolated_map); WARN_ON_ONCE(dattr_new); @@ -6193,7 +6194,7 @@ match1: /* Build new domains */ for (i = 0; i ndoms_new; i++) { - for (j = 0; j ndoms_cur !new_topology; j++) { + for (j = 0; j n !new_topology; j++) { if (cpumask_equal(doms_new[i], doms_cur[j]) dattrs_equal(dattr_new, i, dattr_cur, j)) goto match2; -- 1.7.9.6 (Apple Git-31.1) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpumask: fix cpumask leak in partition_sched_domains
On Tue, Aug 6, 2013 at 12:37 PM, Rusty Russell ru...@rustcorp.com.au wrote: Xiaotian Feng xtf...@gmail.com writes: On Sat, Jul 27, 2013 at 3:26 PM, Xiaotian Feng xtf...@gmail.com wrote: If doms_new is NULL, partition_sched_domains() will reset ndoms_cur to 0, and free old sched domains with free_sched_domains(doms_cur, ndoms_cur). As ndoms_cur is 0, the cpumask will not be freed. Signed-off-by: Xiaotian Feng xtf...@gmail.com Cc: Ingo Molnar mi...@redhat.com Cc: Peter Zijlstra pet...@infradead.org Cc: linux-kernel@vger.kernel.org Any comments? Cc'ed Rusty. The code is a little convoluted, but your fix is logical. Yes, it's quite convoluted :( --- kernel/sched/core.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b7c32cb..3d6c57b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6184,8 +6184,9 @@ match1: ; } + n= ndoms_cur; You're missing a ' ' here: n = ndoms_cur; I'll update this, thanks :) if (doms_new == NULL) { - ndoms_cur = 0; + n = 0; doms_new = fallback_doms; cpumask_andnot(doms_new[0], cpu_active_mask, cpu_isolated_map); WARN_ON_ONCE(dattr_new); @@ -6193,7 +6194,7 @@ match1: /* Build new domains */ for (i = 0; i ndoms_new; i++) { - for (j = 0; j ndoms_cur !new_topology; j++) { + for (j = 0; j n !new_topology; j++) { if (cpumask_equal(doms_new[i], doms_cur[j]) dattrs_equal(dattr_new, i, dattr_cur, j)) goto match2; -- 1.7.9.6 (Apple Git-31.1) Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] cpumask: fix cpumask leak in partition_sched_domains
If doms_new is NULL, partition_sched_domains() will reset ndoms_cur to 0, and free old sched domains with free_sched_domains(doms_cur, ndoms_cur). As ndoms_cur is 0, the cpumask will not be freed. Signed-off-by: Xiaotian Feng Cc: Ingo Molnar Cc: Peter Zijlstra Cc: linux-kernel@vger.kernel.org --- kernel/sched/core.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b7c32cb..3d6c57b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6184,8 +6184,9 @@ match1: ; } + n= ndoms_cur; if (doms_new == NULL) { - ndoms_cur = 0; + n = 0; doms_new = _doms; cpumask_andnot(doms_new[0], cpu_active_mask, cpu_isolated_map); WARN_ON_ONCE(dattr_new); @@ -6193,7 +6194,7 @@ match1: /* Build new domains */ for (i = 0; i < ndoms_new; i++) { - for (j = 0; j < ndoms_cur && !new_topology; j++) { + for (j = 0; j < n && !new_topology; j++) { if (cpumask_equal(doms_new[i], doms_cur[j]) && dattrs_equal(dattr_new, i, dattr_cur, j)) goto match2; -- 1.7.9.6 (Apple Git-31.1) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] cpumask: fix cpumask leak in partition_sched_domains
If doms_new is NULL, partition_sched_domains() will reset ndoms_cur to 0, and free old sched domains with free_sched_domains(doms_cur, ndoms_cur). As ndoms_cur is 0, the cpumask will not be freed. Signed-off-by: Xiaotian Feng xtf...@gmail.com Cc: Ingo Molnar mi...@redhat.com Cc: Peter Zijlstra pet...@infradead.org Cc: linux-kernel@vger.kernel.org --- kernel/sched/core.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b7c32cb..3d6c57b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6184,8 +6184,9 @@ match1: ; } + n= ndoms_cur; if (doms_new == NULL) { - ndoms_cur = 0; + n = 0; doms_new = fallback_doms; cpumask_andnot(doms_new[0], cpu_active_mask, cpu_isolated_map); WARN_ON_ONCE(dattr_new); @@ -6193,7 +6194,7 @@ match1: /* Build new domains */ for (i = 0; i ndoms_new; i++) { - for (j = 0; j ndoms_cur !new_topology; j++) { + for (j = 0; j n !new_topology; j++) { if (cpumask_equal(doms_new[i], doms_cur[j]) dattrs_equal(dattr_new, i, dattr_cur, j)) goto match2; -- 1.7.9.6 (Apple Git-31.1) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2] AHCI: fix Null pointer dereference in achi_host_active()
commit b29900e6 (AHCI: Make distinct names for ports in /proc/interrupts) introuded a regression, which resulted Null pointer dereference for achi host with dummy ports. For ahci ports, when the port is dummy port, its private_data will be NULL, as ata_dummy_port_ops doesn't support ->port_start. changes in v2: use pp to check dummy ports, update comments Reported-and-tested-by: Alex Williamson Signed-off-by: Xiaotian Feng Cc: Alexander Gordeev Cc: Tejun Heo Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- drivers/ata/ahci.c |9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 5064f3e..db4380d 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1146,11 +1146,18 @@ int ahci_host_activate(struct ata_host *host, int irq, unsigned int n_msis) return rc; for (i = 0; i < host->n_ports; i++) { + const char* desc; struct ahci_port_priv *pp = host->ports[i]->private_data; + /* pp is NULL for dummy ports */ + if (pp) + desc = pp->irq_desc; + else + desc = dev_driver_string(host->dev); + rc = devm_request_threaded_irq(host->dev, irq + i, ahci_hw_interrupt, ahci_thread_fn, IRQF_SHARED, - pp->irq_desc, host->ports[i]); + desc, host->ports[i]); if (rc) goto out_free_irqs; } -- 1.7.9.6 (Apple Git-31.1) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] AHCI: fix Null pointer dereference in achi_host_active()
On Tue, Jul 23, 2013 at 5:28 AM, Tejun Heo wrote: > Hello, Xiaotian. > > Thanks for the fix. A couple comments below. > > On Wed, Jul 17, 2013 at 02:10:39PM +0800, Xiaotian Feng wrote: >> for (i = 0; i < host->n_ports; i++) { >> struct ahci_port_priv *pp = host->ports[i]->private_data; >> + const char *desc; >> >> + if (ata_port_is_dummy(host->ports[i])) >> + desc = dev_driver_string(host->dev); >> + else >> + desc = pp->irq_desc; > > I think it'd be better to branch on pp. ie. do "if (pp) desc = > pp->... " instead and then add a comment saying "pp is NULL for > dummies". > Okay, I'll update v2 patch, thanks :) > Thanks! > > -- > tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] AHCI: fix Null pointer dereference in achi_host_active()
On Tue, Jul 23, 2013 at 5:28 AM, Tejun Heo t...@kernel.org wrote: Hello, Xiaotian. Thanks for the fix. A couple comments below. On Wed, Jul 17, 2013 at 02:10:39PM +0800, Xiaotian Feng wrote: for (i = 0; i host-n_ports; i++) { struct ahci_port_priv *pp = host-ports[i]-private_data; + const char *desc; + if (ata_port_is_dummy(host-ports[i])) + desc = dev_driver_string(host-dev); + else + desc = pp-irq_desc; I think it'd be better to branch on pp. ie. do if (pp) desc = pp-... instead and then add a comment saying pp is NULL for dummies. Okay, I'll update v2 patch, thanks :) Thanks! -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2] AHCI: fix Null pointer dereference in achi_host_active()
commit b29900e6 (AHCI: Make distinct names for ports in /proc/interrupts) introuded a regression, which resulted Null pointer dereference for achi host with dummy ports. For ahci ports, when the port is dummy port, its private_data will be NULL, as ata_dummy_port_ops doesn't support -port_start. changes in v2: use pp to check dummy ports, update comments Reported-and-tested-by: Alex Williamson alex.william...@redhat.com Signed-off-by: Xiaotian Feng xtf...@gmail.com Cc: Alexander Gordeev agord...@redhat.com Cc: Tejun Heo t...@kernel.org Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- drivers/ata/ahci.c |9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 5064f3e..db4380d 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1146,11 +1146,18 @@ int ahci_host_activate(struct ata_host *host, int irq, unsigned int n_msis) return rc; for (i = 0; i host-n_ports; i++) { + const char* desc; struct ahci_port_priv *pp = host-ports[i]-private_data; + /* pp is NULL for dummy ports */ + if (pp) + desc = pp-irq_desc; + else + desc = dev_driver_string(host-dev); + rc = devm_request_threaded_irq(host-dev, irq + i, ahci_hw_interrupt, ahci_thread_fn, IRQF_SHARED, - pp-irq_desc, host-ports[i]); + desc, host-ports[i]); if (rc) goto out_free_irqs; } -- 1.7.9.6 (Apple Git-31.1) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ipc,shm] BUG: lock held when returning to user space!
On Sat, Jul 20, 2013 at 9:13 PM, Fengguang Wu wrote: > Greetings, > > I got the below dmesg and the first bad commit is > > commit c5d0282a0405b0a81fa3390e4230e4cbb3ced7a2 > Author: Davidlohr Bueso > Date: Fri Jul 19 09:56:58 2013 +1000 > > ipc,shm: shorten critical region for shmat > > Similar to other system calls, acquire the kern_ipc_perm lock after doing > the initial permission and security checks. > > Signed-off-by: Davidlohr Bueso > Tested-by: Sedat Dilek > Cc: Rik van Riel > Cc: Manfred Spraul > Signed-off-by: Andrew Morton > > [ 20.702156] > [ 20.702493] > [ 20.703511] [ BUG: lock held when returning to user space! ] > [ 20.704532] 3.11.0-rc1-next-20130719 #50 Not tainted > [ 20.705416] > [ 20.706425] trinity-child0/174 is leaving the kernel with locks still held! > [ 20.707638] 1 lock held by trinity-child0/174: > [ 20.708475] #0: (rcu_read_lock){.+.+..}, at: [] > do_shmat+0xe1/0x500 > ns = current->nsproxy->ipc_ns; - shp = shm_lock_check(ns, shmid); + rcu_read_lock(); + shp = shm_obtain_object_check(ns, shmid); if (IS_ERR(shp)) { err = PTR_ERR(shp); goto out; If shm_obtain_object_check() failed, goto out will return with rcu_read_lock() held. I think following patch should cure this. diff --git a/ipc/shm.c b/ipc/shm.c index 59f2194..cb2ceda 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -1093,7 +1093,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr, shp = shm_obtain_object_check(ns, shmid); if (IS_ERR(shp)) { err = PTR_ERR(shp); - goto out; + goto out_unlock; } err = -EACCES; > git bisect start c1f631b9a68251007a6353041ae90f9f7dca771c > d03792f9db9b892f494d3aa19d767ddf0365d1ff -- > git bisect good 10a3f1f902465ae1320cc95a3284fd3697e05dd8 # 11:14 65+ > binfmt_elf.c: use get_random_int() to fix entropy depleting > git bisect bad dac28788378838efb63e37a7eabd7729d97aba6b # 11:32 0- > dcache: remove dentries from LRU before putting on dispose list > git bisect good 3140b2ed6dfe5c9e5eca371c77ca85dca05321d4 # 11:50 65+ > ipc,shm: introduce shmctl_nolock > git bisect bad 48a91248649fa3327bd8a31c114ee9149a07f3a7 # 12:04 0- > staging/lustre/ldlm: convert to shrinkers to count/scan API > git bisect good 98b78126a51aa5d3ee6d5dae5768e0d16deeeaa3 # 12:14 65+ > ipc,shm: cleanup do_shmat pasta > git bisect bad 36ccfd799cad33e2edd5c14ac8776b33e63d195b # 12:14 0- > ipc: rename ids->rw_mutex > git bisect bad c5d0282a0405b0a81fa3390e4230e4cbb3ced7a2 # 12:14 0- > ipc,shm: shorten critical region for shmat > git bisect good 98b78126a51aa5d3ee6d5dae5768e0d16deeeaa3 # 15:34195+ > ipc,shm: cleanup do_shmat pasta > git bisect bad c1f631b9a68251007a6353041ae90f9f7dca771c # 15:34 0- > Add linux-next specific files for 20130719 > git bisect good 709b465ee655387c4ec056383fa27f16c64f48db # 18:21195+ > Revert "ipc,shm: shorten critical region for shmat" > git bisect good d471ce53b1fab60110e4e9f647a345cea31752de # 18:44195+ > Merge branch 'for-linus' of > git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml > git bisect bad c1f631b9a68251007a6353041ae90f9f7dca771c # 18:44 0- > Add linux-next specific files for 20130719 > > Thanks, > Fengguang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ipc,shm] BUG: lock held when returning to user space!
On Sat, Jul 20, 2013 at 9:13 PM, Fengguang Wu fengguang...@intel.com wrote: Greetings, I got the below dmesg and the first bad commit is commit c5d0282a0405b0a81fa3390e4230e4cbb3ced7a2 Author: Davidlohr Bueso davidlohr.bu...@hp.com Date: Fri Jul 19 09:56:58 2013 +1000 ipc,shm: shorten critical region for shmat Similar to other system calls, acquire the kern_ipc_perm lock after doing the initial permission and security checks. Signed-off-by: Davidlohr Bueso davidlohr.bu...@hp.com Tested-by: Sedat Dilek sedat.di...@gmail.com Cc: Rik van Riel r...@redhat.com Cc: Manfred Spraul manf...@colorfullife.com Signed-off-by: Andrew Morton a...@linux-foundation.org [ 20.702156] [ 20.702493] [ 20.703511] [ BUG: lock held when returning to user space! ] [ 20.704532] 3.11.0-rc1-next-20130719 #50 Not tainted [ 20.705416] [ 20.706425] trinity-child0/174 is leaving the kernel with locks still held! [ 20.707638] 1 lock held by trinity-child0/174: [ 20.708475] #0: (rcu_read_lock){.+.+..}, at: [814a8491] do_shmat+0xe1/0x500 ns = current-nsproxy-ipc_ns; - shp = shm_lock_check(ns, shmid); + rcu_read_lock(); + shp = shm_obtain_object_check(ns, shmid); if (IS_ERR(shp)) { err = PTR_ERR(shp); goto out; If shm_obtain_object_check() failed, goto out will return with rcu_read_lock() held. I think following patch should cure this. diff --git a/ipc/shm.c b/ipc/shm.c index 59f2194..cb2ceda 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -1093,7 +1093,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr, shp = shm_obtain_object_check(ns, shmid); if (IS_ERR(shp)) { err = PTR_ERR(shp); - goto out; + goto out_unlock; } err = -EACCES; git bisect start c1f631b9a68251007a6353041ae90f9f7dca771c d03792f9db9b892f494d3aa19d767ddf0365d1ff -- git bisect good 10a3f1f902465ae1320cc95a3284fd3697e05dd8 # 11:14 65+ binfmt_elf.c: use get_random_int() to fix entropy depleting git bisect bad dac28788378838efb63e37a7eabd7729d97aba6b # 11:32 0- dcache: remove dentries from LRU before putting on dispose list git bisect good 3140b2ed6dfe5c9e5eca371c77ca85dca05321d4 # 11:50 65+ ipc,shm: introduce shmctl_nolock git bisect bad 48a91248649fa3327bd8a31c114ee9149a07f3a7 # 12:04 0- staging/lustre/ldlm: convert to shrinkers to count/scan API git bisect good 98b78126a51aa5d3ee6d5dae5768e0d16deeeaa3 # 12:14 65+ ipc,shm: cleanup do_shmat pasta git bisect bad 36ccfd799cad33e2edd5c14ac8776b33e63d195b # 12:14 0- ipc: rename ids-rw_mutex git bisect bad c5d0282a0405b0a81fa3390e4230e4cbb3ced7a2 # 12:14 0- ipc,shm: shorten critical region for shmat git bisect good 98b78126a51aa5d3ee6d5dae5768e0d16deeeaa3 # 15:34195+ ipc,shm: cleanup do_shmat pasta git bisect bad c1f631b9a68251007a6353041ae90f9f7dca771c # 15:34 0- Add linux-next specific files for 20130719 git bisect good 709b465ee655387c4ec056383fa27f16c64f48db # 18:21195+ Revert ipc,shm: shorten critical region for shmat git bisect good d471ce53b1fab60110e4e9f647a345cea31752de # 18:44195+ Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml git bisect bad c1f631b9a68251007a6353041ae90f9f7dca771c # 18:44 0- Add linux-next specific files for 20130719 Thanks, Fengguang -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] AHCI: fix Null pointer dereference in achi_host_active()
commit b29900e6 (AHCI: Make distinct names for ports in /proc/interrupts) introuded a regression, which resulted Null pointer dereference for achi host with dummy ports. For ahci ports, when the port is dummy port, its private_data will be NULL, as ata_dummy_port_ops doesn't support ->port_start. Reported-and-tested-by: Alex Williamson Signed-off-by: Xiaotian Feng Cc: Alexander Gordeev Cc: Tejun Heo Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- drivers/ata/ahci.c |8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 5064f3e..f1de689 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1147,10 +1147,16 @@ int ahci_host_activate(struct ata_host *host, int irq, unsigned int n_msis) for (i = 0; i < host->n_ports; i++) { struct ahci_port_priv *pp = host->ports[i]->private_data; + const char *desc; + if (ata_port_is_dummy(host->ports[i])) + desc = dev_driver_string(host->dev); + else + desc = pp->irq_desc; + rc = devm_request_threaded_irq(host->dev, irq + i, ahci_hw_interrupt, ahci_thread_fn, IRQF_SHARED, - pp->irq_desc, host->ports[i]); + desc, host->ports[i]); if (rc) goto out_free_irqs; } -- 1.7.9.6 (Apple Git-31.1) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] AHCI: fix Null pointer dereference in achi_host_active()
commit b29900e6 (AHCI: Make distinct names for ports in /proc/interrupts) introuded a regression, which resulted Null pointer dereference for achi host with dummy ports. For ahci ports, when the port is dummy port, its private_data will be NULL, as ata_dummy_port_ops doesn't support -port_start. Reported-and-tested-by: Alex Williamson alex.william...@redhat.com Signed-off-by: Xiaotian Feng xtf...@gmail.com Cc: Alexander Gordeev agord...@redhat.com Cc: Tejun Heo t...@kernel.org Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- drivers/ata/ahci.c |8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 5064f3e..f1de689 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1147,10 +1147,16 @@ int ahci_host_activate(struct ata_host *host, int irq, unsigned int n_msis) for (i = 0; i host-n_ports; i++) { struct ahci_port_priv *pp = host-ports[i]-private_data; + const char *desc; + if (ata_port_is_dummy(host-ports[i])) + desc = dev_driver_string(host-dev); + else + desc = pp-irq_desc; + rc = devm_request_threaded_irq(host-dev, irq + i, ahci_hw_interrupt, ahci_thread_fn, IRQF_SHARED, - pp-irq_desc, host-ports[i]); + desc, host-ports[i]); if (rc) goto out_free_irqs; } -- 1.7.9.6 (Apple Git-31.1) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] AHCI: fix Null pointer dereference in achi_host_active()
commit b29900e6 introuded a regression, which resulted Null pointer dereference for achi host with dummy ports. For ahci ports, when the port is dummy port, its private_data will be NULL, as ata_dummy_port_ops doesn't support ->port_start. Reported-and-tested-by: Alex Williamson Signed-off-by: Xiaotian Feng Cc: Alexander Gordeev Cc: Tejun Heo Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- drivers/ata/ahci.c |8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 5064f3e..f1de689 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1147,10 +1147,16 @@ int ahci_host_activate(struct ata_host *host, int irq, unsigned int n_msis) for (i = 0; i < host->n_ports; i++) { struct ahci_port_priv *pp = host->ports[i]->private_data; + const char *desc; + if (ata_port_is_dummy(host->ports[i])) + desc = dev_driver_string(host->dev); + else + desc = pp->irq_desc; + rc = devm_request_threaded_irq(host->dev, irq + i, ahci_hw_interrupt, ahci_thread_fn, IRQF_SHARED, - pp->irq_desc, host->ports[i]); + desc, host->ports[i]); if (rc) goto out_free_irqs; } -- 1.7.9.6 (Apple Git-31.1) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: ahci_host_activate NULL pointer (was Re: Linux 3.11-rc1)
On Mon, Jul 15, 2013 at 3:44 PM, Alex Williamson wrote: > On Mon, 2013-07-15 at 13:23 -0600, Alex Williamson wrote: >> On Mon, 2013-07-15 at 14:46 -0400, Xiaotian Feng wrote: >> > On Tue, Jul 16, 2013 at 1:38 AM, Alex Williamson >> > > > > wrote: >> > >> > > On Mon, 2013-07-15 at 10:49 -0600, Alex Williamson wrote: >> > > > On Sun, 2013-07-14 at 16:57 -0700, Linus Torvalds wrote: >> > > > > It's been two weeks, and the merge window has closed. If I missed >> > > > > anything, holler, but I don't have anything pending that I am aware >> > > > > of. >> > > > > >> > > > > This merge window was smaller in terms of number of commits than the >> > > > > 3.10 merge window, but we actually have more new lines. Most of that >> > > > > seems to be in staging - a full third of all changes by line-count is >> > > > > staging, and merging in Lustre is the bulk of that. Let's see how >> > > > > that >> > > > > all turns out, I have to say that we don't have a great track record >> > > > > on merging filesystems through staging. >> > > > > >> > > > > Ignoring the lustre merge, I think this really was a somewhat calmer >> > > > > merge window. We had a few trees with problems, and we have an >> > > > > on-going debate about stable patches that was triggered largely >> > > > > thanks >> > > > > to this merge window, so now we'll have something to discuss for the >> > > > > kernel summit. But on the whole, I suspect we might be starting to >> > > > > see >> > > > > the traditional summer slump (Australia notwithstanding). >> > > > > >> > > > > Despite being a bit smaller than the last merge window, it's not like >> > > > > this was a _tiny_ one, and so as usual I'm only summarizing with the >> > > > > normal -rc1 mergelog: and as usual the people credited here are *not* >> > > > > the people who actually wrote the code (although in some cases that >> > > > > is >> > > > > true), they are the people who I merged the code from. >> > > > > >> > > > > Hey, let's all start testing, >> > > > >> > > > Anyone else seeing this: >> > > > >> > > > [2.212548] ahci :00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 >> > > > Gbps >> > > 0x29 impl SATA mode >> > > > [2.220732] ahci :00:1f.2: flags: 64bit ncq sntf led clo pmp pio >> > > slum part ccc sxs >> > > > [2.228997] BUG: unable to handle kernel NULL pointer dereference at >> > > 0508 >> > > > [2.236850] IP: [] ahci_host_activate+0x87/0x140 >> > > > [2.243047] PGD 0 >> > > > [2.245077] Oops: [#1] SMP >> > > > [2.248335] Modules linked in: >> > > > [2.251405] CPU: 7 PID: 1 Comm: swapper/0 Not tainted 3.11.0-rc1+ >> > > > #574 >> > > > [2.257929] Hardware name: LENOVO 4157CTO/LENOVO, BIOS 60KT41AUS >> > > 01/04/2011 >> > > > [2.264880] task: 880371508000 ti: 88037151 task.ti: >> > > 88037151 >> > > > [2.272353] RIP: 0010:[] [] >> > > ahci_host_activate+0x87/0x140 >> > > > [2.280969] RSP: 0018:880371511b38 EFLAGS: 00010293 >> > > > [2.286273] RAX: 88036e724000 RBX: 88036e71c028 RCX: >> > > 8140bce0 >> > > > [2.293397] RDX: RSI: 002f RDI: >> > > 88037122f098 >> > > > [2.300521] RBP: 880371511b68 R08: 0080 R09: >> > > 0001 >> > > > [2.307645] R10: R11: R12: >> > > 0001 >> > > > [2.314772] R13: 002e R14: R15: >> > > 88037122f000 >> > > > [2.321896] FS: () GS:88037fdc() >> > > knlGS: >> > > > [2.329973] CS: 0010 DS: ES: CR0: 8005003b >> > > > [2.335711] CR2: 0508 CR3: 01c0b000 CR4: >> > > 07e0 >> > > > [2.342835] Stack: >> > > > [2.344848] 88036e72 8803
Re: ahci_host_activate NULL pointer (was Re: Linux 3.11-rc1)
On Mon, Jul 15, 2013 at 3:44 PM, Alex Williamson wrote: > On Mon, 2013-07-15 at 13:23 -0600, Alex Williamson wrote: >> On Mon, 2013-07-15 at 14:46 -0400, Xiaotian Feng wrote: >> > On Tue, Jul 16, 2013 at 1:38 AM, Alex Williamson >> > > > > wrote: >> > >> > > On Mon, 2013-07-15 at 10:49 -0600, Alex Williamson wrote: >> > > > On Sun, 2013-07-14 at 16:57 -0700, Linus Torvalds wrote: >> > > > > It's been two weeks, and the merge window has closed. If I missed >> > > > > anything, holler, but I don't have anything pending that I am aware >> > > > > of. >> > > > > >> > > > > This merge window was smaller in terms of number of commits than the >> > > > > 3.10 merge window, but we actually have more new lines. Most of that >> > > > > seems to be in staging - a full third of all changes by line-count is >> > > > > staging, and merging in Lustre is the bulk of that. Let's see how >> > > > > that >> > > > > all turns out, I have to say that we don't have a great track record >> > > > > on merging filesystems through staging. >> > > > > >> > > > > Ignoring the lustre merge, I think this really was a somewhat calmer >> > > > > merge window. We had a few trees with problems, and we have an >> > > > > on-going debate about stable patches that was triggered largely >> > > > > thanks >> > > > > to this merge window, so now we'll have something to discuss for the >> > > > > kernel summit. But on the whole, I suspect we might be starting to >> > > > > see >> > > > > the traditional summer slump (Australia notwithstanding). >> > > > > >> > > > > Despite being a bit smaller than the last merge window, it's not like >> > > > > this was a _tiny_ one, and so as usual I'm only summarizing with the >> > > > > normal -rc1 mergelog: and as usual the people credited here are *not* >> > > > > the people who actually wrote the code (although in some cases that >> > > > > is >> > > > > true), they are the people who I merged the code from. >> > > > > >> > > > > Hey, let's all start testing, >> > > > >> > > > Anyone else seeing this: >> > > > >> > > > [2.212548] ahci :00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 >> > > > Gbps >> > > 0x29 impl SATA mode >> > > > [2.220732] ahci :00:1f.2: flags: 64bit ncq sntf led clo pmp pio >> > > slum part ccc sxs >> > > > [2.228997] BUG: unable to handle kernel NULL pointer dereference at >> > > 0508 >> > > > [2.236850] IP: [] ahci_host_activate+0x87/0x140 >> > > > [2.243047] PGD 0 >> > > > [2.245077] Oops: [#1] SMP >> > > > [2.248335] Modules linked in: >> > > > [2.251405] CPU: 7 PID: 1 Comm: swapper/0 Not tainted 3.11.0-rc1+ >> > > > #574 >> > > > [2.257929] Hardware name: LENOVO 4157CTO/LENOVO, BIOS 60KT41AUS >> > > 01/04/2011 >> > > > [2.264880] task: 880371508000 ti: 88037151 task.ti: >> > > 88037151 >> > > > [2.272353] RIP: 0010:[] [] >> > > ahci_host_activate+0x87/0x140 >> > > > [2.280969] RSP: 0018:880371511b38 EFLAGS: 00010293 >> > > > [2.286273] RAX: 88036e724000 RBX: 88036e71c028 RCX: >> > > 8140bce0 >> > > > [2.293397] RDX: RSI: 002f RDI: >> > > 88037122f098 >> > > > [2.300521] RBP: 880371511b68 R08: 0080 R09: >> > > 0001 >> > > > [2.307645] R10: R11: R12: >> > > 0001 >> > > > [2.314772] R13: 002e R14: R15: >> > > 88037122f000 >> > > > [2.321896] FS: () GS:88037fdc() >> > > knlGS: >> > > > [2.329973] CS: 0010 DS: ES: CR0: 8005003b >> > > > [2.335711] CR2: 0508 CR3: 01c0b000 CR4: >> > > 07e0 >> > > > [2.342835] Stack: >> > > > [2.344848] 88036e72 8803
Re: ahci_host_activate NULL pointer (was Re: Linux 3.11-rc1)
On Mon, Jul 15, 2013 at 3:44 PM, Alex Williamson alex.william...@redhat.com wrote: On Mon, 2013-07-15 at 13:23 -0600, Alex Williamson wrote: On Mon, 2013-07-15 at 14:46 -0400, Xiaotian Feng wrote: On Tue, Jul 16, 2013 at 1:38 AM, Alex Williamson alex.william...@redhat.com wrote: On Mon, 2013-07-15 at 10:49 -0600, Alex Williamson wrote: On Sun, 2013-07-14 at 16:57 -0700, Linus Torvalds wrote: It's been two weeks, and the merge window has closed. If I missed anything, holler, but I don't have anything pending that I am aware of. This merge window was smaller in terms of number of commits than the 3.10 merge window, but we actually have more new lines. Most of that seems to be in staging - a full third of all changes by line-count is staging, and merging in Lustre is the bulk of that. Let's see how that all turns out, I have to say that we don't have a great track record on merging filesystems through staging. Ignoring the lustre merge, I think this really was a somewhat calmer merge window. We had a few trees with problems, and we have an on-going debate about stable patches that was triggered largely thanks to this merge window, so now we'll have something to discuss for the kernel summit. But on the whole, I suspect we might be starting to see the traditional summer slump (Australia notwithstanding). Despite being a bit smaller than the last merge window, it's not like this was a _tiny_ one, and so as usual I'm only summarizing with the normal -rc1 mergelog: and as usual the people credited here are *not* the people who actually wrote the code (although in some cases that is true), they are the people who I merged the code from. Hey, let's all start testing, Anyone else seeing this: [2.212548] ahci :00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x29 impl SATA mode [2.220732] ahci :00:1f.2: flags: 64bit ncq sntf led clo pmp pio slum part ccc sxs [2.228997] BUG: unable to handle kernel NULL pointer dereference at 0508 [2.236850] IP: [814084f7] ahci_host_activate+0x87/0x140 [2.243047] PGD 0 [2.245077] Oops: [#1] SMP [2.248335] Modules linked in: [2.251405] CPU: 7 PID: 1 Comm: swapper/0 Not tainted 3.11.0-rc1+ #574 [2.257929] Hardware name: LENOVO 4157CTO/LENOVO, BIOS 60KT41AUS 01/04/2011 [2.264880] task: 880371508000 ti: 88037151 task.ti: 88037151 [2.272353] RIP: 0010:[814084f7] [814084f7] ahci_host_activate+0x87/0x140 [2.280969] RSP: 0018:880371511b38 EFLAGS: 00010293 [2.286273] RAX: 88036e724000 RBX: 88036e71c028 RCX: 8140bce0 [2.293397] RDX: RSI: 002f RDI: 88037122f098 [2.300521] RBP: 880371511b68 R08: 0080 R09: 0001 [2.307645] R10: R11: R12: 0001 [2.314772] R13: 002e R14: R15: 88037122f000 [2.321896] FS: () GS:88037fdc() knlGS: [2.329973] CS: 0010 DS: ES: CR0: 8005003b [2.335711] CR2: 0508 CR3: 01c0b000 CR4: 07e0 [2.342835] Stack: [2.344848] 88036e72 88037122f000 0005 88037122f098 [2.352301] 88036e734000 88036e71c028 880371511c38 81408eae [2.359756] 8803 88037122f098 8803710be7a8 88030010 [2.367210] Call Trace: [2.369655] [81408eae] ahci_init_one+0x8fe/0xaa0 [2.375221] [816141cf] ? _raw_spin_unlock_irqrestore+0x3f/0x70 [2.382006] [8132529b] local_pci_probe+0x4b/0x80 [2.387571] [81325501] pci_device_probe+0x111/0x120 [2.393405] [813bf43b] driver_probe_device+0x8b/0x390 [2.399411] [813bf7eb] __driver_attach+0xab/0xb0 [2.404984] [813bf740] ? driver_probe_device+0x390/0x390 [2.411241] [813bd2cd] bus_for_each_dev+0x5d/0xa0 [2.416893] [813bed7e] driver_attach+0x1e/0x20 [2.422284] [813be917] bus_add_driver+0x117/0x290 [2.427937] [81d54755] ? libata_transport_init+0x5e/0x5e [2.434201] [813bfc8a] driver_register+0x7a/0x170 [2.439853] [81d54755] ? libata_transport_init+0x5e/0x5e [2.446110] [81324ce4] __pci_register_driver+0x64/0x70 [2.452196] [81d5476e] ahci_pci_driver_init+0x19/0x1b [2.458196] [810002fa] do_one_initcall+0xfa/0x1b0 [2.463853] [8107b100] ? parse_args+0x1f0/0x450
Re: ahci_host_activate NULL pointer (was Re: Linux 3.11-rc1)
On Mon, Jul 15, 2013 at 3:44 PM, Alex Williamson alex.william...@redhat.com wrote: On Mon, 2013-07-15 at 13:23 -0600, Alex Williamson wrote: On Mon, 2013-07-15 at 14:46 -0400, Xiaotian Feng wrote: On Tue, Jul 16, 2013 at 1:38 AM, Alex Williamson alex.william...@redhat.com wrote: On Mon, 2013-07-15 at 10:49 -0600, Alex Williamson wrote: On Sun, 2013-07-14 at 16:57 -0700, Linus Torvalds wrote: It's been two weeks, and the merge window has closed. If I missed anything, holler, but I don't have anything pending that I am aware of. This merge window was smaller in terms of number of commits than the 3.10 merge window, but we actually have more new lines. Most of that seems to be in staging - a full third of all changes by line-count is staging, and merging in Lustre is the bulk of that. Let's see how that all turns out, I have to say that we don't have a great track record on merging filesystems through staging. Ignoring the lustre merge, I think this really was a somewhat calmer merge window. We had a few trees with problems, and we have an on-going debate about stable patches that was triggered largely thanks to this merge window, so now we'll have something to discuss for the kernel summit. But on the whole, I suspect we might be starting to see the traditional summer slump (Australia notwithstanding). Despite being a bit smaller than the last merge window, it's not like this was a _tiny_ one, and so as usual I'm only summarizing with the normal -rc1 mergelog: and as usual the people credited here are *not* the people who actually wrote the code (although in some cases that is true), they are the people who I merged the code from. Hey, let's all start testing, Anyone else seeing this: [2.212548] ahci :00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x29 impl SATA mode [2.220732] ahci :00:1f.2: flags: 64bit ncq sntf led clo pmp pio slum part ccc sxs [2.228997] BUG: unable to handle kernel NULL pointer dereference at 0508 [2.236850] IP: [814084f7] ahci_host_activate+0x87/0x140 [2.243047] PGD 0 [2.245077] Oops: [#1] SMP [2.248335] Modules linked in: [2.251405] CPU: 7 PID: 1 Comm: swapper/0 Not tainted 3.11.0-rc1+ #574 [2.257929] Hardware name: LENOVO 4157CTO/LENOVO, BIOS 60KT41AUS 01/04/2011 [2.264880] task: 880371508000 ti: 88037151 task.ti: 88037151 [2.272353] RIP: 0010:[814084f7] [814084f7] ahci_host_activate+0x87/0x140 [2.280969] RSP: 0018:880371511b38 EFLAGS: 00010293 [2.286273] RAX: 88036e724000 RBX: 88036e71c028 RCX: 8140bce0 [2.293397] RDX: RSI: 002f RDI: 88037122f098 [2.300521] RBP: 880371511b68 R08: 0080 R09: 0001 [2.307645] R10: R11: R12: 0001 [2.314772] R13: 002e R14: R15: 88037122f000 [2.321896] FS: () GS:88037fdc() knlGS: [2.329973] CS: 0010 DS: ES: CR0: 8005003b [2.335711] CR2: 0508 CR3: 01c0b000 CR4: 07e0 [2.342835] Stack: [2.344848] 88036e72 88037122f000 0005 88037122f098 [2.352301] 88036e734000 88036e71c028 880371511c38 81408eae [2.359756] 8803 88037122f098 8803710be7a8 88030010 [2.367210] Call Trace: [2.369655] [81408eae] ahci_init_one+0x8fe/0xaa0 [2.375221] [816141cf] ? _raw_spin_unlock_irqrestore+0x3f/0x70 [2.382006] [8132529b] local_pci_probe+0x4b/0x80 [2.387571] [81325501] pci_device_probe+0x111/0x120 [2.393405] [813bf43b] driver_probe_device+0x8b/0x390 [2.399411] [813bf7eb] __driver_attach+0xab/0xb0 [2.404984] [813bf740] ? driver_probe_device+0x390/0x390 [2.411241] [813bd2cd] bus_for_each_dev+0x5d/0xa0 [2.416893] [813bed7e] driver_attach+0x1e/0x20 [2.422284] [813be917] bus_add_driver+0x117/0x290 [2.427937] [81d54755] ? libata_transport_init+0x5e/0x5e [2.434201] [813bfc8a] driver_register+0x7a/0x170 [2.439853] [81d54755] ? libata_transport_init+0x5e/0x5e [2.446110] [81324ce4] __pci_register_driver+0x64/0x70 [2.452196] [81d5476e] ahci_pci_driver_init+0x19/0x1b [2.458196] [810002fa] do_one_initcall+0xfa/0x1b0 [2.463853] [8107b100] ? parse_args+0x1f0/0x450
[PATCH] AHCI: fix Null pointer dereference in achi_host_active()
commit b29900e6 introuded a regression, which resulted Null pointer dereference for achi host with dummy ports. For ahci ports, when the port is dummy port, its private_data will be NULL, as ata_dummy_port_ops doesn't support -port_start. Reported-and-tested-by: Alex Williamson alex.william...@redhat.com Signed-off-by: Xiaotian Feng xtf...@gmail.com Cc: Alexander Gordeev agord...@redhat.com Cc: Tejun Heo t...@kernel.org Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- drivers/ata/ahci.c |8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 5064f3e..f1de689 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1147,10 +1147,16 @@ int ahci_host_activate(struct ata_host *host, int irq, unsigned int n_msis) for (i = 0; i host-n_ports; i++) { struct ahci_port_priv *pp = host-ports[i]-private_data; + const char *desc; + if (ata_port_is_dummy(host-ports[i])) + desc = dev_driver_string(host-dev); + else + desc = pp-irq_desc; + rc = devm_request_threaded_irq(host-dev, irq + i, ahci_hw_interrupt, ahci_thread_fn, IRQF_SHARED, - pp-irq_desc, host-ports[i]); + desc, host-ports[i]); if (rc) goto out_free_irqs; } -- 1.7.9.6 (Apple Git-31.1) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] [SCSI] mptfusion: delete rphy info in mptsas_del_end_device()
Otherwise, we might get wrong rphy attributes through sysfs when a end device is not responding. Signed-off-by: Xiaotian Feng Cc: Nagalakshmi Nandigama Cc: Sreekanth Reddy Cc: supp...@lsi.com Cc: dl-mptfusionli...@lsi.com Cc: linux-s...@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- drivers/message/fusion/mptsas.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c index fa43c39..4845d18 100644 --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -1572,6 +1572,8 @@ mptsas_del_end_device(MPT_ADAPTER *ioc, struct mptsas_phyinfo *phy_info) port->port_identifier, (unsigned long long)sas_address); sas_port_delete(port); mptsas_set_port(ioc, phy_info, NULL); + sas_rphy_delete(rphy); + mptsas_set_rphy(ioc, phy_info, NULL); mptsas_port_delete(ioc, phy_info->port_details); } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched/core: remove obsolete nr_uninterruptible()
On Wed, Feb 20, 2013 at 3:37 PM, Sha Zhengju wrote: > From: Sha Zhengju > > Signed-off-by: Sha Zhengju > --- > kernel/sched/core.c | 17 - > 1 file changed, 17 deletions(-) You missed include/linux/sched.h :) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index c0b07c3..792f6fc 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -1957,23 +1957,6 @@ unsigned long nr_running(void) > return sum; > } > > -unsigned long nr_uninterruptible(void) > -{ > - unsigned long i, sum = 0; > - > - for_each_possible_cpu(i) > - sum += cpu_rq(i)->nr_uninterruptible; > - > - /* > -* Since we read the counters lockless, it might be slightly > -* inaccurate. Do not allow it to go below zero though: > -*/ > - if (unlikely((long)sum < 0)) > - sum = 0; > - > - return sum; > -} > - > unsigned long long nr_context_switches(void) > { > int i; > -- > 1.7.9.5 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched/core: remove obsolete nr_uninterruptible()
On Wed, Feb 20, 2013 at 3:37 PM, Sha Zhengju handai@gmail.com wrote: From: Sha Zhengju handai@taobao.com Signed-off-by: Sha Zhengju handai@taobao.com --- kernel/sched/core.c | 17 - 1 file changed, 17 deletions(-) You missed include/linux/sched.h :) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index c0b07c3..792f6fc 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1957,23 +1957,6 @@ unsigned long nr_running(void) return sum; } -unsigned long nr_uninterruptible(void) -{ - unsigned long i, sum = 0; - - for_each_possible_cpu(i) - sum += cpu_rq(i)-nr_uninterruptible; - - /* -* Since we read the counters lockless, it might be slightly -* inaccurate. Do not allow it to go below zero though: -*/ - if (unlikely((long)sum 0)) - sum = 0; - - return sum; -} - unsigned long long nr_context_switches(void) { int i; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] [SCSI] mptfusion: delete rphy info in mptsas_del_end_device()
Otherwise, we might get wrong rphy attributes through sysfs when a end device is not responding. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Nagalakshmi Nandigama nagalakshmi.nandig...@lsi.com Cc: Sreekanth Reddy sreekanth.re...@lsi.com Cc: supp...@lsi.com Cc: dl-mptfusionli...@lsi.com Cc: linux-s...@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- drivers/message/fusion/mptsas.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c index fa43c39..4845d18 100644 --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -1572,6 +1572,8 @@ mptsas_del_end_device(MPT_ADAPTER *ioc, struct mptsas_phyinfo *phy_info) port-port_identifier, (unsigned long long)sas_address); sas_port_delete(port); mptsas_set_port(ioc, phy_info, NULL); + sas_rphy_delete(rphy); + mptsas_set_rphy(ioc, phy_info, NULL); mptsas_port_delete(ioc, phy_info-port_details); } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 repost] sched: cputime: avoid multiplication overflow (in common cases)
On Thu, Jan 10, 2013 at 2:33 AM, Frederic Weisbecker wrote: > On Mon, Jan 07, 2013 at 12:31:45PM +0100, Stanislaw Gruszka wrote: >> We scale stime, utime values based on rtime (sum_exec_runtime converted >> to jiffies). During scaling we multiple rtime * utime, what seems to be >> fine, since both values are converted to u64, but is not. >> >> Let assume HZ is 1000 - 1ms tick. Process consist of 64 threads, run >> for 1 day, threads utilize 100% cpu on user space. Machine has 64 cpus. >> >> Process rtime = utime will be 64 * 24 * 60 * 60 * 1000 jiffies, what is >> 0x14997. Multiplication rtime * utime result is 0x1a8557711, >> which can not be covered in 64 bits. >> >> Result of overflow is stall of utime values visible in user space >> (prev_utime in kernel), even if application still consume lot of CPU >> time. >> >> Probably good fix for the problem, will be using 128 bit variable and >> proper mul128 and div_u128_u64 primitives. While mul128 is on it's >> way to kernel, there is no 128 bit division yet. I'm not sure, if we >> want to add it to kernel. Perhaps we could also change the way how >> stime and utime are calculated, but I don't know how, so I come with >> the below solution for the problem. >> >> To avoid overflow patch change value we scale to min(stime, utime). This >> is more like workaround, but will work for processes, which perform >> mostly on user space or mostly on kernel space. Unfortunately processes, >> which perform on kernel and user space equally, and additionally utilize >> lot of CPU time, still will hit this overflow pretty quickly. However >> such processes seems to be uncommon. >> >> Signed-off-by: Stanislaw Gruszka > > I can easily imagine that overflow to happen with user time on intensive > CPU bound loads, or may be guests. > > But can we easily reach the same for system time? Even on intensive I/O bound > loads we shouldn't spend that much time in the kernel. Most of it probably > goes > to idle. > > What do you think? > > If that assumption is right in most cases, the following patch should solve > the > issue: > > diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c > index 293b202..0650dd4 100644 > --- a/kernel/sched/cputime.c > +++ b/kernel/sched/cputime.c > @@ -509,11 +509,11 @@ EXPORT_SYMBOL_GPL(vtime_account); > # define nsecs_to_cputime(__nsecs) nsecs_to_jiffies(__nsecs) > #endif > > -static cputime_t scale_utime(cputime_t utime, cputime_t rtime, cputime_t > total) > +static cputime_t scale_utime(cputime_t stime, cputime_t rtime, cputime_t > total) s/scale_utime/scale_stime. > { > u64 temp = (__force u64) rtime; > > - temp *= (__force u64) utime; > + temp *= (__force u64) stime; > > if (sizeof(cputime_t) == 4) > temp = div_u64(temp, (__force u32) total); > @@ -531,10 +531,10 @@ static void cputime_adjust(struct task_cputime *curr, >struct cputime *prev, >cputime_t *ut, cputime_t *st) > { > - cputime_t rtime, utime, total; > + cputime_t rtime, stime, total; > > - utime = curr->utime; > - total = utime + curr->stime; > + stime = curr->stime; > + total = stime + curr->utime; > > /* > * Tick based cputime accounting depend on random scheduling > @@ -549,17 +549,17 @@ static void cputime_adjust(struct task_cputime *curr, > rtime = nsecs_to_cputime(curr->sum_exec_runtime); > > if (total) > - utime = scale_utime(utime, rtime, total); > + stime = scale_stime(stime, rtime, total); > else > - utime = rtime; > + stime = rtime; > > /* > * If the tick based count grows faster than the scheduler one, > * the result of the scaling may go backward. > * Let's enforce monotonicity. > */ > - prev->utime = max(prev->utime, utime); > - prev->stime = max(prev->stime, rtime - prev->utime); > + prev->stime = max(prev->stime, stime); > + prev->utime = max(prev->utime, rtime - prev->stime); > > *ut = prev->utime; > *st = prev->stime; > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 repost] sched: cputime: avoid multiplication overflow (in common cases)
On Thu, Jan 10, 2013 at 2:33 AM, Frederic Weisbecker fweis...@gmail.com wrote: On Mon, Jan 07, 2013 at 12:31:45PM +0100, Stanislaw Gruszka wrote: We scale stime, utime values based on rtime (sum_exec_runtime converted to jiffies). During scaling we multiple rtime * utime, what seems to be fine, since both values are converted to u64, but is not. Let assume HZ is 1000 - 1ms tick. Process consist of 64 threads, run for 1 day, threads utilize 100% cpu on user space. Machine has 64 cpus. Process rtime = utime will be 64 * 24 * 60 * 60 * 1000 jiffies, what is 0x14997. Multiplication rtime * utime result is 0x1a8557711, which can not be covered in 64 bits. Result of overflow is stall of utime values visible in user space (prev_utime in kernel), even if application still consume lot of CPU time. Probably good fix for the problem, will be using 128 bit variable and proper mul128 and div_u128_u64 primitives. While mul128 is on it's way to kernel, there is no 128 bit division yet. I'm not sure, if we want to add it to kernel. Perhaps we could also change the way how stime and utime are calculated, but I don't know how, so I come with the below solution for the problem. To avoid overflow patch change value we scale to min(stime, utime). This is more like workaround, but will work for processes, which perform mostly on user space or mostly on kernel space. Unfortunately processes, which perform on kernel and user space equally, and additionally utilize lot of CPU time, still will hit this overflow pretty quickly. However such processes seems to be uncommon. Signed-off-by: Stanislaw Gruszka sgrus...@redhat.com I can easily imagine that overflow to happen with user time on intensive CPU bound loads, or may be guests. But can we easily reach the same for system time? Even on intensive I/O bound loads we shouldn't spend that much time in the kernel. Most of it probably goes to idle. What do you think? If that assumption is right in most cases, the following patch should solve the issue: diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 293b202..0650dd4 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -509,11 +509,11 @@ EXPORT_SYMBOL_GPL(vtime_account); # define nsecs_to_cputime(__nsecs) nsecs_to_jiffies(__nsecs) #endif -static cputime_t scale_utime(cputime_t utime, cputime_t rtime, cputime_t total) +static cputime_t scale_utime(cputime_t stime, cputime_t rtime, cputime_t total) s/scale_utime/scale_stime. { u64 temp = (__force u64) rtime; - temp *= (__force u64) utime; + temp *= (__force u64) stime; if (sizeof(cputime_t) == 4) temp = div_u64(temp, (__force u32) total); @@ -531,10 +531,10 @@ static void cputime_adjust(struct task_cputime *curr, struct cputime *prev, cputime_t *ut, cputime_t *st) { - cputime_t rtime, utime, total; + cputime_t rtime, stime, total; - utime = curr-utime; - total = utime + curr-stime; + stime = curr-stime; + total = stime + curr-utime; /* * Tick based cputime accounting depend on random scheduling @@ -549,17 +549,17 @@ static void cputime_adjust(struct task_cputime *curr, rtime = nsecs_to_cputime(curr-sum_exec_runtime); if (total) - utime = scale_utime(utime, rtime, total); + stime = scale_stime(stime, rtime, total); else - utime = rtime; + stime = rtime; /* * If the tick based count grows faster than the scheduler one, * the result of the scaling may go backward. * Let's enforce monotonicity. */ - prev-utime = max(prev-utime, utime); - prev-stime = max(prev-stime, rtime - prev-utime); + prev-stime = max(prev-stime, stime); + prev-utime = max(prev-utime, rtime - prev-stime); *ut = prev-utime; *st = prev-stime; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] proc: fix inconsistent lock state
Lockdep found an inconsistent lock state when rcu is processing delayed work in softirq. Currently, kernel is using spin_lock/spin_unlock to protect proc_inum_ida, but proc_free_inum is called by rcu in softirq context. Use spin_lock_bh/spin_unlock_bh fix following lockdep warning. [ 49.709127] = [ 49.709360] [ INFO: inconsistent lock state ] [ 49.709593] 3.7.0 #36 Not tainted [ 49.709846] - [ 49.710080] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 49.710377] swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 49.710640] (proc_inum_lock){+.?...}, at: [] proc_free_inum+0x1c/0x50 [ 49.711287] {SOFTIRQ-ON-W} state was registered at: [ 49.711540] [] __lock_acquire+0x8ae/0xca0 [ 49.711896] [] lock_acquire+0x199/0x200 [ 49.712242] [] _raw_spin_lock+0x41/0x50 [ 49.712590] [] proc_alloc_inum+0x4c/0xd0 [ 49.712941] [] alloc_mnt_ns+0x49/0xc0 [ 49.713282] [] create_mnt_ns+0x25/0x70 [ 49.713625] [] mnt_init+0x161/0x1c7 [ 49.713962] [] vfs_caches_init+0x107/0x11a [ 49.714392] [] start_kernel+0x348/0x38c [ 49.714815] [] x86_64_start_reservations+0x131/0x136 [ 49.715279] [] x86_64_start_kernel+0x103/0x112 [ 49.715728] irq event stamp: 2993422 [ 49.716006] hardirqs last enabled at (2993422): [] _raw_spin_unlock_irqrestore+0x55/0x80 [ 49.716661] hardirqs last disabled at (2993421): [] _raw_spin_lock_irqsave+0x29/0x70 [ 49.717300] softirqs last enabled at (2993394): [] _local_bh_enable+0x13/0x20 [ 49.717920] softirqs last disabled at (2993395): [] call_softirq+0x1c/0x30 [ 49.718528] [ 49.718528] other info that might help us debug this: [ 49.718992] Possible unsafe locking scenario: [ 49.718992] [ 49.719433]CPU0 [ 49.719669] [ 49.719902] lock(proc_inum_lock); [ 49.720308] [ 49.720548] lock(proc_inum_lock); [ 49.720961] [ 49.720961] *** DEADLOCK *** [ 49.720961] [ 49.721477] no locks held by swapper/1/0. [ 49.721769] [ 49.721769] stack backtrace: [ 49.722150] Pid: 0, comm: swapper/1 Not tainted 3.7.0 #36 [ 49.722555] Call Trace: [ 49.722787][] ? vprintk_emit+0x471/0x510 [ 49.723307] [] print_usage_bug+0x2a5/0x2c0 [ 49.723671] [] mark_lock+0x33b/0x5e0 [ 49.724014] [] __lock_acquire+0x813/0xca0 [ 49.724374] [] ? trace_hardirqs_on+0xd/0x10 [ 49.724741] [] lock_acquire+0x199/0x200 [ 49.725095] [] ? proc_free_inum+0x1c/0x50 [ 49.725455] [] _raw_spin_lock+0x41/0x50 [ 49.725806] [] ? proc_free_inum+0x1c/0x50 [ 49.726165] [] proc_free_inum+0x1c/0x50 [ 49.726519] [] ? put_pid+0x42/0x60 [ 49.726857] [] free_pid_ns+0x1c/0x50 [ 49.727201] [] put_pid_ns+0x2e/0x50 [ 49.727540] [] put_pid+0x4a/0x60 [ 49.727868] [] delayed_put_pid+0x12/0x20 [ 49.728225] [] rcu_process_callbacks+0x462/0x790 [ 49.728606] [] __do_softirq+0x1b4/0x3b0 [ 49.728959] [] ? clockevents_program_event+0xc2/0xf0 [ 49.729354] [] call_softirq+0x1c/0x30 [ 49.729702] [] do_softirq+0x59/0xd0 [ 49.730039] [] irq_exit+0x54/0xd0 [ 49.730370] [] smp_apic_timer_interrupt+0x95/0xa3 [ 49.730755] [] apic_timer_interrupt+0x72/0x80 [ 49.731124][] ? retint_restore_args+0x13/0x13 [ 49.731658] [] ? cpuidle_wrap_enter+0x55/0xa0 [ 49.732029] [] ? cpuidle_wrap_enter+0x51/0xa0 [ 49.732399] [] cpuidle_enter_tk+0x10/0x20 [ 49.732759] [] cpuidle_enter_state+0x17/0x50 [ 49.733128] [] cpuidle_idle_call+0x287/0x520 [ 49.733496] [] cpu_idle+0xba/0x130 [ 49.733830] [] start_secondary+0x2b3/0x2bc Signed-off-by: Xiaotian Feng Cc: Andrew Morton Cc: Al Viro Cc: "Eric W. Biederman" Cc: linux-kernel@vger.kernel.org --- fs/proc/generic.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/fs/proc/generic.c b/fs/proc/generic.c index 7b3ae3c..e659a0f 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -359,18 +359,18 @@ retry: if (!ida_pre_get(_inum_ida, GFP_KERNEL)) return -ENOMEM; - spin_lock(_inum_lock); + spin_lock_bh(_inum_lock); error = ida_get_new(_inum_ida, ); - spin_unlock(_inum_lock); + spin_unlock_bh(_inum_lock); if (error == -EAGAIN) goto retry; else if (error) return error; if (i > UINT_MAX - PROC_DYNAMIC_FIRST) { - spin_lock(_inum_lock); + spin_lock_bh(_inum_lock); ida_remove(_inum_ida, i); - spin_unlock(_inum_lock); + spin_unlock_bh(_inum_lock); return -ENOSPC; } *inum = PROC_DYNAMIC_FIRST + i; @@ -379,9 +379,9 @@ retry: void proc_free_inum(unsigned int inum) { - spin_lock(_inum_lock); + spin_lock_bh(_inum_lock); ida_remove(_inum_ida, inum - PROC_DYNAMIC_FIRST); - spin_unlock(_inum_lock); + spin_unlock_bh(_inum_lock); } static void *proc_follow_link(struct dentry *dentry, stru
[PATCH] proc: fix inconsistent lock state
Lockdep found an inconsistent lock state when rcu is processing delayed work in softirq. Currently, kernel is using spin_lock/spin_unlock to protect proc_inum_ida, but proc_free_inum is called by rcu in softirq context. Use spin_lock_bh/spin_unlock_bh fix following lockdep warning. [ 49.709127] = [ 49.709360] [ INFO: inconsistent lock state ] [ 49.709593] 3.7.0 #36 Not tainted [ 49.709846] - [ 49.710080] inconsistent {SOFTIRQ-ON-W} - {IN-SOFTIRQ-W} usage. [ 49.710377] swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 49.710640] (proc_inum_lock){+.?...}, at: [8124280c] proc_free_inum+0x1c/0x50 [ 49.711287] {SOFTIRQ-ON-W} state was registered at: [ 49.711540] [8110dc4e] __lock_acquire+0x8ae/0xca0 [ 49.711896] [8110e1d9] lock_acquire+0x199/0x200 [ 49.712242] [81b295e1] _raw_spin_lock+0x41/0x50 [ 49.712590] [81242efc] proc_alloc_inum+0x4c/0xd0 [ 49.712941] [811f1799] alloc_mnt_ns+0x49/0xc0 [ 49.713282] [811f3365] create_mnt_ns+0x25/0x70 [ 49.713625] [82115443] mnt_init+0x161/0x1c7 [ 49.713962] [82114f51] vfs_caches_init+0x107/0x11a [ 49.714392] [820f3c67] start_kernel+0x348/0x38c [ 49.714815] [820f3335] x86_64_start_reservations+0x131/0x136 [ 49.715279] [820f343d] x86_64_start_kernel+0x103/0x112 [ 49.715728] irq event stamp: 2993422 [ 49.716006] hardirqs last enabled at (2993422): [81b29f75] _raw_spin_unlock_irqrestore+0x55/0x80 [ 49.716661] hardirqs last disabled at (2993421): [81b296f9] _raw_spin_lock_irqsave+0x29/0x70 [ 49.717300] softirqs last enabled at (2993394): [810ab0b3] _local_bh_enable+0x13/0x20 [ 49.717920] softirqs last disabled at (2993395): [81b2c33c] call_softirq+0x1c/0x30 [ 49.718528] [ 49.718528] other info that might help us debug this: [ 49.718992] Possible unsafe locking scenario: [ 49.718992] [ 49.719433]CPU0 [ 49.719669] [ 49.719902] lock(proc_inum_lock); [ 49.720308] Interrupt [ 49.720548] lock(proc_inum_lock); [ 49.720961] [ 49.720961] *** DEADLOCK *** [ 49.720961] [ 49.721477] no locks held by swapper/1/0. [ 49.721769] [ 49.721769] stack backtrace: [ 49.722150] Pid: 0, comm: swapper/1 Not tainted 3.7.0 #36 [ 49.722555] Call Trace: [ 49.722787] IRQ [810a40f1] ? vprintk_emit+0x471/0x510 [ 49.723307] [81109ce5] print_usage_bug+0x2a5/0x2c0 [ 49.723671] [8110a03b] mark_lock+0x33b/0x5e0 [ 49.724014] [8110dbb3] __lock_acquire+0x813/0xca0 [ 49.724374] [8110a8ad] ? trace_hardirqs_on+0xd/0x10 [ 49.724741] [8110e1d9] lock_acquire+0x199/0x200 [ 49.725095] [8124280c] ? proc_free_inum+0x1c/0x50 [ 49.725455] [81b295e1] _raw_spin_lock+0x41/0x50 [ 49.725806] [8124280c] ? proc_free_inum+0x1c/0x50 [ 49.726165] [8124280c] proc_free_inum+0x1c/0x50 [ 49.726519] [810c76a2] ? put_pid+0x42/0x60 [ 49.726857] [81128cac] free_pid_ns+0x1c/0x50 [ 49.727201] [81128d0e] put_pid_ns+0x2e/0x50 [ 49.727540] [810c76aa] put_pid+0x4a/0x60 [ 49.727868] [810c76d2] delayed_put_pid+0x12/0x20 [ 49.728225] [81132ed2] rcu_process_callbacks+0x462/0x790 [ 49.728606] [810ab4e4] __do_softirq+0x1b4/0x3b0 [ 49.728959] [81104132] ? clockevents_program_event+0xc2/0xf0 [ 49.729354] [81b2c33c] call_softirq+0x1c/0x30 [ 49.729702] [81057bf9] do_softirq+0x59/0xd0 [ 49.730039] [810ab024] irq_exit+0x54/0xd0 [ 49.730370] [81b2ca05] smp_apic_timer_interrupt+0x95/0xa3 [ 49.730755] [81b2bbf2] apic_timer_interrupt+0x72/0x80 [ 49.731124] EOI [81b2a473] ? retint_restore_args+0x13/0x13 [ 49.731658] [8199d125] ? cpuidle_wrap_enter+0x55/0xa0 [ 49.732029] [8199d121] ? cpuidle_wrap_enter+0x51/0xa0 [ 49.732399] [8199d180] cpuidle_enter_tk+0x10/0x20 [ 49.732759] [8199cb57] cpuidle_enter_state+0x17/0x50 [ 49.733128] [8199d507] cpuidle_idle_call+0x287/0x520 [ 49.733496] [8105feca] cpu_idle+0xba/0x130 [ 49.733830] [81b2059f] start_secondary+0x2b3/0x2bc Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Andrew Morton a...@linux-foundation.org Cc: Al Viro v...@zeniv.linux.org.uk Cc: Eric W. Biederman ebied...@xmission.com Cc: linux-kernel@vger.kernel.org --- fs/proc/generic.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/fs/proc/generic.c b/fs/proc/generic.c index 7b3ae3c..e659a0f 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -359,18 +359,18 @@ retry: if (!ida_pre_get(proc_inum_ida, GFP_KERNEL)) return -ENOMEM; - spin_lock(proc_inum_lock); + spin_lock_bh(proc_inum_lock); error = ida_get_new(proc_inum_ida, i
[PATCH] megaraid: convert to work_struct
There's no need to use delayed work, convert to use work_struct and cancel_work_sync(). Requested-by: Tejun Heo Signed-off-by: Xiaotian Feng Cc: Neela Syam Kolli Cc: "James E.J. Bottomley" Cc: linux-s...@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- drivers/scsi/megaraid/megaraid_sas.h |2 +- drivers/scsi/megaraid/megaraid_sas_base.c | 15 +++ 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h index 3b2365c..16b7a72 100644 --- a/drivers/scsi/megaraid/megaraid_sas.h +++ b/drivers/scsi/megaraid/megaraid_sas.h @@ -1276,7 +1276,7 @@ struct megasas_evt_detail { } __attribute__ ((packed)); struct megasas_aen_event { - struct delayed_work hotplug_work; + struct work_struct hotplug_work; struct megasas_instance *instance; }; diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c index e4f2baa..416b280 100644 --- a/drivers/scsi/megaraid/megaraid_sas_base.c +++ b/drivers/scsi/megaraid/megaraid_sas_base.c @@ -2060,9 +2060,8 @@ megasas_service_aen(struct megasas_instance *instance, struct megasas_cmd *cmd) } else { ev->instance = instance; instance->ev = ev; - INIT_DELAYED_WORK(>hotplug_work, - megasas_aen_polling); - schedule_delayed_work(>hotplug_work, 0); + INIT_WORK(>hotplug_work, megasas_aen_polling); + schedule_work(>hotplug_work); } } } @@ -4349,10 +4348,10 @@ megasas_suspend(struct pci_dev *pdev, pm_message_t state) megasas_flush_cache(instance); megasas_shutdown_controller(instance, MR_DCMD_HIBERNATE_SHUTDOWN); - /* cancel the delayed work if this work still in queue */ + /* cancel the work if this work still in queue */ if (instance->ev != NULL) { struct megasas_aen_event *ev = instance->ev; - cancel_delayed_work_sync(>hotplug_work); + cancel_work_sync(>hotplug_work); instance->ev = NULL; } @@ -4541,10 +4540,10 @@ static void __devexit megasas_detach_one(struct pci_dev *pdev) megasas_flush_cache(instance); megasas_shutdown_controller(instance, MR_DCMD_CTRL_SHUTDOWN); - /* cancel the delayed work if this work still in queue*/ + /* cancel the work if this work still in queue*/ if (instance->ev != NULL) { struct megasas_aen_event *ev = instance->ev; - cancel_delayed_work_sync(>hotplug_work); + cancel_work_sync(>hotplug_work); instance->ev = NULL; } @@ -5188,7 +5187,7 @@ static void megasas_aen_polling(struct work_struct *work) { struct megasas_aen_event *ev = - container_of(work, struct megasas_aen_event, hotplug_work.work); + container_of(work, struct megasas_aen_event, hotplug_work); struct megasas_instance *instance = ev->instance; union megasas_evt_class_locale class_locale; struct Scsi_Host *host; -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] [libata] scsi: fix Null pointer dereference on disk error
Following oops were observed when disk error happened: [ 4272.896937] sd 0:0:0:0: [sda] Unhandled error code [ 4272.896939] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK [ 4272.896942] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 5a de a7 00 00 08 00 [ 4272.896951] end_request: I/O error, dev sda, sector 5955239 [ 4291.574947] BUG: unable to handle kernel NULL pointer dereference at (null) [ 4291.658305] IP: [] ahci_activity_show+0x1/0x40 [ 4291.730090] PGD 76dbbc067 PUD 6c4fba067 PMD 0 [ 4291.783408] Oops: [#1] SMP [ 4291.822100] last sysfs file: /sys/devices/pci:00/:00:1f.2/host0/target0:0:0/0:0:0:0/sw_activity [ 4291.934235] CPU 9 [ 4291.958301] Pid: 27942, comm: hwinfo .. ata_scsi_find_dev could return NULL, so ata_scsi_activity_{show,store} should check if atadev is NULL. Signed-off-by: Xiaotian Feng Cc: Jeff Garzik Cc: James Bottomley Cc: sta...@kernel.org Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- drivers/ata/libata-scsi.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index a6df6a3..6407f05 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -309,7 +309,8 @@ ata_scsi_activity_show(struct device *dev, struct device_attribute *attr, struct ata_port *ap = ata_shost_to_port(sdev->host); struct ata_device *atadev = ata_scsi_find_dev(ap, sdev); - if (ap->ops->sw_activity_show && (ap->flags & ATA_FLAG_SW_ACTIVITY)) + if (atadev && ap->ops->sw_activity_show && + (ap->flags & ATA_FLAG_SW_ACTIVITY)) return ap->ops->sw_activity_show(atadev, buf); return -EINVAL; } @@ -324,7 +325,8 @@ ata_scsi_activity_store(struct device *dev, struct device_attribute *attr, enum sw_activity val; int rc; - if (ap->ops->sw_activity_store && (ap->flags & ATA_FLAG_SW_ACTIVITY)) { + if (atadev && ap->ops->sw_activity_store && + (ap->flags & ATA_FLAG_SW_ACTIVITY)) { val = simple_strtoul(buf, NULL, 0); switch (val) { case OFF: case BLINK_ON: case BLINK_OFF: -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] [libata] scsi: fix Null pointer dereference on disk error
Following oops were observed when disk error happened: [ 4272.896937] sd 0:0:0:0: [sda] Unhandled error code [ 4272.896939] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK [ 4272.896942] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 5a de a7 00 00 08 00 [ 4272.896951] end_request: I/O error, dev sda, sector 5955239 [ 4291.574947] BUG: unable to handle kernel NULL pointer dereference at (null) [ 4291.658305] IP: [] ahci_activity_show+0x1/0x40 [ 4291.730090] PGD 76dbbc067 PUD 6c4fba067 PMD 0 [ 4291.783408] Oops: [#1] SMP [ 4291.822100] last sysfs file: /sys/devices/pci:00/:00:1f.2/host0/target0:0:0/0:0:0:0/sw_activity [ 4291.934235] CPU 9 [ 4291.958301] Pid: 27942, comm: hwinfo .. ata_scsi_find_dev could return NULL, so ata_scsi_activity_{show,store} should check if atadev is NULL. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Jeff Garzik jgar...@redhat.com Cc: James Bottomley jbottom...@parallels.com Cc: sta...@kernel.org Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- drivers/ata/libata-scsi.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index a6df6a3..6407f05 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -309,7 +309,8 @@ ata_scsi_activity_show(struct device *dev, struct device_attribute *attr, struct ata_port *ap = ata_shost_to_port(sdev-host); struct ata_device *atadev = ata_scsi_find_dev(ap, sdev); - if (ap-ops-sw_activity_show (ap-flags ATA_FLAG_SW_ACTIVITY)) + if (atadev ap-ops-sw_activity_show + (ap-flags ATA_FLAG_SW_ACTIVITY)) return ap-ops-sw_activity_show(atadev, buf); return -EINVAL; } @@ -324,7 +325,8 @@ ata_scsi_activity_store(struct device *dev, struct device_attribute *attr, enum sw_activity val; int rc; - if (ap-ops-sw_activity_store (ap-flags ATA_FLAG_SW_ACTIVITY)) { + if (atadev ap-ops-sw_activity_store + (ap-flags ATA_FLAG_SW_ACTIVITY)) { val = simple_strtoul(buf, NULL, 0); switch (val) { case OFF: case BLINK_ON: case BLINK_OFF: -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] megaraid: convert to work_struct
There's no need to use delayed work, convert to use work_struct and cancel_work_sync(). Requested-by: Tejun Heo t...@kernel.org Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Neela Syam Kolli megaraidli...@lsi.com Cc: James E.J. Bottomley jbottom...@parallels.com Cc: linux-s...@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- drivers/scsi/megaraid/megaraid_sas.h |2 +- drivers/scsi/megaraid/megaraid_sas_base.c | 15 +++ 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h index 3b2365c..16b7a72 100644 --- a/drivers/scsi/megaraid/megaraid_sas.h +++ b/drivers/scsi/megaraid/megaraid_sas.h @@ -1276,7 +1276,7 @@ struct megasas_evt_detail { } __attribute__ ((packed)); struct megasas_aen_event { - struct delayed_work hotplug_work; + struct work_struct hotplug_work; struct megasas_instance *instance; }; diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c index e4f2baa..416b280 100644 --- a/drivers/scsi/megaraid/megaraid_sas_base.c +++ b/drivers/scsi/megaraid/megaraid_sas_base.c @@ -2060,9 +2060,8 @@ megasas_service_aen(struct megasas_instance *instance, struct megasas_cmd *cmd) } else { ev-instance = instance; instance-ev = ev; - INIT_DELAYED_WORK(ev-hotplug_work, - megasas_aen_polling); - schedule_delayed_work(ev-hotplug_work, 0); + INIT_WORK(ev-hotplug_work, megasas_aen_polling); + schedule_work(ev-hotplug_work); } } } @@ -4349,10 +4348,10 @@ megasas_suspend(struct pci_dev *pdev, pm_message_t state) megasas_flush_cache(instance); megasas_shutdown_controller(instance, MR_DCMD_HIBERNATE_SHUTDOWN); - /* cancel the delayed work if this work still in queue */ + /* cancel the work if this work still in queue */ if (instance-ev != NULL) { struct megasas_aen_event *ev = instance-ev; - cancel_delayed_work_sync(ev-hotplug_work); + cancel_work_sync(ev-hotplug_work); instance-ev = NULL; } @@ -4541,10 +4540,10 @@ static void __devexit megasas_detach_one(struct pci_dev *pdev) megasas_flush_cache(instance); megasas_shutdown_controller(instance, MR_DCMD_CTRL_SHUTDOWN); - /* cancel the delayed work if this work still in queue*/ + /* cancel the work if this work still in queue*/ if (instance-ev != NULL) { struct megasas_aen_event *ev = instance-ev; - cancel_delayed_work_sync(ev-hotplug_work); + cancel_work_sync(ev-hotplug_work); instance-ev = NULL; } @@ -5188,7 +5187,7 @@ static void megasas_aen_polling(struct work_struct *work) { struct megasas_aen_event *ev = - container_of(work, struct megasas_aen_event, hotplug_work.work); + container_of(work, struct megasas_aen_event, hotplug_work); struct megasas_instance *instance = ev-instance; union megasas_evt_class_locale class_locale; struct Scsi_Host *host; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] megaraid: fix use of delayed work
megaraid use INIT_WORK to declare a hotplug_work, but cast the hotplug_work from work_struct to delayed_work and schedule_delayed_work on it. This is very dangerous, as other part of delayed_work might be kernel memories allocated by others. With commit 8852aac, schedule_delayed_work() will check dwork->timer before queue_work, this will cause megaraid code to hit the BUG_ON in workqueue code. Change megaraid code to use delayed work. Signed-off-by: Xiaotian Feng Cc: Tejun Heo Cc: Neela Syam Kolli Cc: "James E.J. Bottomley" Cc: linux-s...@vger.kernel.org --- drivers/scsi/megaraid/megaraid_sas.h |2 +- drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++ 2 files changed, 7 insertions(+), 9 deletions(-) diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h index 16b7a72..3b2365c 100644 --- a/drivers/scsi/megaraid/megaraid_sas.h +++ b/drivers/scsi/megaraid/megaraid_sas.h @@ -1276,7 +1276,7 @@ struct megasas_evt_detail { } __attribute__ ((packed)); struct megasas_aen_event { - struct work_struct hotplug_work; + struct delayed_work hotplug_work; struct megasas_instance *instance; }; diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c index d2c5366..e4f2baa 100644 --- a/drivers/scsi/megaraid/megaraid_sas_base.c +++ b/drivers/scsi/megaraid/megaraid_sas_base.c @@ -2060,9 +2060,9 @@ megasas_service_aen(struct megasas_instance *instance, struct megasas_cmd *cmd) } else { ev->instance = instance; instance->ev = ev; - INIT_WORK(>hotplug_work, megasas_aen_polling); - schedule_delayed_work( - (struct delayed_work *)>hotplug_work, 0); + INIT_DELAYED_WORK(>hotplug_work, + megasas_aen_polling); + schedule_delayed_work(>hotplug_work, 0); } } } @@ -4352,8 +4352,7 @@ megasas_suspend(struct pci_dev *pdev, pm_message_t state) /* cancel the delayed work if this work still in queue */ if (instance->ev != NULL) { struct megasas_aen_event *ev = instance->ev; - cancel_delayed_work_sync( - (struct delayed_work *)>hotplug_work); + cancel_delayed_work_sync(>hotplug_work); instance->ev = NULL; } @@ -4545,8 +4544,7 @@ static void __devexit megasas_detach_one(struct pci_dev *pdev) /* cancel the delayed work if this work still in queue*/ if (instance->ev != NULL) { struct megasas_aen_event *ev = instance->ev; - cancel_delayed_work_sync( - (struct delayed_work *)>hotplug_work); + cancel_delayed_work_sync(>hotplug_work); instance->ev = NULL; } @@ -5190,7 +5188,7 @@ static void megasas_aen_polling(struct work_struct *work) { struct megasas_aen_event *ev = - container_of(work, struct megasas_aen_event, hotplug_work); + container_of(work, struct megasas_aen_event, hotplug_work.work); struct megasas_instance *instance = ev->instance; union megasas_evt_class_locale class_locale; struct Scsi_Host *host; -- 1.7.9.6 (Apple Git-31.1) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] megaraid: fix use of delayed work
megaraid use INIT_WORK to declare a hotplug_work, but cast the hotplug_work from work_struct to delayed_work and schedule_delayed_work on it. This is very dangerous, as other part of delayed_work might be kernel memories allocated by others. With commit 8852aac, schedule_delayed_work() will check dwork-timer before queue_work, this will cause megaraid code to hit the BUG_ON in workqueue code. Change megaraid code to use delayed work. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Tejun Heo t...@kernel.org Cc: Neela Syam Kolli megaraidli...@lsi.com Cc: James E.J. Bottomley jbottom...@parallels.com Cc: linux-s...@vger.kernel.org --- drivers/scsi/megaraid/megaraid_sas.h |2 +- drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++ 2 files changed, 7 insertions(+), 9 deletions(-) diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h index 16b7a72..3b2365c 100644 --- a/drivers/scsi/megaraid/megaraid_sas.h +++ b/drivers/scsi/megaraid/megaraid_sas.h @@ -1276,7 +1276,7 @@ struct megasas_evt_detail { } __attribute__ ((packed)); struct megasas_aen_event { - struct work_struct hotplug_work; + struct delayed_work hotplug_work; struct megasas_instance *instance; }; diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c index d2c5366..e4f2baa 100644 --- a/drivers/scsi/megaraid/megaraid_sas_base.c +++ b/drivers/scsi/megaraid/megaraid_sas_base.c @@ -2060,9 +2060,9 @@ megasas_service_aen(struct megasas_instance *instance, struct megasas_cmd *cmd) } else { ev-instance = instance; instance-ev = ev; - INIT_WORK(ev-hotplug_work, megasas_aen_polling); - schedule_delayed_work( - (struct delayed_work *)ev-hotplug_work, 0); + INIT_DELAYED_WORK(ev-hotplug_work, + megasas_aen_polling); + schedule_delayed_work(ev-hotplug_work, 0); } } } @@ -4352,8 +4352,7 @@ megasas_suspend(struct pci_dev *pdev, pm_message_t state) /* cancel the delayed work if this work still in queue */ if (instance-ev != NULL) { struct megasas_aen_event *ev = instance-ev; - cancel_delayed_work_sync( - (struct delayed_work *)ev-hotplug_work); + cancel_delayed_work_sync(ev-hotplug_work); instance-ev = NULL; } @@ -4545,8 +4544,7 @@ static void __devexit megasas_detach_one(struct pci_dev *pdev) /* cancel the delayed work if this work still in queue*/ if (instance-ev != NULL) { struct megasas_aen_event *ev = instance-ev; - cancel_delayed_work_sync( - (struct delayed_work *)ev-hotplug_work); + cancel_delayed_work_sync(ev-hotplug_work); instance-ev = NULL; } @@ -5190,7 +5188,7 @@ static void megasas_aen_polling(struct work_struct *work) { struct megasas_aen_event *ev = - container_of(work, struct megasas_aen_event, hotplug_work); + container_of(work, struct megasas_aen_event, hotplug_work.work); struct megasas_instance *instance = ev-instance; union megasas_evt_class_locale class_locale; struct Scsi_Host *host; -- 1.7.9.6 (Apple Git-31.1) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: manual merge of the akpm tree with Linus' tree
On Mon, Nov 26, 2012 at 8:48 PM, Stephen Rothwell wrote: > Hi Andrew, > > Today's linux-next merge of the akpm tree got a conflict in > drivers/net/ethernet/jme.c between commit 71c6c837a0fe ("drivers/net: fix > tasklet misuse issue") from Linus' tree and commit "tasklet: ignore > disabled tasklet in tasklet_action()" from the akpm tree. > You can simply remove the following part of the patch @@ -1862,8 +1862,8 @@ jme_open(struct net_device *netdev) tasklet_enable(>linkch_task); tasklet_enable(>txclean_task); - tasklet_hi_enable(>rxclean_task); - tasklet_hi_enable(>rxempty_task); + tasklet_enable(>rxclean_task); + tasklet_enable(>rxempty_task); rc = jme_request_irq(jme); if (rc) Do you want me to re-generate a patch for you? > I am not sure what to do here, so I have dropped the akpm patch. > > -- > Cheers, > Stephen Rothwells...@canb.auug.org.au -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: manual merge of the akpm tree with Linus' tree
On Mon, Nov 26, 2012 at 8:48 PM, Stephen Rothwell s...@canb.auug.org.au wrote: Hi Andrew, Today's linux-next merge of the akpm tree got a conflict in drivers/net/ethernet/jme.c between commit 71c6c837a0fe (drivers/net: fix tasklet misuse issue) from Linus' tree and commit tasklet: ignore disabled tasklet in tasklet_action() from the akpm tree. You can simply remove the following part of the patch @@ -1862,8 +1862,8 @@ jme_open(struct net_device *netdev) tasklet_enable(jme-linkch_task); tasklet_enable(jme-txclean_task); - tasklet_hi_enable(jme-rxclean_task); - tasklet_hi_enable(jme-rxempty_task); + tasklet_enable(jme-rxclean_task); + tasklet_enable(jme-rxempty_task); rc = jme_request_irq(jme); if (rc) Do you want me to re-generate a patch for you? I am not sure what to do here, so I have dropped the akpm patch. -- Cheers, Stephen Rothwells...@canb.auug.org.au -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] drivers/net: fix tasklet misuse issue
In commit 175c0dff, drivers uses tasklet_kill to avoid put disabled tasklet on the tasklet vec. But some of the drivers uses tasklet_init & tasklet_disable in the driver init code, then tasklet_enable when it is opened. This makes tasklet_enable on a killed tasklet and make ksoftirqd crazy then. Normally, drivers should use tasklet_init/tasklet_kill on device open/remove, and use tasklet_disable/tasklet_enable on device suspend/resume. Reported-by: Peter Wu Tested-by: Peter Wu Signed-off-by: Xiaotian Feng Cc: "David S. Miller" Cc: net...@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- drivers/net/ethernet/jme.c| 28 ++--- drivers/net/ethernet/micrel/ksz884x.c | 16 +++- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 12 - 3 files changed, 18 insertions(+), 38 deletions(-) diff --git a/drivers/net/ethernet/jme.c b/drivers/net/ethernet/jme.c index 92317e9..c0314c1 100644 --- a/drivers/net/ethernet/jme.c +++ b/drivers/net/ethernet/jme.c @@ -1860,10 +1860,14 @@ jme_open(struct net_device *netdev) jme_clear_pm(jme); JME_NAPI_ENABLE(jme); - tasklet_enable(>linkch_task); - tasklet_enable(>txclean_task); - tasklet_hi_enable(>rxclean_task); - tasklet_hi_enable(>rxempty_task); + tasklet_init(>linkch_task, jme_link_change_tasklet, +(unsigned long) jme); + tasklet_init(>txclean_task, jme_tx_clean_tasklet, +(unsigned long) jme); + tasklet_init(>rxclean_task, jme_rx_clean_tasklet, +(unsigned long) jme); + tasklet_init(>rxempty_task, jme_rx_empty_tasklet, +(unsigned long) jme); rc = jme_request_irq(jme); if (rc) @@ -3079,22 +3083,6 @@ jme_init_one(struct pci_dev *pdev, tasklet_init(>pcc_task, jme_pcc_tasklet, (unsigned long) jme); - tasklet_init(>linkch_task, -jme_link_change_tasklet, -(unsigned long) jme); - tasklet_init(>txclean_task, -jme_tx_clean_tasklet, -(unsigned long) jme); - tasklet_init(>rxclean_task, -jme_rx_clean_tasklet, -(unsigned long) jme); - tasklet_init(>rxempty_task, -jme_rx_empty_tasklet, -(unsigned long) jme); - tasklet_disable_nosync(>linkch_task); - tasklet_disable_nosync(>txclean_task); - tasklet_disable_nosync(>rxclean_task); - tasklet_disable_nosync(>rxempty_task); jme->dpi.cur = PCC_P1; jme->reg_ghc = 0; diff --git a/drivers/net/ethernet/micrel/ksz884x.c b/drivers/net/ethernet/micrel/ksz884x.c index e558edd..b66b285 100644 --- a/drivers/net/ethernet/micrel/ksz884x.c +++ b/drivers/net/ethernet/micrel/ksz884x.c @@ -5459,8 +5459,10 @@ static int prepare_hardware(struct net_device *dev) rc = request_irq(dev->irq, netdev_intr, IRQF_SHARED, dev->name, dev); if (rc) return rc; - tasklet_enable(_priv->rx_tasklet); - tasklet_enable(_priv->tx_tasklet); + tasklet_init(_priv->rx_tasklet, rx_proc_task, +(unsigned long) hw_priv); + tasklet_init(_priv->tx_tasklet, tx_proc_task, +(unsigned long) hw_priv); hw->promiscuous = 0; hw->all_multi = 0; @@ -7033,16 +7035,6 @@ static int __devinit pcidev_init(struct pci_dev *pdev, spin_lock_init(_priv->hwlock); mutex_init(_priv->lock); - /* tasklet is enabled. */ - tasklet_init(_priv->rx_tasklet, rx_proc_task, - (unsigned long) hw_priv); - tasklet_init(_priv->tx_tasklet, tx_proc_task, - (unsigned long) hw_priv); - - /* tasklet_enable will decrement the atomic counter. */ - tasklet_disable(_priv->rx_tasklet); - tasklet_disable(_priv->tx_tasklet); - for (i = 0; i < TOTAL_PORT_NUM; i++) init_waitqueue_head(_priv->counter[i].counter); diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index 1d04754..7740f4b 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -942,6 +942,10 @@ static int axienet_open(struct net_device *ndev) phy_start(lp->phy_dev); } + /* Enable tasklets for Axi DMA error handling */ + tasklet_init(>dma_err_tasklet, axienet_dma_err_handler, +(unsigned long) lp); + /* Enable interrupts for Axi DMA Tx */ ret = request_irq(lp->tx_irq, axienet_tx_irq, 0, ndev->name, ndev); if (ret) @@ -950,8 +954,7 @@ static int axienet_open(struct net_device *ndev) ret = request_irq(lp-
[PATCH] drivers/net: fix tasklet misuse issue
In commit 175c0dff, drivers uses tasklet_kill to avoid put disabled tasklet on the tasklet vec. But some of the drivers uses tasklet_init tasklet_disable in the driver init code, then tasklet_enable when it is opened. This makes tasklet_enable on a killed tasklet and make ksoftirqd crazy then. Normally, drivers should use tasklet_init/tasklet_kill on device open/remove, and use tasklet_disable/tasklet_enable on device suspend/resume. Reported-by: Peter Wu lekenst...@gmail.com Tested-by: Peter Wu lekenst...@gmail.com Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: David S. Miller da...@davemloft.net Cc: net...@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- drivers/net/ethernet/jme.c| 28 ++--- drivers/net/ethernet/micrel/ksz884x.c | 16 +++- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 12 - 3 files changed, 18 insertions(+), 38 deletions(-) diff --git a/drivers/net/ethernet/jme.c b/drivers/net/ethernet/jme.c index 92317e9..c0314c1 100644 --- a/drivers/net/ethernet/jme.c +++ b/drivers/net/ethernet/jme.c @@ -1860,10 +1860,14 @@ jme_open(struct net_device *netdev) jme_clear_pm(jme); JME_NAPI_ENABLE(jme); - tasklet_enable(jme-linkch_task); - tasklet_enable(jme-txclean_task); - tasklet_hi_enable(jme-rxclean_task); - tasklet_hi_enable(jme-rxempty_task); + tasklet_init(jme-linkch_task, jme_link_change_tasklet, +(unsigned long) jme); + tasklet_init(jme-txclean_task, jme_tx_clean_tasklet, +(unsigned long) jme); + tasklet_init(jme-rxclean_task, jme_rx_clean_tasklet, +(unsigned long) jme); + tasklet_init(jme-rxempty_task, jme_rx_empty_tasklet, +(unsigned long) jme); rc = jme_request_irq(jme); if (rc) @@ -3079,22 +3083,6 @@ jme_init_one(struct pci_dev *pdev, tasklet_init(jme-pcc_task, jme_pcc_tasklet, (unsigned long) jme); - tasklet_init(jme-linkch_task, -jme_link_change_tasklet, -(unsigned long) jme); - tasklet_init(jme-txclean_task, -jme_tx_clean_tasklet, -(unsigned long) jme); - tasklet_init(jme-rxclean_task, -jme_rx_clean_tasklet, -(unsigned long) jme); - tasklet_init(jme-rxempty_task, -jme_rx_empty_tasklet, -(unsigned long) jme); - tasklet_disable_nosync(jme-linkch_task); - tasklet_disable_nosync(jme-txclean_task); - tasklet_disable_nosync(jme-rxclean_task); - tasklet_disable_nosync(jme-rxempty_task); jme-dpi.cur = PCC_P1; jme-reg_ghc = 0; diff --git a/drivers/net/ethernet/micrel/ksz884x.c b/drivers/net/ethernet/micrel/ksz884x.c index e558edd..b66b285 100644 --- a/drivers/net/ethernet/micrel/ksz884x.c +++ b/drivers/net/ethernet/micrel/ksz884x.c @@ -5459,8 +5459,10 @@ static int prepare_hardware(struct net_device *dev) rc = request_irq(dev-irq, netdev_intr, IRQF_SHARED, dev-name, dev); if (rc) return rc; - tasklet_enable(hw_priv-rx_tasklet); - tasklet_enable(hw_priv-tx_tasklet); + tasklet_init(hw_priv-rx_tasklet, rx_proc_task, +(unsigned long) hw_priv); + tasklet_init(hw_priv-tx_tasklet, tx_proc_task, +(unsigned long) hw_priv); hw-promiscuous = 0; hw-all_multi = 0; @@ -7033,16 +7035,6 @@ static int __devinit pcidev_init(struct pci_dev *pdev, spin_lock_init(hw_priv-hwlock); mutex_init(hw_priv-lock); - /* tasklet is enabled. */ - tasklet_init(hw_priv-rx_tasklet, rx_proc_task, - (unsigned long) hw_priv); - tasklet_init(hw_priv-tx_tasklet, tx_proc_task, - (unsigned long) hw_priv); - - /* tasklet_enable will decrement the atomic counter. */ - tasklet_disable(hw_priv-rx_tasklet); - tasklet_disable(hw_priv-tx_tasklet); - for (i = 0; i TOTAL_PORT_NUM; i++) init_waitqueue_head(hw_priv-counter[i].counter); diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index 1d04754..7740f4b 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -942,6 +942,10 @@ static int axienet_open(struct net_device *ndev) phy_start(lp-phy_dev); } + /* Enable tasklets for Axi DMA error handling */ + tasklet_init(lp-dma_err_tasklet, axienet_dma_err_handler, +(unsigned long) lp); + /* Enable interrupts for Axi DMA Tx */ ret = request_irq(lp-tx_irq, axienet_tx_irq, 0, ndev-name, ndev); if (ret) @@ -950,8 +954,7 @@ static int axienet_open(struct net_device *ndev) ret = request_irq(lp
Re: [RFC PATCH] tasklet: ignore disabled tasklet in tasklet_action
On Tue, Nov 6, 2012 at 6:52 AM, Andrew Morton wrote: > On Fri, 2 Nov 2012 10:48:54 +0800 > Xiaotian Feng wrote: > >> We met a ksoftirqd 100% issue, the perf top shows kernel is busy >> with tasklet_action(), but no actual action is shown. From dumped >> kernel, there's only one disabled tasklet on the tasklet_vec. >> >> tasklet_action might be handled after tasklet is disabled, this will >> make disabled tasklet stayed on tasklet_vec. tasklet_action will not >> handle disabled tasklet, but place it on the tail of tasklet_vec, >> still raise softirq for this tasklet. Things will become worse if >> device driver uses tasklet_disable on its device remove/close code. >> The disabled tasklet will stay on the vec, frequently __raise_softirq_off() >> and make ksoftirqd wakeup even if no tasklets need to be handled. >> >> This patch introduced a new TASKLET_STATE_HI bit to indicate HI_SOFTIRQ, >> in tasklet_action(), simply ignore the disabled tasklet and don't raise >> the softirq nr. In my previous patch, I remove tasklet_hi_enable() since >> it is the same as tasklet_enable(). So only tasklet_enable() needs to be >> modified, if tasklet state is changed from disable to enable, use >> __tasklet_schedule() to put it on the right vec. > > gee, I haven't looked at the tasklet code in 100 years. I think I'll > send this in Thomas's direction ;) > > The race description seems real and the patch looks sane to me. Are > you sure we can get away with never clearing TASKLET_STATE_HI? For > example, what would happen if code does a tasklet_hi_schedule(t) and > later does a tasklet_schedule(t)? hmm, that will be a nightmare... tasklet_schedule(t)/tasklet_hi_schedule(t) doesn't use list_head, they simply make t->next = NULL, then put t on the tail of tasklet_vec/tasklet_hi_vec. If the code does a tasklet_hi_schedule() and then a tasklet_schedule, the tasklet will stay on tasklet_vec and tasklet_hi_vec tasklet_hi_action will handle it first and clear the TASKLET_SCHED_SCHED bit, later, in tasklet_action, it will be handled again and hit a BUG_ON ... But if code does a tasklet_hi_schedule(), then tasklet_kil and later does a tasklet_schedule(), we do need clear the TASKLET_STATE_HI. Also we need to remove the tasklet_hi_enable() as it is the same as tasklet_enable() and there's only one user.. I'll send you V2 patch soon, thanks. > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] tasklet: ignore disabled tasklet in tasklet_action
On Tue, Nov 6, 2012 at 6:52 AM, Andrew Morton a...@linux-foundation.org wrote: On Fri, 2 Nov 2012 10:48:54 +0800 Xiaotian Feng xtf...@gmail.com wrote: We met a ksoftirqd 100% issue, the perf top shows kernel is busy with tasklet_action(), but no actual action is shown. From dumped kernel, there's only one disabled tasklet on the tasklet_vec. tasklet_action might be handled after tasklet is disabled, this will make disabled tasklet stayed on tasklet_vec. tasklet_action will not handle disabled tasklet, but place it on the tail of tasklet_vec, still raise softirq for this tasklet. Things will become worse if device driver uses tasklet_disable on its device remove/close code. The disabled tasklet will stay on the vec, frequently __raise_softirq_off() and make ksoftirqd wakeup even if no tasklets need to be handled. This patch introduced a new TASKLET_STATE_HI bit to indicate HI_SOFTIRQ, in tasklet_action(), simply ignore the disabled tasklet and don't raise the softirq nr. In my previous patch, I remove tasklet_hi_enable() since it is the same as tasklet_enable(). So only tasklet_enable() needs to be modified, if tasklet state is changed from disable to enable, use __tasklet_schedule() to put it on the right vec. gee, I haven't looked at the tasklet code in 100 years. I think I'll send this in Thomas's direction ;) The race description seems real and the patch looks sane to me. Are you sure we can get away with never clearing TASKLET_STATE_HI? For example, what would happen if code does a tasklet_hi_schedule(t) and later does a tasklet_schedule(t)? hmm, that will be a nightmare... tasklet_schedule(t)/tasklet_hi_schedule(t) doesn't use list_head, they simply make t-next = NULL, then put t on the tail of tasklet_vec/tasklet_hi_vec. If the code does a tasklet_hi_schedule() and then a tasklet_schedule, the tasklet will stay on tasklet_vec and tasklet_hi_vec tasklet_hi_action will handle it first and clear the TASKLET_SCHED_SCHED bit, later, in tasklet_action, it will be handled again and hit a BUG_ON ... But if code does a tasklet_hi_schedule(), then tasklet_kil and later does a tasklet_schedule(), we do need clear the TASKLET_STATE_HI. Also we need to remove the tasklet_hi_enable() as it is the same as tasklet_enable() and there's only one user.. I'll send you V2 patch soon, thanks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH] tasklet: ignore disabled tasklet in tasklet_action
We met a ksoftirqd 100% issue, the perf top shows kernel is busy with tasklet_action(), but no actual action is shown. From dumped kernel, there's only one disabled tasklet on the tasklet_vec. tasklet_action might be handled after tasklet is disabled, this will make disabled tasklet stayed on tasklet_vec. tasklet_action will not handle disabled tasklet, but place it on the tail of tasklet_vec, still raise softirq for this tasklet. Things will become worse if device driver uses tasklet_disable on its device remove/close code. The disabled tasklet will stay on the vec, frequently __raise_softirq_off() and make ksoftirqd wakeup even if no tasklets need to be handled. This patch introduced a new TASKLET_STATE_HI bit to indicate HI_SOFTIRQ, in tasklet_action(), simply ignore the disabled tasklet and don't raise the softirq nr. In my previous patch, I remove tasklet_hi_enable() since it is the same as tasklet_enable(). So only tasklet_enable() needs to be modified, if tasklet state is changed from disable to enable, use __tasklet_schedule() to put it on the right vec. Signed-off-by: Xiaotian Feng Cc: Andrew Morton --- include/linux/interrupt.h | 12 ++-- kernel/softirq.c | 10 +- 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 5e4e617..7e5bb00 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -521,7 +521,8 @@ struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data } enum { TASKLET_STATE_SCHED,/* Tasklet is scheduled for execution */ - TASKLET_STATE_RUN /* Tasklet is running (SMP only) */ + TASKLET_STATE_RUN, /* Tasklet is running (SMP only) */ + TASKLET_STATE_HI/* Tasklet is HI_SOFTIRQ */ }; #ifdef CONFIG_SMP @@ -593,7 +594,14 @@ static inline void tasklet_disable(struct tasklet_struct *t) static inline void tasklet_enable(struct tasklet_struct *t) { smp_mb__before_atomic_dec(); - atomic_dec(>count); + if (atomic_dec_and_test(>count)) { + if (!test_bit(TASKLET_STATE_SCHED, >state)) + return; + if (test_bit(TASKLET_STATE_HI, >state)) + __tasklet_hi_schedule(t); + else + __tasklet_schedule(t); + } } static inline void tasklet_hi_enable(struct tasklet_struct *t) diff --git a/kernel/softirq.c b/kernel/softirq.c index cc96bdc..6d77aef 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -431,6 +431,7 @@ void __tasklet_hi_schedule(struct tasklet_struct *t) *__this_cpu_read(tasklet_hi_vec.tail) = t; __this_cpu_write(tasklet_hi_vec.tail, &(t->next)); raise_softirq_irqoff(HI_SOFTIRQ); + set_bit(TASKLET_STATE_HI, >state); local_irq_restore(flags); } @@ -442,6 +443,7 @@ void __tasklet_hi_schedule_first(struct tasklet_struct *t) t->next = __this_cpu_read(tasklet_hi_vec.head); __this_cpu_write(tasklet_hi_vec.head, t); + set_bit(TASKLET_STATE_HI, >state); __raise_softirq_irqoff(HI_SOFTIRQ); } @@ -467,10 +469,9 @@ static void tasklet_action(struct softirq_action *a) if (!test_and_clear_bit(TASKLET_STATE_SCHED, >state)) BUG(); t->func(t->data); - tasklet_unlock(t); - continue; - } + } tasklet_unlock(t); + continue; } local_irq_disable(); @@ -502,10 +503,9 @@ static void tasklet_hi_action(struct softirq_action *a) if (!test_and_clear_bit(TASKLET_STATE_SCHED, >state)) BUG(); t->func(t->data); - tasklet_unlock(t); - continue; } tasklet_unlock(t); + continue; } local_irq_disable(); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] swapfile: fix name leak in swapoff
there's a name leak introduced by commit 91a27b2, add the missing putname. Signed-off-by: Xiaotian Feng Cc: Jeff Layton Cc: Al Viro --- mm/swapfile.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/swapfile.c b/mm/swapfile.c index 71cd288..459fe30 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1608,6 +1608,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) out_dput: filp_close(victim, NULL); out: + if (pathname && !IS_ERR(pathname)) + putname(pathname); return err; } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] swapfile: fix name leak in swapoff
there's a name leak introduced by commit 91a27b2, add the missing putname. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Jeff Layton jlay...@redhat.com Cc: Al Viro v...@zeniv.linux.org.uk --- mm/swapfile.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/swapfile.c b/mm/swapfile.c index 71cd288..459fe30 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1608,6 +1608,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) out_dput: filp_close(victim, NULL); out: + if (pathname !IS_ERR(pathname)) + putname(pathname); return err; } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH] tasklet: ignore disabled tasklet in tasklet_action
We met a ksoftirqd 100% issue, the perf top shows kernel is busy with tasklet_action(), but no actual action is shown. From dumped kernel, there's only one disabled tasklet on the tasklet_vec. tasklet_action might be handled after tasklet is disabled, this will make disabled tasklet stayed on tasklet_vec. tasklet_action will not handle disabled tasklet, but place it on the tail of tasklet_vec, still raise softirq for this tasklet. Things will become worse if device driver uses tasklet_disable on its device remove/close code. The disabled tasklet will stay on the vec, frequently __raise_softirq_off() and make ksoftirqd wakeup even if no tasklets need to be handled. This patch introduced a new TASKLET_STATE_HI bit to indicate HI_SOFTIRQ, in tasklet_action(), simply ignore the disabled tasklet and don't raise the softirq nr. In my previous patch, I remove tasklet_hi_enable() since it is the same as tasklet_enable(). So only tasklet_enable() needs to be modified, if tasklet state is changed from disable to enable, use __tasklet_schedule() to put it on the right vec. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Andrew Morton a...@linux-foundation.org --- include/linux/interrupt.h | 12 ++-- kernel/softirq.c | 10 +- 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 5e4e617..7e5bb00 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -521,7 +521,8 @@ struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data } enum { TASKLET_STATE_SCHED,/* Tasklet is scheduled for execution */ - TASKLET_STATE_RUN /* Tasklet is running (SMP only) */ + TASKLET_STATE_RUN, /* Tasklet is running (SMP only) */ + TASKLET_STATE_HI/* Tasklet is HI_SOFTIRQ */ }; #ifdef CONFIG_SMP @@ -593,7 +594,14 @@ static inline void tasklet_disable(struct tasklet_struct *t) static inline void tasklet_enable(struct tasklet_struct *t) { smp_mb__before_atomic_dec(); - atomic_dec(t-count); + if (atomic_dec_and_test(t-count)) { + if (!test_bit(TASKLET_STATE_SCHED, t-state)) + return; + if (test_bit(TASKLET_STATE_HI, t-state)) + __tasklet_hi_schedule(t); + else + __tasklet_schedule(t); + } } static inline void tasklet_hi_enable(struct tasklet_struct *t) diff --git a/kernel/softirq.c b/kernel/softirq.c index cc96bdc..6d77aef 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -431,6 +431,7 @@ void __tasklet_hi_schedule(struct tasklet_struct *t) *__this_cpu_read(tasklet_hi_vec.tail) = t; __this_cpu_write(tasklet_hi_vec.tail, (t-next)); raise_softirq_irqoff(HI_SOFTIRQ); + set_bit(TASKLET_STATE_HI, t-state); local_irq_restore(flags); } @@ -442,6 +443,7 @@ void __tasklet_hi_schedule_first(struct tasklet_struct *t) t-next = __this_cpu_read(tasklet_hi_vec.head); __this_cpu_write(tasklet_hi_vec.head, t); + set_bit(TASKLET_STATE_HI, t-state); __raise_softirq_irqoff(HI_SOFTIRQ); } @@ -467,10 +469,9 @@ static void tasklet_action(struct softirq_action *a) if (!test_and_clear_bit(TASKLET_STATE_SCHED, t-state)) BUG(); t-func(t-data); - tasklet_unlock(t); - continue; - } + } tasklet_unlock(t); + continue; } local_irq_disable(); @@ -502,10 +503,9 @@ static void tasklet_hi_action(struct softirq_action *a) if (!test_and_clear_bit(TASKLET_STATE_SCHED, t-state)) BUG(); t-func(t-data); - tasklet_unlock(t); - continue; } tasklet_unlock(t); + continue; } local_irq_disable(); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] dma: ioat: use tasklet_kill in device remove/release process
On Wed, Oct 31, 2012 at 6:41 PM, Xiaotian Feng wrote: > Some driver uses tasklet_disable in device remove/release process, > tasklet_disable will inc tasklet->count and return. If the tasklet > is not handled yet under some softirq pressure, the tasklet will be > placed on the tasklet_vec, never have a chance to be excuted. This might > lead to a heavy loaded ksoftirqd, wakeup with pending_softirq, but > tasklet is disabled. tasklet_kill should be used in this case. > > Signed-off-by: Xiaotian Feng > Cc: Vinod Koul > Cc: Dan Williams Please ignore this one, I made a mistake here. > --- > drivers/dma/ioat/dma.c|2 +- > drivers/dma/ioat/dma_v2.c |2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c > index 73b2b65..8bec4a2 100644 > --- a/drivers/dma/ioat/dma.c > +++ b/drivers/dma/ioat/dma.c > @@ -379,7 +379,7 @@ static void ioat1_dma_free_chan_resources(struct dma_chan > *c) > if (ioat->desccount == 0) > return; > > - tasklet_disable(>cleanup_task); > + tasklet_kill(>cleanup_task); > del_timer_sync(>timer); > ioat1_cleanup(ioat); > > diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c > index b9d6678..df3ccb0 100644 > --- a/drivers/dma/ioat/dma_v2.c > +++ b/drivers/dma/ioat/dma_v2.c > @@ -794,7 +794,7 @@ void ioat2_free_chan_resources(struct dma_chan *c) > if (!ioat->ring) > return; > > - tasklet_disable(>cleanup_task); > + tasklet_kill(>cleanup_task); > del_timer_sync(>timer); > device->cleanup_fn((unsigned long) c); > device->reset_hw(chan); > -- > 1.7.9.5 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] staging: ozwpan: use tasklet_kill in device remove/release process
Some driver uses tasklet_disable in device remove/release process, tasklet_disable will inc tasklet->count and return. If the tasklet is not handled yet under some softirq pressure, the tasklet will be placed on the tasklet_vec, never have a chance to be excuted. This might lead to a heavy loaded ksoftirqd, wakeup with pending_softirq, but tasklet is disabled. tasklet_kill should be used in this case. Signed-off-by: Xiaotian Feng Cc: Rupesh Gujare Cc: Chris Kelly Cc: Greg Kroah-Hartman Cc: de...@driverdev.osuosl.org --- drivers/staging/ozwpan/ozhcd.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/staging/ozwpan/ozhcd.c b/drivers/staging/ozwpan/ozhcd.c index 2e087ac..33c0009 100644 --- a/drivers/staging/ozwpan/ozhcd.c +++ b/drivers/staging/ozwpan/ozhcd.c @@ -2304,8 +2304,8 @@ error: */ void oz_hcd_term(void) { - tasklet_disable(_urb_process_tasklet); - tasklet_disable(_urb_cancel_tasklet); + tasklet_kill(_urb_process_tasklet); + tasklet_kill(_urb_cancel_tasklet); platform_device_unregister(g_plat_dev); platform_driver_unregister(_oz_plat_drv); oz_trace("Pending urbs:%d\n", atomic_read(_pending_urbs)); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] rapidio/tsi721: use tasklet_kill in device remove/release process
Some driver uses tasklet_disable in device remove/release process, tasklet_disable will inc tasklet->count and return. If the tasklet is not handled yet under some softirq pressure, the tasklet will be placed on the tasklet_vec, never have a chance to be excuted. This might lead to a heavy loaded ksoftirqd, wakeup with pending_softirq, but tasklet is disabled. tasklet_kill should be used in this case. Signed-off-by: Xiaotian Feng Cc: Matt Porter Cc: Alexandre Bounine Cc: Andrew Morton --- drivers/rapidio/devices/tsi721_dma.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/rapidio/devices/tsi721_dma.c b/drivers/rapidio/devices/tsi721_dma.c index 92e06a5..a098fbc 100644 --- a/drivers/rapidio/devices/tsi721_dma.c +++ b/drivers/rapidio/devices/tsi721_dma.c @@ -589,7 +589,7 @@ static void tsi721_free_chan_resources(struct dma_chan *dchan) BUG_ON(!list_empty(_chan->active_list)); BUG_ON(!list_empty(_chan->queue)); - tasklet_disable(_chan->tasklet); + tasklet_kill(_chan->tasklet); spin_lock_bh(_chan->lock); list_splice_init(_chan->free_list, ); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] mmc: s3cmci: use tasklet_kill in device remove/release process
Some driver uses tasklet_disable in device remove/release process, tasklet_disable will inc tasklet->count and return. If the tasklet is not handled yet under some softirq pressure, the tasklet will be placed on the tasklet_vec, never have a chance to be excuted. This might lead to a heavy loaded ksoftirqd, wakeup with pending_softirq, but tasklet is disabled. tasklet_kill should be used in this case. Signed-off-by: Xiaotian Feng Cc: Ben Dooks Cc: Chris Ball Cc: linux-arm-ker...@lists.infradead.org Cc: linux-...@vger.kernel.org --- drivers/mmc/host/s3cmci.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mmc/host/s3cmci.c b/drivers/mmc/host/s3cmci.c index 4638dda..d70d0cc 100644 --- a/drivers/mmc/host/s3cmci.c +++ b/drivers/mmc/host/s3cmci.c @@ -1830,7 +1830,7 @@ static int __devexit s3cmci_remove(struct platform_device *pdev) clk_put(host->clk); - tasklet_disable(>pio_tasklet); + tasklet_kill(>pio_tasklet); if (s3cmci_host_usedma(host)) s3c2410_dma_free(host->dma, _dma_client); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] isdn/gigaset: use tasklet_kill in device remove/release process
Some driver uses tasklet_disable in device remove/release process, tasklet_disable will inc tasklet->count and return. If the tasklet is not handled yet under some softirq pressure, the tasklet will be placed on the tasklet_vec, never have a chance to be excuted. This might lead to a heavy loaded ksoftirqd, wakeup with pending_softirq, but tasklet is disabled. tasklet_kill should be used in this case. Signed-off-by: Xiaotian Feng Cc: Hansjoerg Lipp Cc: Tilman Schmidt Cc: Karsten Keil Cc: gigaset307x-com...@lists.sourceforge.net --- drivers/isdn/gigaset/interface.c |1 - 1 file changed, 1 deletion(-) diff --git a/drivers/isdn/gigaset/interface.c b/drivers/isdn/gigaset/interface.c index 67abf3f..284c7f3 100644 --- a/drivers/isdn/gigaset/interface.c +++ b/drivers/isdn/gigaset/interface.c @@ -544,7 +544,6 @@ void gigaset_if_free(struct cardstate *cs) if (!drv->have_tty) return; - tasklet_disable(>if_wake_tasklet); tasklet_kill(>if_wake_tasklet); cs->tty_dev = NULL; tty_unregister_device(drv->tty, cs->minor_index); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] dma: ioat: use tasklet_kill in device remove/release process
Some driver uses tasklet_disable in device remove/release process, tasklet_disable will inc tasklet->count and return. If the tasklet is not handled yet under some softirq pressure, the tasklet will be placed on the tasklet_vec, never have a chance to be excuted. This might lead to a heavy loaded ksoftirqd, wakeup with pending_softirq, but tasklet is disabled. tasklet_kill should be used in this case. Signed-off-by: Xiaotian Feng Cc: Vinod Koul Cc: Dan Williams --- drivers/dma/ioat/dma.c|2 +- drivers/dma/ioat/dma_v2.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c index 73b2b65..8bec4a2 100644 --- a/drivers/dma/ioat/dma.c +++ b/drivers/dma/ioat/dma.c @@ -379,7 +379,7 @@ static void ioat1_dma_free_chan_resources(struct dma_chan *c) if (ioat->desccount == 0) return; - tasklet_disable(>cleanup_task); + tasklet_kill(>cleanup_task); del_timer_sync(>timer); ioat1_cleanup(ioat); diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c index b9d6678..df3ccb0 100644 --- a/drivers/dma/ioat/dma_v2.c +++ b/drivers/dma/ioat/dma_v2.c @@ -794,7 +794,7 @@ void ioat2_free_chan_resources(struct dma_chan *c) if (!ioat->ring) return; - tasklet_disable(>cleanup_task); + tasklet_kill(>cleanup_task); del_timer_sync(>timer); device->cleanup_fn((unsigned long) c); device->reset_hw(chan); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] atm: he: use tasklet_kill in device remove/release process
Some driver uses tasklet_disable in device remove/release process, tasklet_disable will inc tasklet->count and return. If the tasklet is not handled yet under some softirq pressure, the tasklet will be placed on the tasklet_vec, never have a chance to be excuted. This might lead to a heavy loaded ksoftirqd, wakeup with pending_softirq, but tasklet is disabled. tasklet_kill should be used in this case. Signed-off-by: Xiaotian Feng Cc: Chas Williams Cc: linux-atm-gene...@lists.sourceforge.net --- drivers/atm/he.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/atm/he.c b/drivers/atm/he.c index b182c2f..1dfcc9a 100644 --- a/drivers/atm/he.c +++ b/drivers/atm/he.c @@ -1556,7 +1556,7 @@ he_stop(struct he_dev *he_dev) gen_cntl_0 &= ~(INT_PROC_ENBL | INIT_ENB); pci_write_config_dword(pci_dev, GEN_CNTL_0, gen_cntl_0); - tasklet_disable(_dev->tasklet); + tasklet_kill(_dev->tasklet); /* disable recv and transmit */ -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] drivers/net: use tasklet_kill in device remove/close process
Some driver uses tasklet_disable in device remove/close process, tasklet_disable will inc tasklet->count and return. If the tasklet is not handled yet because some softirq pressure, the tasklet will placed on the tasklet_vec, never have a chance to excute. This might lead to ksoftirqd heavy loaded, wakeup with pending_softirq, but tasklet is disabled. tasklet_kill should be used in this case. Signed-off-by: Xiaotian Feng Cc: "David S. Miller" Cc: net...@vger.kernel.org --- drivers/net/ethernet/jme.c|8 drivers/net/ethernet/marvell/skge.c |2 +- drivers/net/ethernet/micrel/ksz884x.c |4 ++-- drivers/net/ethernet/xilinx/xilinx_axienet_main.c |2 +- drivers/net/wireless/b43legacy/pio.c |2 +- 5 files changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/jme.c b/drivers/net/ethernet/jme.c index 8ddf9ca..76a91f6 100644 --- a/drivers/net/ethernet/jme.c +++ b/drivers/net/ethernet/jme.c @@ -1948,10 +1948,10 @@ jme_close(struct net_device *netdev) JME_NAPI_DISABLE(jme); - tasklet_disable(>linkch_task); - tasklet_disable(>txclean_task); - tasklet_disable(>rxclean_task); - tasklet_disable(>rxempty_task); + tasklet_kill(>linkch_task); + tasklet_kill(>txclean_task); + tasklet_kill(>rxclean_task); + tasklet_kill(>rxempty_task); jme_disable_rx_engine(jme); jme_disable_tx_engine(jme); diff --git a/drivers/net/ethernet/marvell/skge.c b/drivers/net/ethernet/marvell/skge.c index 9b9c2ac..d19a143 100644 --- a/drivers/net/ethernet/marvell/skge.c +++ b/drivers/net/ethernet/marvell/skge.c @@ -4026,7 +4026,7 @@ static void __devexit skge_remove(struct pci_dev *pdev) dev0 = hw->dev[0]; unregister_netdev(dev0); - tasklet_disable(>phy_task); + tasklet_kill(>phy_task); spin_lock_irq(>hw_lock); hw->intr_mask = 0; diff --git a/drivers/net/ethernet/micrel/ksz884x.c b/drivers/net/ethernet/micrel/ksz884x.c index 318fee9..e558edd 100644 --- a/drivers/net/ethernet/micrel/ksz884x.c +++ b/drivers/net/ethernet/micrel/ksz884x.c @@ -5407,8 +5407,8 @@ static int netdev_close(struct net_device *dev) /* Delay for receive task to stop scheduling itself. */ msleep(2000 / HZ); - tasklet_disable(_priv->rx_tasklet); - tasklet_disable(_priv->tx_tasklet); + tasklet_kill(_priv->rx_tasklet); + tasklet_kill(_priv->tx_tasklet); free_irq(dev->irq, hw_priv->dev); transmit_cleanup(hw_priv, 0); diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index 0793299..1d04754 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -990,7 +990,7 @@ static int axienet_stop(struct net_device *ndev) axienet_setoptions(ndev, lp->options & ~(XAE_OPTION_TXEN | XAE_OPTION_RXEN)); - tasklet_disable(>dma_err_tasklet); + tasklet_kill(>dma_err_tasklet); free_irq(lp->tx_irq, ndev); free_irq(lp->rx_irq, ndev); diff --git a/drivers/net/wireless/b43legacy/pio.c b/drivers/net/wireless/b43legacy/pio.c index 192251a..282eede 100644 --- a/drivers/net/wireless/b43legacy/pio.c +++ b/drivers/net/wireless/b43legacy/pio.c @@ -382,7 +382,7 @@ static void cancel_transfers(struct b43legacy_pioqueue *queue) { struct b43legacy_pio_txpacket *packet, *tmp_packet; - tasklet_disable(>txtask); + tasklet_kill(>txtask); list_for_each_entry_safe(packet, tmp_packet, >txrunning, list) free_txpacket(packet, 0); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/3] tipc: do not use tasklet_disable before tasklet_kill
If tasklet_disable() is called before related tasklet handled, tasklet_kill will never be finished. tasklet_kill is enough. Signed-off-by: Xiaotian Feng Cc: Jon Maloy Cc: Allan Stephens Cc: "David S. Miller" Cc: net...@vger.kernel.org Cc: tipc-discuss...@lists.sourceforge.net --- net/tipc/handler.c |1 - 1 file changed, 1 deletion(-) diff --git a/net/tipc/handler.c b/net/tipc/handler.c index 111ff83..b36f0fc 100644 --- a/net/tipc/handler.c +++ b/net/tipc/handler.c @@ -116,7 +116,6 @@ void tipc_handler_stop(void) return; handler_enabled = 0; - tasklet_disable(_tasklet); tasklet_kill(_tasklet); spin_lock_bh(_lock); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/3] usb: gadget: fsl_qe_udc: do not use tasklet_disable before tasklet_kill
If tasklet_disable() is called before related tasklet handled, tasklet_kill will never be finished. tasklet_kill is enough. Signed-off-by: Xiaotian Feng Cc: Li Yang Cc: Felipe Balbi Cc: Greg Kroah-Hartman Cc: linux-...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org --- drivers/usb/gadget/fsl_qe_udc.c |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/usb/gadget/fsl_qe_udc.c b/drivers/usb/gadget/fsl_qe_udc.c index b09452d..4ad3b82 100644 --- a/drivers/usb/gadget/fsl_qe_udc.c +++ b/drivers/usb/gadget/fsl_qe_udc.c @@ -2661,7 +2661,7 @@ static int __devexit qe_udc_remove(struct platform_device *ofdev) usb_del_gadget_udc(>gadget); udc->done = - tasklet_disable(>rx_tasklet); + tasklet_kill(>rx_tasklet); if (udc->nullmap) { dma_unmap_single(udc->gadget.dev.parent, @@ -2698,8 +2698,6 @@ static int __devexit qe_udc_remove(struct platform_device *ofdev) free_irq(udc->usb_irq, udc); irq_dispose_mapping(udc->usb_irq); - tasklet_kill(>rx_tasklet); - iounmap(udc->usb_regs); device_unregister(>gadget.dev); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/3] input: do not use tasklet_disable before tasklet_kill
If tasklet_disable() is called before related tasklet handled, tasklet_kill will never be finished. tasklet_kill is enough. Signed-off-by: Xiaotian Feng Cc: Dmitry Torokhov Cc: Tony Lindgren Cc: Sourav Poddar Cc: Josh Cc: Greg Kroah-Hartman Cc: linux-in...@vger.kernel.org --- drivers/input/keyboard/omap-keypad.c |3 +-- drivers/input/serio/hil_mlc.c|1 - 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/input/keyboard/omap-keypad.c b/drivers/input/keyboard/omap-keypad.c index 4a5fcc8..6c52447 100644 --- a/drivers/input/keyboard/omap-keypad.c +++ b/drivers/input/keyboard/omap-keypad.c @@ -362,12 +362,11 @@ static int __devexit omap_kp_remove(struct platform_device *pdev) struct omap_kp *omap_kp = platform_get_drvdata(pdev); /* disable keypad interrupt handling */ - tasklet_disable(_tasklet); + tasklet_kill(_tasklet); omap_writew(1, OMAP1_MPUIO_BASE + OMAP_MPUIO_KBD_MASKIT); free_irq(omap_kp->irq, omap_kp); del_timer_sync(_kp->timer); - tasklet_kill(_tasklet); /* unregister everything */ input_unregister_device(omap_kp->input); diff --git a/drivers/input/serio/hil_mlc.c b/drivers/input/serio/hil_mlc.c index bfd3865..7fc1700 100644 --- a/drivers/input/serio/hil_mlc.c +++ b/drivers/input/serio/hil_mlc.c @@ -1011,7 +1011,6 @@ static void __exit hil_mlc_exit(void) { del_timer_sync(_mlcs_kicker); - tasklet_disable(_mlcs_tasklet); tasklet_kill(_mlcs_tasklet); } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] tasklet: remove tasklet_kill_immediate()
tasklet_kill_immediate() is no longer used, just remove it. Signed-off-by: Xiaotian Feng Cc: Thomas Gleixner Cc: Frederic Weisbecker Cc: "Paul E. McKenney" Cc: Andrew Morton Cc: Josh Triplett --- kernel/softirq.c | 32 1 file changed, 32 deletions(-) diff --git a/kernel/softirq.c b/kernel/softirq.c index cc96bdc..61abd86 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -762,38 +762,6 @@ static void run_ksoftirqd(unsigned int cpu) } #ifdef CONFIG_HOTPLUG_CPU -/* - * tasklet_kill_immediate is called to remove a tasklet which can already be - * scheduled for execution on @cpu. - * - * Unlike tasklet_kill, this function removes the tasklet - * _immediately_, even if the tasklet is in TASKLET_STATE_SCHED state. - * - * When this function is called, @cpu must be in the CPU_DEAD state. - */ -void tasklet_kill_immediate(struct tasklet_struct *t, unsigned int cpu) -{ - struct tasklet_struct **i; - - BUG_ON(cpu_online(cpu)); - BUG_ON(test_bit(TASKLET_STATE_RUN, >state)); - - if (!test_bit(TASKLET_STATE_SCHED, >state)) - return; - - /* CPU is dead, so no lock needed. */ - for (i = _cpu(tasklet_vec, cpu).head; *i; i = &(*i)->next) { - if (*i == t) { - *i = t->next; - /* If this was the tail element, move the tail ptr */ - if (*i == NULL) - per_cpu(tasklet_vec, cpu).tail = i; - return; - } - } - BUG(); -} - static void takeover_tasklets(unsigned int cpu) { /* CPU is dead, so no lock needed. */ -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] tasklet: remove tasklet_kill_immediate()
tasklet_kill_immediate() is no longer used, just remove it. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Thomas Gleixner t...@linutronix.de Cc: Frederic Weisbecker fweis...@gmail.com Cc: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Andrew Morton a...@linux-foundation.org Cc: Josh Triplett j...@joshtriplett.org --- kernel/softirq.c | 32 1 file changed, 32 deletions(-) diff --git a/kernel/softirq.c b/kernel/softirq.c index cc96bdc..61abd86 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -762,38 +762,6 @@ static void run_ksoftirqd(unsigned int cpu) } #ifdef CONFIG_HOTPLUG_CPU -/* - * tasklet_kill_immediate is called to remove a tasklet which can already be - * scheduled for execution on @cpu. - * - * Unlike tasklet_kill, this function removes the tasklet - * _immediately_, even if the tasklet is in TASKLET_STATE_SCHED state. - * - * When this function is called, @cpu must be in the CPU_DEAD state. - */ -void tasklet_kill_immediate(struct tasklet_struct *t, unsigned int cpu) -{ - struct tasklet_struct **i; - - BUG_ON(cpu_online(cpu)); - BUG_ON(test_bit(TASKLET_STATE_RUN, t-state)); - - if (!test_bit(TASKLET_STATE_SCHED, t-state)) - return; - - /* CPU is dead, so no lock needed. */ - for (i = per_cpu(tasklet_vec, cpu).head; *i; i = (*i)-next) { - if (*i == t) { - *i = t-next; - /* If this was the tail element, move the tail ptr */ - if (*i == NULL) - per_cpu(tasklet_vec, cpu).tail = i; - return; - } - } - BUG(); -} - static void takeover_tasklets(unsigned int cpu) { /* CPU is dead, so no lock needed. */ -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/3] input: do not use tasklet_disable before tasklet_kill
If tasklet_disable() is called before related tasklet handled, tasklet_kill will never be finished. tasklet_kill is enough. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Dmitry Torokhov dmitry.torok...@gmail.com Cc: Tony Lindgren t...@atomide.com Cc: Sourav Poddar sourav.pod...@ti.com Cc: Josh joshua.tayl...@gmail.com Cc: Greg Kroah-Hartman gre...@linuxfoundation.org Cc: linux-in...@vger.kernel.org --- drivers/input/keyboard/omap-keypad.c |3 +-- drivers/input/serio/hil_mlc.c|1 - 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/input/keyboard/omap-keypad.c b/drivers/input/keyboard/omap-keypad.c index 4a5fcc8..6c52447 100644 --- a/drivers/input/keyboard/omap-keypad.c +++ b/drivers/input/keyboard/omap-keypad.c @@ -362,12 +362,11 @@ static int __devexit omap_kp_remove(struct platform_device *pdev) struct omap_kp *omap_kp = platform_get_drvdata(pdev); /* disable keypad interrupt handling */ - tasklet_disable(kp_tasklet); + tasklet_kill(kp_tasklet); omap_writew(1, OMAP1_MPUIO_BASE + OMAP_MPUIO_KBD_MASKIT); free_irq(omap_kp-irq, omap_kp); del_timer_sync(omap_kp-timer); - tasklet_kill(kp_tasklet); /* unregister everything */ input_unregister_device(omap_kp-input); diff --git a/drivers/input/serio/hil_mlc.c b/drivers/input/serio/hil_mlc.c index bfd3865..7fc1700 100644 --- a/drivers/input/serio/hil_mlc.c +++ b/drivers/input/serio/hil_mlc.c @@ -1011,7 +1011,6 @@ static void __exit hil_mlc_exit(void) { del_timer_sync(hil_mlcs_kicker); - tasklet_disable(hil_mlcs_tasklet); tasklet_kill(hil_mlcs_tasklet); } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/3] usb: gadget: fsl_qe_udc: do not use tasklet_disable before tasklet_kill
If tasklet_disable() is called before related tasklet handled, tasklet_kill will never be finished. tasklet_kill is enough. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Li Yang le...@freescale.com Cc: Felipe Balbi ba...@ti.com Cc: Greg Kroah-Hartman gre...@linuxfoundation.org Cc: linux-...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org --- drivers/usb/gadget/fsl_qe_udc.c |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/usb/gadget/fsl_qe_udc.c b/drivers/usb/gadget/fsl_qe_udc.c index b09452d..4ad3b82 100644 --- a/drivers/usb/gadget/fsl_qe_udc.c +++ b/drivers/usb/gadget/fsl_qe_udc.c @@ -2661,7 +2661,7 @@ static int __devexit qe_udc_remove(struct platform_device *ofdev) usb_del_gadget_udc(udc-gadget); udc-done = done; - tasklet_disable(udc-rx_tasklet); + tasklet_kill(udc-rx_tasklet); if (udc-nullmap) { dma_unmap_single(udc-gadget.dev.parent, @@ -2698,8 +2698,6 @@ static int __devexit qe_udc_remove(struct platform_device *ofdev) free_irq(udc-usb_irq, udc); irq_dispose_mapping(udc-usb_irq); - tasklet_kill(udc-rx_tasklet); - iounmap(udc-usb_regs); device_unregister(udc-gadget.dev); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/3] tipc: do not use tasklet_disable before tasklet_kill
If tasklet_disable() is called before related tasklet handled, tasklet_kill will never be finished. tasklet_kill is enough. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Jon Maloy jon.ma...@ericsson.com Cc: Allan Stephens allan.steph...@windriver.com Cc: David S. Miller da...@davemloft.net Cc: net...@vger.kernel.org Cc: tipc-discuss...@lists.sourceforge.net --- net/tipc/handler.c |1 - 1 file changed, 1 deletion(-) diff --git a/net/tipc/handler.c b/net/tipc/handler.c index 111ff83..b36f0fc 100644 --- a/net/tipc/handler.c +++ b/net/tipc/handler.c @@ -116,7 +116,6 @@ void tipc_handler_stop(void) return; handler_enabled = 0; - tasklet_disable(tipc_tasklet); tasklet_kill(tipc_tasklet); spin_lock_bh(qitem_lock); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] drivers/net: use tasklet_kill in device remove/close process
Some driver uses tasklet_disable in device remove/close process, tasklet_disable will inc tasklet-count and return. If the tasklet is not handled yet because some softirq pressure, the tasklet will placed on the tasklet_vec, never have a chance to excute. This might lead to ksoftirqd heavy loaded, wakeup with pending_softirq, but tasklet is disabled. tasklet_kill should be used in this case. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: David S. Miller da...@davemloft.net Cc: net...@vger.kernel.org --- drivers/net/ethernet/jme.c|8 drivers/net/ethernet/marvell/skge.c |2 +- drivers/net/ethernet/micrel/ksz884x.c |4 ++-- drivers/net/ethernet/xilinx/xilinx_axienet_main.c |2 +- drivers/net/wireless/b43legacy/pio.c |2 +- 5 files changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/jme.c b/drivers/net/ethernet/jme.c index 8ddf9ca..76a91f6 100644 --- a/drivers/net/ethernet/jme.c +++ b/drivers/net/ethernet/jme.c @@ -1948,10 +1948,10 @@ jme_close(struct net_device *netdev) JME_NAPI_DISABLE(jme); - tasklet_disable(jme-linkch_task); - tasklet_disable(jme-txclean_task); - tasklet_disable(jme-rxclean_task); - tasklet_disable(jme-rxempty_task); + tasklet_kill(jme-linkch_task); + tasklet_kill(jme-txclean_task); + tasklet_kill(jme-rxclean_task); + tasklet_kill(jme-rxempty_task); jme_disable_rx_engine(jme); jme_disable_tx_engine(jme); diff --git a/drivers/net/ethernet/marvell/skge.c b/drivers/net/ethernet/marvell/skge.c index 9b9c2ac..d19a143 100644 --- a/drivers/net/ethernet/marvell/skge.c +++ b/drivers/net/ethernet/marvell/skge.c @@ -4026,7 +4026,7 @@ static void __devexit skge_remove(struct pci_dev *pdev) dev0 = hw-dev[0]; unregister_netdev(dev0); - tasklet_disable(hw-phy_task); + tasklet_kill(hw-phy_task); spin_lock_irq(hw-hw_lock); hw-intr_mask = 0; diff --git a/drivers/net/ethernet/micrel/ksz884x.c b/drivers/net/ethernet/micrel/ksz884x.c index 318fee9..e558edd 100644 --- a/drivers/net/ethernet/micrel/ksz884x.c +++ b/drivers/net/ethernet/micrel/ksz884x.c @@ -5407,8 +5407,8 @@ static int netdev_close(struct net_device *dev) /* Delay for receive task to stop scheduling itself. */ msleep(2000 / HZ); - tasklet_disable(hw_priv-rx_tasklet); - tasklet_disable(hw_priv-tx_tasklet); + tasklet_kill(hw_priv-rx_tasklet); + tasklet_kill(hw_priv-tx_tasklet); free_irq(dev-irq, hw_priv-dev); transmit_cleanup(hw_priv, 0); diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index 0793299..1d04754 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -990,7 +990,7 @@ static int axienet_stop(struct net_device *ndev) axienet_setoptions(ndev, lp-options ~(XAE_OPTION_TXEN | XAE_OPTION_RXEN)); - tasklet_disable(lp-dma_err_tasklet); + tasklet_kill(lp-dma_err_tasklet); free_irq(lp-tx_irq, ndev); free_irq(lp-rx_irq, ndev); diff --git a/drivers/net/wireless/b43legacy/pio.c b/drivers/net/wireless/b43legacy/pio.c index 192251a..282eede 100644 --- a/drivers/net/wireless/b43legacy/pio.c +++ b/drivers/net/wireless/b43legacy/pio.c @@ -382,7 +382,7 @@ static void cancel_transfers(struct b43legacy_pioqueue *queue) { struct b43legacy_pio_txpacket *packet, *tmp_packet; - tasklet_disable(queue-txtask); + tasklet_kill(queue-txtask); list_for_each_entry_safe(packet, tmp_packet, queue-txrunning, list) free_txpacket(packet, 0); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] atm: he: use tasklet_kill in device remove/release process
Some driver uses tasklet_disable in device remove/release process, tasklet_disable will inc tasklet-count and return. If the tasklet is not handled yet under some softirq pressure, the tasklet will be placed on the tasklet_vec, never have a chance to be excuted. This might lead to a heavy loaded ksoftirqd, wakeup with pending_softirq, but tasklet is disabled. tasklet_kill should be used in this case. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Chas Williams c...@cmf.nrl.navy.mil Cc: linux-atm-gene...@lists.sourceforge.net --- drivers/atm/he.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/atm/he.c b/drivers/atm/he.c index b182c2f..1dfcc9a 100644 --- a/drivers/atm/he.c +++ b/drivers/atm/he.c @@ -1556,7 +1556,7 @@ he_stop(struct he_dev *he_dev) gen_cntl_0 = ~(INT_PROC_ENBL | INIT_ENB); pci_write_config_dword(pci_dev, GEN_CNTL_0, gen_cntl_0); - tasklet_disable(he_dev-tasklet); + tasklet_kill(he_dev-tasklet); /* disable recv and transmit */ -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] dma: ioat: use tasklet_kill in device remove/release process
Some driver uses tasklet_disable in device remove/release process, tasklet_disable will inc tasklet-count and return. If the tasklet is not handled yet under some softirq pressure, the tasklet will be placed on the tasklet_vec, never have a chance to be excuted. This might lead to a heavy loaded ksoftirqd, wakeup with pending_softirq, but tasklet is disabled. tasklet_kill should be used in this case. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Vinod Koul vinod.k...@intel.com Cc: Dan Williams d...@fb.com --- drivers/dma/ioat/dma.c|2 +- drivers/dma/ioat/dma_v2.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c index 73b2b65..8bec4a2 100644 --- a/drivers/dma/ioat/dma.c +++ b/drivers/dma/ioat/dma.c @@ -379,7 +379,7 @@ static void ioat1_dma_free_chan_resources(struct dma_chan *c) if (ioat-desccount == 0) return; - tasklet_disable(chan-cleanup_task); + tasklet_kill(chan-cleanup_task); del_timer_sync(chan-timer); ioat1_cleanup(ioat); diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c index b9d6678..df3ccb0 100644 --- a/drivers/dma/ioat/dma_v2.c +++ b/drivers/dma/ioat/dma_v2.c @@ -794,7 +794,7 @@ void ioat2_free_chan_resources(struct dma_chan *c) if (!ioat-ring) return; - tasklet_disable(chan-cleanup_task); + tasklet_kill(chan-cleanup_task); del_timer_sync(chan-timer); device-cleanup_fn((unsigned long) c); device-reset_hw(chan); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] isdn/gigaset: use tasklet_kill in device remove/release process
Some driver uses tasklet_disable in device remove/release process, tasklet_disable will inc tasklet-count and return. If the tasklet is not handled yet under some softirq pressure, the tasklet will be placed on the tasklet_vec, never have a chance to be excuted. This might lead to a heavy loaded ksoftirqd, wakeup with pending_softirq, but tasklet is disabled. tasklet_kill should be used in this case. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Hansjoerg Lipp hjl...@web.de Cc: Tilman Schmidt til...@imap.cc Cc: Karsten Keil i...@linux-pingi.de Cc: gigaset307x-com...@lists.sourceforge.net --- drivers/isdn/gigaset/interface.c |1 - 1 file changed, 1 deletion(-) diff --git a/drivers/isdn/gigaset/interface.c b/drivers/isdn/gigaset/interface.c index 67abf3f..284c7f3 100644 --- a/drivers/isdn/gigaset/interface.c +++ b/drivers/isdn/gigaset/interface.c @@ -544,7 +544,6 @@ void gigaset_if_free(struct cardstate *cs) if (!drv-have_tty) return; - tasklet_disable(cs-if_wake_tasklet); tasklet_kill(cs-if_wake_tasklet); cs-tty_dev = NULL; tty_unregister_device(drv-tty, cs-minor_index); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] mmc: s3cmci: use tasklet_kill in device remove/release process
Some driver uses tasklet_disable in device remove/release process, tasklet_disable will inc tasklet-count and return. If the tasklet is not handled yet under some softirq pressure, the tasklet will be placed on the tasklet_vec, never have a chance to be excuted. This might lead to a heavy loaded ksoftirqd, wakeup with pending_softirq, but tasklet is disabled. tasklet_kill should be used in this case. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Ben Dooks ben-li...@fluff.org Cc: Chris Ball c...@laptop.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-...@vger.kernel.org --- drivers/mmc/host/s3cmci.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mmc/host/s3cmci.c b/drivers/mmc/host/s3cmci.c index 4638dda..d70d0cc 100644 --- a/drivers/mmc/host/s3cmci.c +++ b/drivers/mmc/host/s3cmci.c @@ -1830,7 +1830,7 @@ static int __devexit s3cmci_remove(struct platform_device *pdev) clk_put(host-clk); - tasklet_disable(host-pio_tasklet); + tasklet_kill(host-pio_tasklet); if (s3cmci_host_usedma(host)) s3c2410_dma_free(host-dma, s3cmci_dma_client); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] rapidio/tsi721: use tasklet_kill in device remove/release process
Some driver uses tasklet_disable in device remove/release process, tasklet_disable will inc tasklet-count and return. If the tasklet is not handled yet under some softirq pressure, the tasklet will be placed on the tasklet_vec, never have a chance to be excuted. This might lead to a heavy loaded ksoftirqd, wakeup with pending_softirq, but tasklet is disabled. tasklet_kill should be used in this case. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Matt Porter mpor...@kernel.crashing.org Cc: Alexandre Bounine alexandre.boun...@idt.com Cc: Andrew Morton a...@linux-foundation.org --- drivers/rapidio/devices/tsi721_dma.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/rapidio/devices/tsi721_dma.c b/drivers/rapidio/devices/tsi721_dma.c index 92e06a5..a098fbc 100644 --- a/drivers/rapidio/devices/tsi721_dma.c +++ b/drivers/rapidio/devices/tsi721_dma.c @@ -589,7 +589,7 @@ static void tsi721_free_chan_resources(struct dma_chan *dchan) BUG_ON(!list_empty(bdma_chan-active_list)); BUG_ON(!list_empty(bdma_chan-queue)); - tasklet_disable(bdma_chan-tasklet); + tasklet_kill(bdma_chan-tasklet); spin_lock_bh(bdma_chan-lock); list_splice_init(bdma_chan-free_list, list); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] staging: ozwpan: use tasklet_kill in device remove/release process
Some driver uses tasklet_disable in device remove/release process, tasklet_disable will inc tasklet-count and return. If the tasklet is not handled yet under some softirq pressure, the tasklet will be placed on the tasklet_vec, never have a chance to be excuted. This might lead to a heavy loaded ksoftirqd, wakeup with pending_softirq, but tasklet is disabled. tasklet_kill should be used in this case. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Rupesh Gujare rguj...@ozmodevices.com Cc: Chris Kelly cke...@ozmodevices.com Cc: Greg Kroah-Hartman gre...@linuxfoundation.org Cc: de...@driverdev.osuosl.org --- drivers/staging/ozwpan/ozhcd.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/staging/ozwpan/ozhcd.c b/drivers/staging/ozwpan/ozhcd.c index 2e087ac..33c0009 100644 --- a/drivers/staging/ozwpan/ozhcd.c +++ b/drivers/staging/ozwpan/ozhcd.c @@ -2304,8 +2304,8 @@ error: */ void oz_hcd_term(void) { - tasklet_disable(g_urb_process_tasklet); - tasklet_disable(g_urb_cancel_tasklet); + tasklet_kill(g_urb_process_tasklet); + tasklet_kill(g_urb_cancel_tasklet); platform_device_unregister(g_plat_dev); platform_driver_unregister(g_oz_plat_drv); oz_trace(Pending urbs:%d\n, atomic_read(g_pending_urbs)); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] dma: ioat: use tasklet_kill in device remove/release process
On Wed, Oct 31, 2012 at 6:41 PM, Xiaotian Feng xtf...@gmail.com wrote: Some driver uses tasklet_disable in device remove/release process, tasklet_disable will inc tasklet-count and return. If the tasklet is not handled yet under some softirq pressure, the tasklet will be placed on the tasklet_vec, never have a chance to be excuted. This might lead to a heavy loaded ksoftirqd, wakeup with pending_softirq, but tasklet is disabled. tasklet_kill should be used in this case. Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Vinod Koul vinod.k...@intel.com Cc: Dan Williams d...@fb.com Please ignore this one, I made a mistake here. --- drivers/dma/ioat/dma.c|2 +- drivers/dma/ioat/dma_v2.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c index 73b2b65..8bec4a2 100644 --- a/drivers/dma/ioat/dma.c +++ b/drivers/dma/ioat/dma.c @@ -379,7 +379,7 @@ static void ioat1_dma_free_chan_resources(struct dma_chan *c) if (ioat-desccount == 0) return; - tasklet_disable(chan-cleanup_task); + tasklet_kill(chan-cleanup_task); del_timer_sync(chan-timer); ioat1_cleanup(ioat); diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c index b9d6678..df3ccb0 100644 --- a/drivers/dma/ioat/dma_v2.c +++ b/drivers/dma/ioat/dma_v2.c @@ -794,7 +794,7 @@ void ioat2_free_chan_resources(struct dma_chan *c) if (!ioat-ring) return; - tasklet_disable(chan-cleanup_task); + tasklet_kill(chan-cleanup_task); del_timer_sync(chan-timer); device-cleanup_fn((unsigned long) c); device-reset_hw(chan); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] V2 sched, autogroup: fix crash on reboot when autogroup is disabled
On Mon, Oct 29, 2012 at 3:19 AM, Mike Galbraith wrote: > On Sun, 2012-10-28 at 15:05 +0100, Ingo Molnar wrote: >> * Mike Galbraith wrote: >> >> > On Sun, 2012-10-28 at 14:19 +0100, Ingo Molnar wrote: >> > > * Mike Galbraith wrote: >> > > >> > > > On Sun, 2012-10-28 at 11:25 +0100, Ingo Molnar wrote: >> > > > > * Mike Galbraith wrote: >> > > > > >> > > > >> > > > > > No knobs, no glitz, nada, just a cute little thing folks can turn >> > > > > > on if they don't want to muck about with cgroups and/or systemd. >> > > > > >> > > > > Please also keep the Kconfig switch and reuse it to turn on >> > > > > the 'autogroups' knob. >> > > > > >> > > > > That way people with existing .config's don't have to change >> > > > > a thing to get this functionality. >> > > > >> > > > The Kconfig option is still there. The noautogroup -> >> > > > autogroup arg change just makes it off by default (since an >> > > > on/off switch would have to be a full move everybody thing >> > > > post 8323f26ce race fix), so distros can make it available in >> > > > their swiss army knife config, but it'll be out of the way >> > > > unless specifically asked for by the user at boot. >> > > > >> > > > I can make it default 'on' by removing that arg change if you >> > > > think that's the better way to go, but opt in at boot sounded >> > > > better to me given there is no runtime on/off switch at all >> > > > now. >> > > >> > > If I got your patch right then adding a command line option to >> > > turn it on will disable it in essence for pretty much everyone >> > > who has CONFIG_SCHED_AUTOGROUP=y in their .config today. >> > >> > With no user intervention, yes. >> >> 'No user intervention' is what happens with new kernel >> commandline options, in 99.% of the cases. >> >> > > The patch should not change the defaults for existing >> > > .config's. >> > > >> > > I.e. if autogroups was off, it should stay off, but if >> > > autogroups was enabled in the .config and the kernel booted >> > > with it enabled, then it should continue to do so in the >> > > future as well. >> > > >> > > Adding a boot tweak and removing the runtime knobs is OK - >> > > changing the current default functionality is not. >> > >> > Ok, I'll whack the arg change and respin. >> >> Thanks! >> >> I'd also suggest to still expose the state of autosched in >> /proc/sys, read-only, so that its status can be checked. > > Done. Autogroup remains default on with noautogroup opt out at boot > time as before. Sysctl remains intact, read only. Knobs are gone. > > sched, autogroup: fix crash on reboot when autogroup is disabled > > Between 8323f26ce and 800d4d30, autogroup is a wreck. With both > applied, all you have to do to crash a box is disable autogroup > during boot up, then reboot.. boom, NULL pointer dereference due > to 800d4d30 not allowing autogroup to move things, and 8323f26ce > making that the only way to switch runqueues. > > Remove all of the knobs, as what wasn't broken would be by making > autogroup exclusively either on or off from boot, with off meaning > autogroups are not created, so unavailable for proc interface. > I'm okay with the removal of runtime enable/disable autogroup. But can we simply remove these two knobs that is already exposed to user space since 2.6.38 ? So we can't cat /proc//autogroup or echo > /proc//autogroup anymore even if autogroup is on? > If the user fiddles with cgroups hereafter, tough, once tasks are > moved, autogroup won't mess with them again unless they call setsid(). > > No knobs, no glitz, nada, just a cute little thing folks can turn > on if they don't want to muck about with cgroups and/or systemd. > > Signed-off-by: Mike Galbraith > Cc: sta...@vger.kernel.org v3.6 > > --- > fs/proc/base.c| 78 > -- > kernel/sched/auto_group.c | 68 ++-- > kernel/sched/auto_group.h |9 - > kernel/sysctl.c |6 +-- > 4 files changed, 14 insertions(+), 147 deletions(-) > > --- a/fs/proc/base.c > +++ b/fs/proc/base.c > @@ -1165,81 +1165,6 @@ static const struct file_operations proc > > #endif > > -#ifdef CONFIG_SCHED_AUTOGROUP > -/* > - * Print out autogroup related information: > - */ > -static int sched_autogroup_show(struct seq_file *m, void *v) > -{ > - struct inode *inode = m->private; > - struct task_struct *p; > - > - p = get_proc_task(inode); > - if (!p) > - return -ESRCH; > - proc_sched_autogroup_show_task(p, m); > - > - put_task_struct(p); > - > - return 0; > -} > - > -static ssize_t > -sched_autogroup_write(struct file *file, const char __user *buf, > - size_t count, loff_t *offset) > -{ > - struct inode *inode = file->f_path.dentry->d_inode; > - struct task_struct *p; > - char buffer[PROC_NUMBUF]; > - int nice; > - int err; > - > - memset(buffer, 0, sizeof(buffer)); > - if (count >
Re: [PATCH] V2 sched, autogroup: fix crash on reboot when autogroup is disabled
On Mon, Oct 29, 2012 at 3:19 AM, Mike Galbraith efa...@gmx.de wrote: On Sun, 2012-10-28 at 15:05 +0100, Ingo Molnar wrote: * Mike Galbraith efa...@gmx.de wrote: On Sun, 2012-10-28 at 14:19 +0100, Ingo Molnar wrote: * Mike Galbraith efa...@gmx.de wrote: On Sun, 2012-10-28 at 11:25 +0100, Ingo Molnar wrote: * Mike Galbraith efa...@gmx.de wrote: No knobs, no glitz, nada, just a cute little thing folks can turn on if they don't want to muck about with cgroups and/or systemd. Please also keep the Kconfig switch and reuse it to turn on the 'autogroups' knob. That way people with existing .config's don't have to change a thing to get this functionality. The Kconfig option is still there. The noautogroup - autogroup arg change just makes it off by default (since an on/off switch would have to be a full move everybody thing post 8323f26ce race fix), so distros can make it available in their swiss army knife config, but it'll be out of the way unless specifically asked for by the user at boot. I can make it default 'on' by removing that arg change if you think that's the better way to go, but opt in at boot sounded better to me given there is no runtime on/off switch at all now. If I got your patch right then adding a command line option to turn it on will disable it in essence for pretty much everyone who has CONFIG_SCHED_AUTOGROUP=y in their .config today. With no user intervention, yes. 'No user intervention' is what happens with new kernel commandline options, in 99.% of the cases. The patch should not change the defaults for existing .config's. I.e. if autogroups was off, it should stay off, but if autogroups was enabled in the .config and the kernel booted with it enabled, then it should continue to do so in the future as well. Adding a boot tweak and removing the runtime knobs is OK - changing the current default functionality is not. Ok, I'll whack the arg change and respin. Thanks! I'd also suggest to still expose the state of autosched in /proc/sys, read-only, so that its status can be checked. Done. Autogroup remains default on with noautogroup opt out at boot time as before. Sysctl remains intact, read only. Knobs are gone. sched, autogroup: fix crash on reboot when autogroup is disabled Between 8323f26ce and 800d4d30, autogroup is a wreck. With both applied, all you have to do to crash a box is disable autogroup during boot up, then reboot.. boom, NULL pointer dereference due to 800d4d30 not allowing autogroup to move things, and 8323f26ce making that the only way to switch runqueues. Remove all of the knobs, as what wasn't broken would be by making autogroup exclusively either on or off from boot, with off meaning autogroups are not created, so unavailable for proc interface. I'm okay with the removal of runtime enable/disable autogroup. But can we simply remove these two knobs that is already exposed to user space since 2.6.38 ? So we can't cat /proc/pid/autogroup or echo nice level /proc/pid/autogroup anymore even if autogroup is on? If the user fiddles with cgroups hereafter, tough, once tasks are moved, autogroup won't mess with them again unless they call setsid(). No knobs, no glitz, nada, just a cute little thing folks can turn on if they don't want to muck about with cgroups and/or systemd. Signed-off-by: Mike Galbraith efa...@gmx.de Cc: sta...@vger.kernel.org v3.6 --- fs/proc/base.c| 78 -- kernel/sched/auto_group.c | 68 ++-- kernel/sched/auto_group.h |9 - kernel/sysctl.c |6 +-- 4 files changed, 14 insertions(+), 147 deletions(-) --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1165,81 +1165,6 @@ static const struct file_operations proc #endif -#ifdef CONFIG_SCHED_AUTOGROUP -/* - * Print out autogroup related information: - */ -static int sched_autogroup_show(struct seq_file *m, void *v) -{ - struct inode *inode = m-private; - struct task_struct *p; - - p = get_proc_task(inode); - if (!p) - return -ESRCH; - proc_sched_autogroup_show_task(p, m); - - put_task_struct(p); - - return 0; -} - -static ssize_t -sched_autogroup_write(struct file *file, const char __user *buf, - size_t count, loff_t *offset) -{ - struct inode *inode = file-f_path.dentry-d_inode; - struct task_struct *p; - char buffer[PROC_NUMBUF]; - int nice; - int err; - - memset(buffer, 0, sizeof(buffer)); - if (count sizeof(buffer) - 1) - count = sizeof(buffer) - 1; - if (copy_from_user(buffer, buf, count)) - return -EFAULT; - - err = kstrtoint(strstrip(buffer), 0, nice); - if (err 0) -
Re: [PATCH] sched, autogroup: fix kernel crashes caused by runtime disable autogroup
On Fri, Oct 19, 2012 at 9:42 PM, Peter Zijlstra wrote: > Always try and CC people who wrote the code.. > > On Fri, 2012-10-19 at 16:36 +0800, Xiaotian Feng wrote: >> There's a regression from commit 800d4d30, in autogroup_move_group() >> >> p->signal->autogroup = autogroup_kref_get(ag); >> >> if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled)) >> goto out; >> ... >> out: >> autogroup_kref_put(prev); >> >> So kernel changed p's autogroup to ag, but never sched_move_task(p). >> Then previous autogroup of p is released, which may release task_group >> related with p. After commit 8323f26ce, p->sched_task_group might point >> to this stale value, and thus caused kernel crashes. >> >> This is very easy to reproduce, add "kernel.sched_autogroup_enabled = 0" >> to your /etc/sysctl.conf, your system will never boot up. It is not >> reasonable >> to put the sysctl enabled check in autogroup_move_group(), kernel should >> check >> it before autogroup_create in sched_autogroup_create_attach(). >> >> Reported-by: cwillu >> Reported-by: Luis Henriques >> Signed-off-by: Xiaotian Feng >> Cc: Ingo Molnar >> Cc: Peter Zijlstra >> --- >> kernel/sched/auto_group.c | 10 +- >> 1 file changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c >> index 0984a21..ac62415 100644 >> --- a/kernel/sched/auto_group.c >> +++ b/kernel/sched/auto_group.c >> @@ -143,15 +143,11 @@ autogroup_move_group(struct task_struct *p, struct >> autogroup *ag) >> >> p->signal->autogroup = autogroup_kref_get(ag); >> >> - if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled)) >> - goto out; >> - >> t = p; >> do { >> sched_move_task(t); >> } while_each_thread(p, t); >> >> -out: >> unlock_task_sighand(p, ); >> autogroup_kref_put(prev); >> } > > So I've looked at this for all of 1 minute, but why isn't moving that > check up one line to be above the p->signal->autogroup assignment > enough? I think if autogroup is disabled, we don't need to use autogroup_create() to create a new ag and tg, kernel will not use it. > >> @@ -159,8 +155,12 @@ out: >> /* Allocates GFP_KERNEL, cannot be called under any spinlock */ >> void sched_autogroup_create_attach(struct task_struct *p) >> { >> - struct autogroup *ag = autogroup_create(); >> + struct autogroup *ag; >> + >> + if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled)) >> + return; >> >> + ag = autogroup_create(); >> autogroup_move_group(p, ag); >> /* drop extra reference added by autogroup_create() */ >> autogroup_kref_put(ag); > > Man,.. so on memory allocation fail we'll put the group in > autogroup_default, which I think ends up being the root cgroup. > > But what happens when sysctl_sched_autogroup_enabled is false? > autogroup runtime disable is very nasty, as it might happen at any place of sched_move_group() for any setsid task. After sysctl_sched_autogroup_enabled is changed to false, autogroup_task_group(p, tg) will return tg, which is from its cpu cgroup. > It looks like sched_autogroup_fork() is effective in that case, which > would mean we'll stay in whatever group our parent is in, which is not > the same as being disabled. It's true, but after autogroup is disabled, p->signal->autogroup will never be used then, as autogroup_task_group() will not use it. But after autogroup is enabled again, it might be a disaster I think we'd better delete the runtime enable/disable support for autogroup, because it might mess up the group scheduler > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched, autogroup: fix kernel crashes caused by runtime disable autogroup
On Fri, Oct 19, 2012 at 9:42 PM, Peter Zijlstra pet...@infradead.org wrote: Always try and CC people who wrote the code.. On Fri, 2012-10-19 at 16:36 +0800, Xiaotian Feng wrote: There's a regression from commit 800d4d30, in autogroup_move_group() p-signal-autogroup = autogroup_kref_get(ag); if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled)) goto out; ... out: autogroup_kref_put(prev); So kernel changed p's autogroup to ag, but never sched_move_task(p). Then previous autogroup of p is released, which may release task_group related with p. After commit 8323f26ce, p-sched_task_group might point to this stale value, and thus caused kernel crashes. This is very easy to reproduce, add kernel.sched_autogroup_enabled = 0 to your /etc/sysctl.conf, your system will never boot up. It is not reasonable to put the sysctl enabled check in autogroup_move_group(), kernel should check it before autogroup_create in sched_autogroup_create_attach(). Reported-by: cwillu cwi...@cwillu.com Reported-by: Luis Henriques luis.henriq...@canonical.com Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Ingo Molnar mi...@redhat.com Cc: Peter Zijlstra pet...@infradead.org --- kernel/sched/auto_group.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c index 0984a21..ac62415 100644 --- a/kernel/sched/auto_group.c +++ b/kernel/sched/auto_group.c @@ -143,15 +143,11 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag) p-signal-autogroup = autogroup_kref_get(ag); - if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled)) - goto out; - t = p; do { sched_move_task(t); } while_each_thread(p, t); -out: unlock_task_sighand(p, flags); autogroup_kref_put(prev); } So I've looked at this for all of 1 minute, but why isn't moving that check up one line to be above the p-signal-autogroup assignment enough? I think if autogroup is disabled, we don't need to use autogroup_create() to create a new ag and tg, kernel will not use it. @@ -159,8 +155,12 @@ out: /* Allocates GFP_KERNEL, cannot be called under any spinlock */ void sched_autogroup_create_attach(struct task_struct *p) { - struct autogroup *ag = autogroup_create(); + struct autogroup *ag; + + if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled)) + return; + ag = autogroup_create(); autogroup_move_group(p, ag); /* drop extra reference added by autogroup_create() */ autogroup_kref_put(ag); Man,.. so on memory allocation fail we'll put the group in autogroup_default, which I think ends up being the root cgroup. But what happens when sysctl_sched_autogroup_enabled is false? autogroup runtime disable is very nasty, as it might happen at any place of sched_move_group() for any setsid task. After sysctl_sched_autogroup_enabled is changed to false, autogroup_task_group(p, tg) will return tg, which is from its cpu cgroup. It looks like sched_autogroup_fork() is effective in that case, which would mean we'll stay in whatever group our parent is in, which is not the same as being disabled. It's true, but after autogroup is disabled, p-signal-autogroup will never be used then, as autogroup_task_group() will not use it. But after autogroup is enabled again, it might be a disaster I think we'd better delete the runtime enable/disable support for autogroup, because it might mess up the group scheduler -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] sched, autogroup: fix kernel crashes caused by runtime disable autogroup
There's a regression from commit 800d4d30, in autogroup_move_group() p->signal->autogroup = autogroup_kref_get(ag); if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled)) goto out; ... out: autogroup_kref_put(prev); So kernel changed p's autogroup to ag, but never sched_move_task(p). Then previous autogroup of p is released, which may release task_group related with p. After commit 8323f26ce, p->sched_task_group might point to this stale value, and thus caused kernel crashes. This is very easy to reproduce, add "kernel.sched_autogroup_enabled = 0" to your /etc/sysctl.conf, your system will never boot up. It is not reasonable to put the sysctl enabled check in autogroup_move_group(), kernel should check it before autogroup_create in sched_autogroup_create_attach(). Reported-by: cwillu Reported-by: Luis Henriques Signed-off-by: Xiaotian Feng Cc: Ingo Molnar Cc: Peter Zijlstra --- kernel/sched/auto_group.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c index 0984a21..ac62415 100644 --- a/kernel/sched/auto_group.c +++ b/kernel/sched/auto_group.c @@ -143,15 +143,11 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag) p->signal->autogroup = autogroup_kref_get(ag); - if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled)) - goto out; - t = p; do { sched_move_task(t); } while_each_thread(p, t); -out: unlock_task_sighand(p, ); autogroup_kref_put(prev); } @@ -159,8 +155,12 @@ out: /* Allocates GFP_KERNEL, cannot be called under any spinlock */ void sched_autogroup_create_attach(struct task_struct *p) { - struct autogroup *ag = autogroup_create(); + struct autogroup *ag; + + if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled)) + return; + ag = autogroup_create(); autogroup_move_group(p, ag); /* drop extra reference added by autogroup_create() */ autogroup_kref_put(ag); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] sched, autogroup: fix kernel crashes caused by runtime disable autogroup
There's a regression from commit 800d4d30, in autogroup_move_group() p-signal-autogroup = autogroup_kref_get(ag); if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled)) goto out; ... out: autogroup_kref_put(prev); So kernel changed p's autogroup to ag, but never sched_move_task(p). Then previous autogroup of p is released, which may release task_group related with p. After commit 8323f26ce, p-sched_task_group might point to this stale value, and thus caused kernel crashes. This is very easy to reproduce, add kernel.sched_autogroup_enabled = 0 to your /etc/sysctl.conf, your system will never boot up. It is not reasonable to put the sysctl enabled check in autogroup_move_group(), kernel should check it before autogroup_create in sched_autogroup_create_attach(). Reported-by: cwillu cwi...@cwillu.com Reported-by: Luis Henriques luis.henriq...@canonical.com Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Ingo Molnar mi...@redhat.com Cc: Peter Zijlstra pet...@infradead.org --- kernel/sched/auto_group.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c index 0984a21..ac62415 100644 --- a/kernel/sched/auto_group.c +++ b/kernel/sched/auto_group.c @@ -143,15 +143,11 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag) p-signal-autogroup = autogroup_kref_get(ag); - if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled)) - goto out; - t = p; do { sched_move_task(t); } while_each_thread(p, t); -out: unlock_task_sighand(p, flags); autogroup_kref_put(prev); } @@ -159,8 +155,12 @@ out: /* Allocates GFP_KERNEL, cannot be called under any spinlock */ void sched_autogroup_create_attach(struct task_struct *p) { - struct autogroup *ag = autogroup_create(); + struct autogroup *ag; + + if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled)) + return; + ag = autogroup_create(); autogroup_move_group(p, ag); /* drop extra reference added by autogroup_create() */ autogroup_kref_put(ag); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pidns: remove recursion from free_pid_ns (v3)
On Wed, Oct 10, 2012 at 5:12 PM, Xiaotian Feng wrote: > On Wed, Oct 10, 2012 at 3:49 PM, Greg KH wrote: >> On Tue, Oct 09, 2012 at 12:08:31PM -0700, Andrew Morton wrote: >>> On Tue, 9 Oct 2012 12:03:00 -0700 >>> Greg KH wrote: >>> >>> > On Tue, Oct 09, 2012 at 11:48:21AM -0700, Andrew Morton wrote: >>> > > On Sat, 6 Oct 2012 23:56:33 +0400 >>> > > Andrew Vagin wrote: >>> > > >>> > > > Here is a stack trace of recursion: >>> > > > free_pid_ns(parent) >>> > > > put_pid_ns(parent) >>> > > > kref_put(>kref, free_pid_ns); >>> > > > free_pid_ns >>> > > > >>> > > > This patch turns recursion into loops. >>> > > > >>> > > > pidns can be nested many times, so in case of recursion >>> > > > a simple user space program can provoke a kernel panic >>> > > > due to exceed of a kernel stack. >>> > > >>> > > So we should backport this into earlier kernels. >>> > > >>> > > > --- a/include/linux/kref.h >>> > > > +++ b/include/linux/kref.h >>> > > > @@ -95,6 +95,18 @@ static inline int kref_put(struct kref *kref, void >>> > > > (*release)(struct kref *kref) >>> > > > return kref_sub(kref, 1, release); >>> > > > } >>> > > > >>> > > > +/** >>> > > > + * kref_put - decrement refcount for object. >>> > > > + * @kref: object. >>> > > > + * >>> > > > + * Decrement the refcount. >>> > > > + * Return 1 if refcount is zero. >>> > > > + */ >>> > > > +static inline int __kref_put(struct kref *kref) >>> > > > +{ >>> > > > + return atomic_dec_and_test(>refcount); >>> > > > +} >>> > > >>> > > Greg might be interested in this. >>> > > >>> > > It's a pretty specialised thing and perhaps it needs some stern words >>> > > in the description explaining when and why it should and shouldn't be >>> > > used. >>> > > >>> > > I wonder if people might (ab)use this to avoid the "doesn't >>> > > have a release function" warning. >>> > >>> > Yes they would, please don't do this at all. >>> > >>> > In fact, why is it needed? It doesn't solve anything (if it does, >>> > something in the way the kref is being used is wrong.) >>> > >>> >>> It's right there in the changelog. The patch fixes deep >>> kref_put->release->kref_put recursion by turning the operation for >>> pidns into a loop. >> >> But why would a kref release function ever decrement the same kref >> again causing a loop in the first place? >> > > This should be kref_put ns->parent recursionly, put_pid_ns(ns) -> > free_pid_ns(ns) -> put_pid_ns(ns->parent)->... > If users create many nested namespaces, the recursion put_pid_ns() > will exausted the kernel stack. > > >> That's what I was referring to. This strongly sounds like a problem in >> how the kref is being used, not in the kref code itself. >> >> Is a kref even the correct thing here? > > Can we fix this by this way? free_pid_ns just release ns itself, we check > the return value of kref_put, if kref_put returns 1, means ns->kref is > removed, > then we kref_put(ns->parent). > > diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h > index 00474b0..2168535 100644 > --- a/include/linux/pid_namespace.h > +++ b/include/linux/pid_namespace.h > @@ -53,8 +53,14 @@ extern int reboot_pid_ns(struct pid_namespace > *pid_ns, int cmd); > > static inline void put_pid_ns(struct pid_namespace *ns) > { > - if (ns != _pid_ns) > - kref_put(>kref, free_pid_ns); > + int ret; > + struct pid_namespace *parent miss a ";" here > + > + while (ns != _pid_ns) { > + parent = ns->parent; > + ret = kref_put(>kref, free_pid_ns); > + if (!ret) This line should be "if (!ret | !parent)" > + break; > + else ns = parent; > + } > } > > #else /* !CONFIG_PID_NS */ > diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c > index 478bad2..85ca23a 100644 > --- a/kernel/pid_namespace.c > +++ b/kernel/pid_namespace.c > @@ -135,15 +135,11 @@ struct pid_namespace *copy_pid_ns(unsigned long > flags, struct pid_namespace *old > > void free_pid_ns(struct kref *kref) > { > - struct pid_namespace *ns, *parent; > + struct pid_namespace *ns; > > ns = container_of(kref, struct pid_namespace, kref); > > - parent = ns->parent; > destroy_pid_namespace(ns); > - > - if (parent != NULL) > - put_pid_ns(parent); > } > EXPORT_SYMBOL_GPL(free_pid_ns); > > >> >> greg k-h >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pidns: remove recursion from free_pid_ns (v3)
On Wed, Oct 10, 2012 at 3:49 PM, Greg KH wrote: > On Tue, Oct 09, 2012 at 12:08:31PM -0700, Andrew Morton wrote: >> On Tue, 9 Oct 2012 12:03:00 -0700 >> Greg KH wrote: >> >> > On Tue, Oct 09, 2012 at 11:48:21AM -0700, Andrew Morton wrote: >> > > On Sat, 6 Oct 2012 23:56:33 +0400 >> > > Andrew Vagin wrote: >> > > >> > > > Here is a stack trace of recursion: >> > > > free_pid_ns(parent) >> > > > put_pid_ns(parent) >> > > > kref_put(>kref, free_pid_ns); >> > > > free_pid_ns >> > > > >> > > > This patch turns recursion into loops. >> > > > >> > > > pidns can be nested many times, so in case of recursion >> > > > a simple user space program can provoke a kernel panic >> > > > due to exceed of a kernel stack. >> > > >> > > So we should backport this into earlier kernels. >> > > >> > > > --- a/include/linux/kref.h >> > > > +++ b/include/linux/kref.h >> > > > @@ -95,6 +95,18 @@ static inline int kref_put(struct kref *kref, void >> > > > (*release)(struct kref *kref) >> > > > return kref_sub(kref, 1, release); >> > > > } >> > > > >> > > > +/** >> > > > + * kref_put - decrement refcount for object. >> > > > + * @kref: object. >> > > > + * >> > > > + * Decrement the refcount. >> > > > + * Return 1 if refcount is zero. >> > > > + */ >> > > > +static inline int __kref_put(struct kref *kref) >> > > > +{ >> > > > + return atomic_dec_and_test(>refcount); >> > > > +} >> > > >> > > Greg might be interested in this. >> > > >> > > It's a pretty specialised thing and perhaps it needs some stern words >> > > in the description explaining when and why it should and shouldn't be >> > > used. >> > > >> > > I wonder if people might (ab)use this to avoid the "doesn't >> > > have a release function" warning. >> > >> > Yes they would, please don't do this at all. >> > >> > In fact, why is it needed? It doesn't solve anything (if it does, >> > something in the way the kref is being used is wrong.) >> > >> >> It's right there in the changelog. The patch fixes deep >> kref_put->release->kref_put recursion by turning the operation for >> pidns into a loop. > > But why would a kref release function ever decrement the same kref > again causing a loop in the first place? > This should be kref_put ns->parent recursionly, put_pid_ns(ns) -> free_pid_ns(ns) -> put_pid_ns(ns->parent)->... If users create many nested namespaces, the recursion put_pid_ns() will exausted the kernel stack. > That's what I was referring to. This strongly sounds like a problem in > how the kref is being used, not in the kref code itself. > > Is a kref even the correct thing here? Can we fix this by this way? free_pid_ns just release ns itself, we check the return value of kref_put, if kref_put returns 1, means ns->kref is removed, then we kref_put(ns->parent). diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index 00474b0..2168535 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -53,8 +53,14 @@ extern int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd); static inline void put_pid_ns(struct pid_namespace *ns) { - if (ns != _pid_ns) - kref_put(>kref, free_pid_ns); + int ret; + struct pid_namespace *parent + + while (ns != _pid_ns) { + parent = ns->parent; + ret = kref_put(>kref, free_pid_ns); + if (!ret) + break; + else ns = parent; + } } #else /* !CONFIG_PID_NS */ diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index 478bad2..85ca23a 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -135,15 +135,11 @@ struct pid_namespace *copy_pid_ns(unsigned long flags, struct pid_namespace *old void free_pid_ns(struct kref *kref) { - struct pid_namespace *ns, *parent; + struct pid_namespace *ns; ns = container_of(kref, struct pid_namespace, kref); - parent = ns->parent; destroy_pid_namespace(ns); - - if (parent != NULL) - put_pid_ns(parent); } EXPORT_SYMBOL_GPL(free_pid_ns); > > greg k-h > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pidns: remove recursion from free_pid_ns (v3)
On Wed, Oct 10, 2012 at 3:49 PM, Greg KH g...@kroah.com wrote: On Tue, Oct 09, 2012 at 12:08:31PM -0700, Andrew Morton wrote: On Tue, 9 Oct 2012 12:03:00 -0700 Greg KH g...@kroah.com wrote: On Tue, Oct 09, 2012 at 11:48:21AM -0700, Andrew Morton wrote: On Sat, 6 Oct 2012 23:56:33 +0400 Andrew Vagin ava...@openvz.org wrote: Here is a stack trace of recursion: free_pid_ns(parent) put_pid_ns(parent) kref_put(ns-kref, free_pid_ns); free_pid_ns This patch turns recursion into loops. pidns can be nested many times, so in case of recursion a simple user space program can provoke a kernel panic due to exceed of a kernel stack. So we should backport this into earlier kernels. --- a/include/linux/kref.h +++ b/include/linux/kref.h @@ -95,6 +95,18 @@ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref) return kref_sub(kref, 1, release); } +/** + * kref_put - decrement refcount for object. + * @kref: object. + * + * Decrement the refcount. + * Return 1 if refcount is zero. + */ +static inline int __kref_put(struct kref *kref) +{ + return atomic_dec_and_test(kref-refcount); +} Greg might be interested in this. It's a pretty specialised thing and perhaps it needs some stern words in the description explaining when and why it should and shouldn't be used. I wonder if people might (ab)use this to avoid the doesn't have a release function warning. Yes they would, please don't do this at all. In fact, why is it needed? It doesn't solve anything (if it does, something in the way the kref is being used is wrong.) It's right there in the changelog. The patch fixes deep kref_put-release-kref_put recursion by turning the operation for pidns into a loop. But why would a kref release function ever decrement the same kref again causing a loop in the first place? This should be kref_put ns-parent recursionly, put_pid_ns(ns) - free_pid_ns(ns) - put_pid_ns(ns-parent)-... If users create many nested namespaces, the recursion put_pid_ns() will exausted the kernel stack. That's what I was referring to. This strongly sounds like a problem in how the kref is being used, not in the kref code itself. Is a kref even the correct thing here? Can we fix this by this way? free_pid_ns just release ns itself, we check the return value of kref_put, if kref_put returns 1, means ns-kref is removed, then we kref_put(ns-parent). diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index 00474b0..2168535 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -53,8 +53,14 @@ extern int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd); static inline void put_pid_ns(struct pid_namespace *ns) { - if (ns != init_pid_ns) - kref_put(ns-kref, free_pid_ns); + int ret; + struct pid_namespace *parent + + while (ns != init_pid_ns) { + parent = ns-parent; + ret = kref_put(ns-kref, free_pid_ns); + if (!ret) + break; + else ns = parent; + } } #else /* !CONFIG_PID_NS */ diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index 478bad2..85ca23a 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -135,15 +135,11 @@ struct pid_namespace *copy_pid_ns(unsigned long flags, struct pid_namespace *old void free_pid_ns(struct kref *kref) { - struct pid_namespace *ns, *parent; + struct pid_namespace *ns; ns = container_of(kref, struct pid_namespace, kref); - parent = ns-parent; destroy_pid_namespace(ns); - - if (parent != NULL) - put_pid_ns(parent); } EXPORT_SYMBOL_GPL(free_pid_ns); greg k-h -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pidns: remove recursion from free_pid_ns (v3)
On Wed, Oct 10, 2012 at 5:12 PM, Xiaotian Feng xtf...@gmail.com wrote: On Wed, Oct 10, 2012 at 3:49 PM, Greg KH g...@kroah.com wrote: On Tue, Oct 09, 2012 at 12:08:31PM -0700, Andrew Morton wrote: On Tue, 9 Oct 2012 12:03:00 -0700 Greg KH g...@kroah.com wrote: On Tue, Oct 09, 2012 at 11:48:21AM -0700, Andrew Morton wrote: On Sat, 6 Oct 2012 23:56:33 +0400 Andrew Vagin ava...@openvz.org wrote: Here is a stack trace of recursion: free_pid_ns(parent) put_pid_ns(parent) kref_put(ns-kref, free_pid_ns); free_pid_ns This patch turns recursion into loops. pidns can be nested many times, so in case of recursion a simple user space program can provoke a kernel panic due to exceed of a kernel stack. So we should backport this into earlier kernels. --- a/include/linux/kref.h +++ b/include/linux/kref.h @@ -95,6 +95,18 @@ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref) return kref_sub(kref, 1, release); } +/** + * kref_put - decrement refcount for object. + * @kref: object. + * + * Decrement the refcount. + * Return 1 if refcount is zero. + */ +static inline int __kref_put(struct kref *kref) +{ + return atomic_dec_and_test(kref-refcount); +} Greg might be interested in this. It's a pretty specialised thing and perhaps it needs some stern words in the description explaining when and why it should and shouldn't be used. I wonder if people might (ab)use this to avoid the doesn't have a release function warning. Yes they would, please don't do this at all. In fact, why is it needed? It doesn't solve anything (if it does, something in the way the kref is being used is wrong.) It's right there in the changelog. The patch fixes deep kref_put-release-kref_put recursion by turning the operation for pidns into a loop. But why would a kref release function ever decrement the same kref again causing a loop in the first place? This should be kref_put ns-parent recursionly, put_pid_ns(ns) - free_pid_ns(ns) - put_pid_ns(ns-parent)-... If users create many nested namespaces, the recursion put_pid_ns() will exausted the kernel stack. That's what I was referring to. This strongly sounds like a problem in how the kref is being used, not in the kref code itself. Is a kref even the correct thing here? Can we fix this by this way? free_pid_ns just release ns itself, we check the return value of kref_put, if kref_put returns 1, means ns-kref is removed, then we kref_put(ns-parent). diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index 00474b0..2168535 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -53,8 +53,14 @@ extern int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd); static inline void put_pid_ns(struct pid_namespace *ns) { - if (ns != init_pid_ns) - kref_put(ns-kref, free_pid_ns); + int ret; + struct pid_namespace *parent miss a ; here + + while (ns != init_pid_ns) { + parent = ns-parent; + ret = kref_put(ns-kref, free_pid_ns); + if (!ret) This line should be if (!ret | !parent) + break; + else ns = parent; + } } #else /* !CONFIG_PID_NS */ diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index 478bad2..85ca23a 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -135,15 +135,11 @@ struct pid_namespace *copy_pid_ns(unsigned long flags, struct pid_namespace *old void free_pid_ns(struct kref *kref) { - struct pid_namespace *ns, *parent; + struct pid_namespace *ns; ns = container_of(kref, struct pid_namespace, kref); - parent = ns-parent; destroy_pid_namespace(ns); - - if (parent != NULL) - put_pid_ns(parent); } EXPORT_SYMBOL_GPL(free_pid_ns); greg k-h -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/