Re: [Bug] 12.864681 BUG: lock held when returning to user space!
On 10/17/2013 06:41 AM, Douglas Gilbert wrote: > That seems to be the case. Vaughan acknowledged the > problem and forwarded it to me 8 days ago. Yes, it > seems to be a "no-no" to hold a any kernel semaphore > when returning to the user space; in this case from > sg_open(). I was hoping a revised patch might > appear from Vaughan but to date that has not been > the case. So with only a few weeks to go before > lk 3.12 is released, reverting the whole 4 patches > in that series seems to be the safest course. > > Also without a new patch from Vaughan in the next few > weeks he may also miss the opportunity of getting > his improved O_EXCL logic into the lk 3.13 series. > > > Thinking about how to solve this problem: a field could > be added to 'struct sg_device' with one of three states: > no_opens, non_excl_opens and excl_open. It could be > manipulated by sg_open() and sg_release() like a > read-write semaphore. And the faulty 'struct > rw_semaphore o_sem' in sg_device could be replaced by a > normal semaphore to protect the manipulation of the new > three-state field. > And the new three-state field would replace (or expand) the 'char exclude' field in struct sg_device . > > Doug Gilbert Hi Doug, Thanks for providing advice on how to fix this. However, it seems be still awkward somehow. We have to 1. hold a lock (maybe sg_index_lock or a new one) 2. check a) the new three-state field; b) if sfp list is empty; c) sdp->detached field; if either condition fails, we may link the open process into o_excl_wait queue and need wakeup. if satisfied, we go on. 3. then we release at least sg_index_lock to malloc a new sfp and initialize. 4. try to acquire sg_index_lock again and add this sfp into sfd_siblings list if possible. <== We still have to check at least sdp->detached field 5. update three-state field to reflect the result of Step 4, and wake up processes waiting in o_excl_wait. This uncomfortable is introduced by releasing the sg_index_lock in the middle of check->malloc->add the new sfp struct. I wanna ask if it is possible to split the sg_add_sfp() into two functions, sg_init_sfp() and sg_add_sfp2(). We can do all initialize work in sg_init_sfp() without any lock and let sg_add_sfp2() only serve lock-check-add in one way. It seems more convenient for me to understand. But there is still some questions on this approach: 1. memory consume can be very large if lots of sg_init_sfp in the same time; 2. some field are initialized according to the fields of scsi device sdp points to, such as low_dma, sg_tablesize, max_sector, phys_segs. I know scsi_device_get() would keep the underlying scsi_device alive, however would these fields change in the gap of our initialize and add? The relationship of sg_device and scsi_device like above said confuse me somehow... Thanks, Vaughan -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug] 12.864681 BUG: lock held when returning to user space!
On 13-10-16 09:24 AM, James Bottomley wrote: On Tue, 2013-10-08 at 09:45 -0400, Douglas Gilbert wrote: On 13-10-08 02:44 AM, vaughan wrote: Hi Madper, CC to Douglas to get comments. I use the rw_semaphore o_sem to protect excl open, introduced in commit 15b06f9a02406e5460001db6d5af5c738cd3d4e7 since v3.12-rc1. Is it forbidden to do like that in kernel?... It appears you can not (allow sg_open() to hold a semaphore then return to the user space). So you will need to do some rework on that patch or revert it. OK, there being no reply on this, I'll do the revert ... that's all four patches, correct? That seems to be the case. Vaughan acknowledged the problem and forwarded it to me 8 days ago. Yes, it seems to be a "no-no" to hold a any kernel semaphore when returning to the user space; in this case from sg_open(). I was hoping a revised patch might appear from Vaughan but to date that has not been the case. So with only a few weeks to go before lk 3.12 is released, reverting the whole 4 patches in that series seems to be the safest course. Also without a new patch from Vaughan in the next few weeks he may also miss the opportunity of getting his improved O_EXCL logic into the lk 3.13 series. Thinking about how to solve this problem: a field could be added to 'struct sg_device' with one of three states: no_opens, non_excl_opens and excl_open. It could be manipulated by sg_open() and sg_release() like a read-write semaphore. And the faulty 'struct rw_semaphore o_sem' in sg_device could be replaced by a normal semaphore to protect the manipulation of the new three-state field. Doug Gilbert -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug] 12.864681 BUG: lock held when returning to user space!
On Tue, 2013-10-08 at 09:45 -0400, Douglas Gilbert wrote: > On 13-10-08 02:44 AM, vaughan wrote: > > Hi Madper, > > > > CC to Douglas to get comments. > > I use the rw_semaphore o_sem to protect excl open, introduced in commit > > 15b06f9a02406e5460001db6d5af5c738cd3d4e7 since v3.12-rc1. > > Is it forbidden to do like that in kernel?... > > It appears you can not (allow sg_open() to hold a semaphore > then return to the user space). So you will need to do some > rework on that patch or revert it. OK, there being no reply on this, I'll do the revert ... that's all four patches, correct? James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug] 12.864681 BUG: lock held when returning to user space!
On 13-10-08 02:44 AM, vaughan wrote: Hi Madper, CC to Douglas to get comments. I use the rw_semaphore o_sem to protect excl open, introduced in commit 15b06f9a02406e5460001db6d5af5c738cd3d4e7 since v3.12-rc1. Is it forbidden to do like that in kernel?... It appears you can not (allow sg_open() to hold a semaphore then return to the user space). So you will need to do some rework on that patch or revert it. Doug Gilbert Reference: scsi-linux + kernel lists, title: [PATCH v6 0/4][SCSI] sg: fix race condition in sg_open 20130828 On 10/08/2013 01:57 PM, Madper Xie wrote: Howdy Vaughan Cao, I can't meet this issue on both 3.11 and 3.11.4. There are only four patches between 3.11 and 3.12-rc2 and you are the author. Will you please check them if you have time. c...@redhat.com writes: Hi all, With kernel3.12-rc2 the dmesg shows following logs: [ 12.864680] [ 12.864681] [ BUG: lock held when returning to user space! ] [ 12.864682] 3.12.0-rc2 #1 Not tainted [ 12.864683] [ 12.864684] iprinit/719 is leaving the kernel with locks still held! [ 12.864685] 1 lock held by iprinit/719: [ 12.864686] #0: (&sdp->o_sem){.+.+..}, at: [] sg_open+0x4b5/0x644 [sg] [ 12.934954] ath9k :01:00.0: enabling device ( -> 0002) [ 12.940346] ath: phy0: timeout (1000 us) on reg 0x15f18: 0x & 0x0007 != 0x0004 [ 12.943125] ath: EEPROM regdomain: 0x60 [ 12.943127] ath: EEPROM indicates we should expect a direct regpair map [ 12.943129] ath: Country alpha2 being used: 00 [ 12.943130] ath: Regpair used: 0x60 [ 12.960202] r8169 :02:00.0 p3p1: link down [ 12.960236] r8169 :02:00.0 p3p1: link down [ 12.960256] IPv6: ADDRCONF(NETDEV_UP): p3p1: link is not ready [ 13.003523] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht' [ 13.003886] ieee80211 phy0: Atheros AR9485 Rev:1 mem=0xc9000bc8, irq=16 [ 13.012120] ip6_tables: (C) 2000-2006 Netfilter Core Team [ 13.023667] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready [ 13.055802] Ebtables v2.0 registered [ 13.192291] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) [ 15.906392] r8169 :02:00.0 p3p1: link up [ 15.906416] IPv6: ADDRCONF(NETDEV_CHANGE): p3p1: link becomes ready [ 17.121989] systemd-udevd (334) used greatest stack depth: 3352 bytes left I'm working on finding which version bring this bug in. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html