Re: [Bug] 12.864681 BUG: lock held when returning to user space!

2013-10-16 Thread James Bottomley
On Tue, 2013-10-08 at 09:45 -0400, Douglas Gilbert wrote:
 On 13-10-08 02:44 AM, vaughan wrote:
  Hi Madper,
 
  CC to Douglas to get comments.
  I use the rw_semaphore o_sem to protect excl open, introduced in commit
  15b06f9a02406e5460001db6d5af5c738cd3d4e7 since v3.12-rc1.
  Is it forbidden to do like that in kernel?...
 
 It appears you can not (allow sg_open() to hold a semaphore
 then return to the user space). So you will need to do some
 rework on that patch or revert it.

OK, there being no reply on this, I'll do the revert ... that's all four
patches, correct?

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug] 12.864681 BUG: lock held when returning to user space!

2013-10-16 Thread Douglas Gilbert

On 13-10-16 09:24 AM, James Bottomley wrote:

On Tue, 2013-10-08 at 09:45 -0400, Douglas Gilbert wrote:

On 13-10-08 02:44 AM, vaughan wrote:

Hi Madper,

CC to Douglas to get comments.
I use the rw_semaphore o_sem to protect excl open, introduced in commit
15b06f9a02406e5460001db6d5af5c738cd3d4e7 since v3.12-rc1.
Is it forbidden to do like that in kernel?...


It appears you can not (allow sg_open() to hold a semaphore
then return to the user space). So you will need to do some
rework on that patch or revert it.


OK, there being no reply on this, I'll do the revert ... that's all four
patches, correct?


That seems to be the case. Vaughan acknowledged the
problem and forwarded it to me 8 days ago. Yes, it
seems to be a no-no to hold any kernel semaphore
when returning to user space; in this case from
sg_open(). I was hoping a revised patch might
appear from Vaughan, but to date that has not been
the case. So with only a few weeks to go before
lk 3.12 is released, reverting all four patches
in that series seems to be the safest course.

Also, without a new patch from Vaughan in the next few
weeks, he may miss the opportunity of getting
his improved O_EXCL logic into the lk 3.13 series.


Thinking about how to solve this problem: a field could
be added to 'struct sg_device' with one of three states:
no_opens, non_excl_opens and excl_open. It could be
manipulated by sg_open() and sg_release() like a
read-write semaphore. And the faulty 'struct
rw_semaphore o_sem' in sg_device could be replaced by a
normal semaphore to protect the manipulation of the new
three-state field.

Doug Gilbert
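
The three-state idea above can be sketched in userspace C as follows. This
is only an illustrative model, not code from the sg driver: the names
dev_model, open_state, dev_model_open() and dev_model_release() are
hypothetical, a pthread mutex stands in for a kernel lock, and a counter of
non-exclusive opens (not mentioned in the proposal) is assumed so that
release can tell when the last opener has gone. Conflicting opens simply
return -EBUSY here; a blocking open would need a wait mechanism on top.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names; none of this is taken from drivers/scsi/sg.c. */
enum open_state { NO_OPENS, NON_EXCL_OPENS, EXCL_OPEN };

struct dev_model {
    pthread_mutex_t lock;   /* held only while the fields below change */
    enum open_state state;
    int open_cnt;           /* assumed: count of non-exclusive opens */
};

static void dev_model_init(struct dev_model *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->state = NO_OPENS;
    d->open_cnt = 0;
}

/* Returns 0 on success, -EBUSY if the request conflicts with current opens. */
static int dev_model_open(struct dev_model *d, bool excl)
{
    int ret = 0;

    pthread_mutex_lock(&d->lock);
    if (excl) {
        if (d->state == NO_OPENS)
            d->state = EXCL_OPEN;
        else
            ret = -EBUSY;
    } else if (d->state == EXCL_OPEN) {
        ret = -EBUSY;
    } else {
        d->state = NON_EXCL_OPENS;
        d->open_cnt++;
    }
    pthread_mutex_unlock(&d->lock);   /* always dropped before returning */
    return ret;
}

/* Must be paired with a successful dev_model_open() using the same excl. */
static void dev_model_release(struct dev_model *d, bool excl)
{
    pthread_mutex_lock(&d->lock);
    if (excl)
        d->state = NO_OPENS;
    else if (--d->open_cnt == 0)
        d->state = NO_OPENS;
    pthread_mutex_unlock(&d->lock);
}
```

The point of the design is that the lock is taken and released inside each
call, so nothing is held across the return to the caller; the long-lived
"who has the device open" information lives in the state field instead.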




Re: [Bug] 12.864681 BUG: lock held when returning to user space!

2013-10-08 Thread Douglas Gilbert

On 13-10-08 02:44 AM, vaughan wrote:

Hi Madper,

CC to Douglas to get comments.
I use the rw_semaphore o_sem to protect excl open, introduced in commit
15b06f9a02406e5460001db6d5af5c738cd3d4e7 since v3.12-rc1.
Is it forbidden to do like that in kernel?...


It appears you can not (allow sg_open() to hold a semaphore
then return to the user space). So you will need to do some
rework on that patch or revert it.

Doug Gilbert

Reference: scsi-linux + kernel lists, title:
  [PATCH v6 0/4][SCSI] sg: fix race condition in sg_open
  20130828


On 10/08/2013 01:57 PM, Madper Xie wrote:

Howdy Vaughan Cao,
I can't reproduce this issue on either 3.11 or 3.11.4. There are only four
patches between 3.11 and 3.12-rc2, and you are the author. Would you
please check them if you have time?

c...@redhat.com writes:


Hi all,
With kernel 3.12-rc2, dmesg shows the following logs:
[   12.864680] 
[   12.864681] [ BUG: lock held when returning to user space! ]
[   12.864682] 3.12.0-rc2 #1 Not tainted
[   12.864683] 
[   12.864684] iprinit/719 is leaving the kernel with locks still held!
[   12.864685] 1 lock held by iprinit/719:
[   12.864686]  #0:  (sdp-o_sem){.+.+..}, at: [a050de05] sg_open+0x4b5/0x644 [sg]
[   12.934954] ath9k :01:00.0: enabling device ( - 0002)
[   12.940346] ath: phy0: timeout (1000 us) on reg 0x15f18: 0x  0x0007 != 0x0004
[   12.943125] ath: EEPROM regdomain: 0x60
[   12.943127] ath: EEPROM indicates we should expect a direct regpair map
[   12.943129] ath: Country alpha2 being used: 00
[   12.943130] ath: Regpair used: 0x60
[   12.960202] r8169 :02:00.0 p3p1: link down
[   12.960236] r8169 :02:00.0 p3p1: link down
[   12.960256] IPv6: ADDRCONF(NETDEV_UP): p3p1: link is not ready
[   13.003523] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[   13.003886] ieee80211 phy0: Atheros AR9485 Rev:1 mem=0xc9000bc8, irq=16
[   13.012120] ip6_tables: (C) 2000-2006 Netfilter Core Team
[   13.023667] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[   13.055802] Ebtables v2.0 registered
[   13.192291] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[   15.906392] r8169 :02:00.0 p3p1: link up
[   15.906416] IPv6: ADDRCONF(NETDEV_CHANGE): p3p1: link becomes ready
[   17.121989] systemd-udevd (334) used greatest stack depth: 3352 bytes left

I'm working on finding which version introduced this bug.






