Re: [Bug] 12.864681 BUG: lock held when returning to user space!

2013-10-16 Thread vaughan
On 10/17/2013 06:41 AM, Douglas Gilbert wrote:
> That seems to be the case. Vaughan acknowledged the
> problem and forwarded it to me 8 days ago. Yes, it
> seems to be a "no-no" to hold a any kernel semaphore
> when returning to the user space; in this case from
> sg_open(). I was hoping a revised patch might
> appear from Vaughan but to date that has not been
> the case. So with only a few weeks to go before
> lk 3.12 is released, reverting the whole 4 patches
> in that series seems to be the safest course.
>
> Also without a new patch from Vaughan in the next few
> weeks he may also miss the opportunity of getting
> his improved O_EXCL logic into the lk 3.13 series.
>
>
> Thinking about how to solve this problem: a field could
> be added to 'struct sg_device' with one of three states:
> no_opens, non_excl_opens and excl_open. It could be
> manipulated by sg_open() and sg_release() like a
> read-write semaphore. And the faulty 'struct
> rw_semaphore o_sem' in sg_device could be replaced by a
> normal semaphore to protect the manipulation of the new
> three-state field.
> And the new three-state field would replace (or expand)
the 'char exclude' field in struct sg_device .
>
> Doug Gilbert
Hi Doug,

Thanks for providing advice on how to fix this.
However, it seems be still awkward somehow. We have to
1. hold a lock (maybe sg_index_lock or a new one)
2. check
   a) the new three-state field;
   b) if sfp list is empty;
   c) sdp->detached field;
if either condition fails, we may link the open process into o_excl_wait
queue and need wakeup.
if satisfied, we go on.
3. then we release at least sg_index_lock to malloc a new sfp and
initialize.
4. try to acquire sg_index_lock again and add this sfp into sfd_siblings
list if possible.  <== We still have to check at least sdp->detached field
5. update three-state field to reflect the result of Step 4, and wake up
processes waiting in o_excl_wait.

This uncomfortable is introduced by releasing the sg_index_lock in the
middle of check->malloc->add the new sfp struct.

I wanna ask if it is possible to split the sg_add_sfp() into two
functions, sg_init_sfp() and sg_add_sfp2(). We can do all initialize
work in sg_init_sfp()
without any lock and let sg_add_sfp2() only serve lock-check-add in one
way. It seems more convenient for me to understand.
But there is still some questions on this approach:
1. memory consume can be very large if lots of sg_init_sfp in the same time;
2. some field are initialized according to the fields of scsi device sdp
points to, such as low_dma, sg_tablesize, max_sector, phys_segs.
I know scsi_device_get() would keep the underlying scsi_device
alive, however would these fields change in the gap of our initialize
and add?
The relationship of sg_device and scsi_device like above said
confuse me somehow...

Thanks,
Vaughan
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug] 12.864681 BUG: lock held when returning to user space!

2013-10-16 Thread Douglas Gilbert

On 13-10-16 09:24 AM, James Bottomley wrote:

On Tue, 2013-10-08 at 09:45 -0400, Douglas Gilbert wrote:

On 13-10-08 02:44 AM, vaughan wrote:

Hi Madper,

CC to Douglas to get comments.
I use the rw_semaphore o_sem to protect excl open, introduced in commit
15b06f9a02406e5460001db6d5af5c738cd3d4e7 since v3.12-rc1.
Is it forbidden to do like that in kernel?...


It appears you can not (allow sg_open() to hold a semaphore
then return to the user space). So you will need to do some
rework on that patch or revert it.


OK, there being no reply on this, I'll do the revert ... that's all four
patches, correct?


That seems to be the case. Vaughan acknowledged the
problem and forwarded it to me 8 days ago. Yes, it
seems to be a "no-no" to hold a any kernel semaphore
when returning to the user space; in this case from
sg_open(). I was hoping a revised patch might
appear from Vaughan but to date that has not been
the case. So with only a few weeks to go before
lk 3.12 is released, reverting the whole 4 patches
in that series seems to be the safest course.

Also without a new patch from Vaughan in the next few
weeks he may also miss the opportunity of getting
his improved O_EXCL logic into the lk 3.13 series.


Thinking about how to solve this problem: a field could
be added to 'struct sg_device' with one of three states:
no_opens, non_excl_opens and excl_open. It could be
manipulated by sg_open() and sg_release() like a
read-write semaphore. And the faulty 'struct
rw_semaphore o_sem' in sg_device could be replaced by a
normal semaphore to protect the manipulation of the new
three-state field.

Doug Gilbert


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug] 12.864681 BUG: lock held when returning to user space!

2013-10-16 Thread James Bottomley
On Tue, 2013-10-08 at 09:45 -0400, Douglas Gilbert wrote:
> On 13-10-08 02:44 AM, vaughan wrote:
> > Hi Madper,
> >
> > CC to Douglas to get comments.
> > I use the rw_semaphore o_sem to protect excl open, introduced in commit
> > 15b06f9a02406e5460001db6d5af5c738cd3d4e7 since v3.12-rc1.
> > Is it forbidden to do like that in kernel?...
> 
> It appears you can not (allow sg_open() to hold a semaphore
> then return to the user space). So you will need to do some
> rework on that patch or revert it.

OK, there being no reply on this, I'll do the revert ... that's all four
patches, correct?

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug] 12.864681 BUG: lock held when returning to user space!

2013-10-08 Thread Douglas Gilbert

On 13-10-08 02:44 AM, vaughan wrote:

Hi Madper,

CC to Douglas to get comments.
I use the rw_semaphore o_sem to protect excl open, introduced in commit
15b06f9a02406e5460001db6d5af5c738cd3d4e7 since v3.12-rc1.
Is it forbidden to do like that in kernel?...


It appears you can not (allow sg_open() to hold a semaphore
then return to the user space). So you will need to do some
rework on that patch or revert it.

Doug Gilbert

Reference: scsi-linux + kernel lists, title:
  [PATCH v6 0/4][SCSI] sg: fix race condition in sg_open
  20130828


On 10/08/2013 01:57 PM, Madper Xie wrote:

Howdy Vaughan Cao,
I can't meet this issue on both 3.11 and 3.11.4. There are only four
patches between 3.11 and 3.12-rc2 and you are the author. Will you
please check them if you have time.

c...@redhat.com writes:


Hi all,
With kernel3.12-rc2 the dmesg shows following logs:
[   12.864680] 
[   12.864681] [ BUG: lock held when returning to user space! ]
[   12.864682] 3.12.0-rc2 #1 Not tainted
[   12.864683] 
[   12.864684] iprinit/719 is leaving the kernel with locks still held!
[   12.864685] 1 lock held by iprinit/719:
[   12.864686]  #0:  (&sdp->o_sem){.+.+..}, at: [] 
sg_open+0x4b5/0x644 [sg]
[   12.934954] ath9k :01:00.0: enabling device ( -> 0002)
[   12.940346] ath: phy0: timeout (1000 us) on reg 0x15f18: 0x & 
0x0007 != 0x0004
[   12.943125] ath: EEPROM regdomain: 0x60
[   12.943127] ath: EEPROM indicates we should expect a direct regpair map
[   12.943129] ath: Country alpha2 being used: 00
[   12.943130] ath: Regpair used: 0x60
[   12.960202] r8169 :02:00.0 p3p1: link down
[   12.960236] r8169 :02:00.0 p3p1: link down
[   12.960256] IPv6: ADDRCONF(NETDEV_UP): p3p1: link is not ready
[   13.003523] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[   13.003886] ieee80211 phy0: Atheros AR9485 Rev:1 mem=0xc9000bc8, 
irq=16
[   13.012120] ip6_tables: (C) 2000-2006 Netfilter Core Team
[   13.023667] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[   13.055802] Ebtables v2.0 registered
[   13.192291] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[   15.906392] r8169 :02:00.0 p3p1: link up
[   15.906416] IPv6: ADDRCONF(NETDEV_CHANGE): p3p1: link becomes ready
[   17.121989] systemd-udevd (334) used greatest stack depth: 3352 bytes left

I'm working on finding which version bring this bug in.







--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html