Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-25 Thread Oliver Neukum
On Mon, 2015-08-24 at 14:21 -0400, Alan Stern wrote:
> > In theory, an architecture could implement atomic bit operations
> using 
> > a spinlock to insure atomicity.  I don't know if any architectures
> do 
> > this, but if they do then the scenario above could arise.
> 
> Now that I see this in writing, I realize it's not possible after
> all.  
> clear_bit() et al. will work with a single unsigned long, which
> doesn't
> leave any place for spinlocks or other mechanisms.  I was thinking of 
> atomic_t.

Refuting yourself you are making the assumption that the lock has
to be inside the data structure. That is not true.

Regards
Oliver


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-25 Thread Oliver Neukum
On Mon, 2015-08-24 at 15:29 +0200, Bjørn Mork wrote:
> Eugene Shatokhin  writes:
> 
> > 19.08.2015 15:31, Bjørn Mork пишет:
> >> Eugene Shatokhin  writes:

> >> Stopping the tasklet rescheduling etc depends only on netif_running(),
> >> which will be false when usbnet_stop is called.  There is no need to
> >> touch dev->flags for this to happen.
> >
> > That was one of the first ideas we discussed here. Unfortunately, it
> > is probably not so simple.
> >
> > Setting dev->flags to 0 makes some delayed operations do nothing and,
> > among other things, not to reschedule usbnet_bh().
> 
> Yes, but I believe that is merely a side effect.  You should never need
> to clear multiple flags to get the desired behaviour.

Why? Is there any reason you cannot have a TX and an RX halt at the same
time?

> > As you can see in drivers/net/usb/usbnet.c, usbnet_bh() can be called
> > as a tasklet function and as a timer function in a number of
> > situations (look for the usage of dev->bh and dev->delay there).
> >
> > netif_running() is indeed false when usbnet_stop() runs, usbnet_stop()
> > also disables Tx. This seems to be enough for many cases where
> > usbnet_bh() is scheduled, but I am not so sure about the remaining
> > ones, namely:
> >
> > 1. A work function, usbnet_deferred_kevent(), may reschedule
> > usbnet_bh(). Looks like the workqueue is only stopped in
> > usbnet_disconnect(), so a work item might be processed while
> > usbnet_stop() works. Setting dev->flags to 0 makes the work function
> > do nothing, by the way. See also the comment in usbnet_stop() about
> > this.

Yes, this is the main reason the flags are collectively cleared.
We could do them all with clear_bit(). Ugly though.

> > A work item may be placed to this workqueue in a number of ways, by
> > both usbnet module and the mini-drivers. It is not too easy to track
> > all these situations.
> 
> That's an understatement :)

Yes.

> So FLAG_AVOID_UNLINK_URBS should probably be removed and replaced calls
> to usbnet_status_start() and usbnet_status_stop().  This will require
> testing on some of the devices with the original firmware problem
> however.

And there you point out the main problem.

> In any case: I do not think this flag should be considered when trying
> to make usbnet_stop behaviour saner.  It's only purpose is to
> deliberately break usbnet_stop by not actually stopping.

Yes.

Regards
Oliver


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread David Miller
From: Alan Stern 
Date: Mon, 24 Aug 2015 14:06:15 -0400 (EDT)

> On Mon, 24 Aug 2015, David Miller wrote:
>> Atomic operations like clear_bit also will behave that way.
> 
> Are you certain about that?  I couldn't find any mention of it in
> Documentation/atomic_ops.txt.
> 
> In theory, an architecture could implement atomic bit operations using 
> a spinlock to insure atomicity.  I don't know if any architectures do 
> this, but if they do then the scenario above could arise.

Indeed, we do have platforms like 32-bit sparc and parisc that do this.

So, taking that into consideration, this is a bit unfortunate and on
such platforms we do have this problem.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread Alan Stern
On Mon, 24 Aug 2015, Alan Stern wrote:

> On Mon, 24 Aug 2015, David Miller wrote:
> 
> > From: Eugene Shatokhin 
> > Date: Wed, 19 Aug 2015 14:59:01 +0300
> > 
> > > So the following might be possible, although unlikely:
> > > 
> > > CPU0 CPU1
> > >  clear_bit: read dev->flags
> > >  clear_bit: clear EVENT_RX_KILL in the read value
> > > 
> > > dev->flags=0;
> > > 
> > >  clear_bit: write updated dev->flags
> > > 
> > > As a result, dev->flags may become non-zero again.
> > 
> > Is this really possible?
> > 
> > Stores really are "atomic" in the sense that the do their update
> > in one indivisible operation.
> 
> Provided you use ACCESS_ONCE or WRITE_ONCE or whatever people like to 
> call it now.
> 
> > Atomic operations like clear_bit also will behave that way.
> 
> Are you certain about that?  I couldn't find any mention of it in
> Documentation/atomic_ops.txt.
> 
> In theory, an architecture could implement atomic bit operations using 
> a spinlock to insure atomicity.  I don't know if any architectures do 
> this, but if they do then the scenario above could arise.

Now that I see this in writing, I realize it's not possible after all.  
clear_bit() et al. will work with a single unsigned long, which doesn't
leave any place for spinlocks or other mechanisms.  I was thinking of 
atomic_t.

So never mind...

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread Eugene Shatokhin

24.08.2015 20:43, David Miller пишет:

From: Eugene Shatokhin 
Date: Wed, 19 Aug 2015 14:59:01 +0300


So the following might be possible, although unlikely:

CPU0 CPU1
  clear_bit: read dev->flags
  clear_bit: clear EVENT_RX_KILL in the read value

dev->flags=0;

  clear_bit: write updated dev->flags

As a result, dev->flags may become non-zero again.


Is this really possible?


On x86, it is not possible, so this is not a problem. Perhaps, for ARM 
too. As for the other architectures supported by the kernel - not sure, 
no common guarantees, it seems. Anyway, this is not a critical issue, I 
agree.


OK, let us leave things as they are for this one and fix the rest.



Stores really are "atomic" in the sense that the do their update
in one indivisible operation.

Atomic operations like clear_bit also will behave that way.

If a clear_bit is in progress, the "dev->flags=0" store will not be
able to grab the cache line exclusively until the clear_bit is done.

So I think the above sequent of events is completely impossible.  Once
a clear_bit starts, a write by another foreign agent on the bus is
absolutely impossible to legally occur until the clear_bit completes.

I think this is a non-issue.



Regards,
Eugene

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread Alan Stern
On Mon, 24 Aug 2015, David Miller wrote:

> From: Eugene Shatokhin 
> Date: Wed, 19 Aug 2015 14:59:01 +0300
> 
> > So the following might be possible, although unlikely:
> > 
> > CPU0 CPU1
> >  clear_bit: read dev->flags
> >  clear_bit: clear EVENT_RX_KILL in the read value
> > 
> > dev->flags=0;
> > 
> >  clear_bit: write updated dev->flags
> > 
> > As a result, dev->flags may become non-zero again.
> 
> Is this really possible?
> 
> Stores really are "atomic" in the sense that the do their update
> in one indivisible operation.

Provided you use ACCESS_ONCE or WRITE_ONCE or whatever people like to 
call it now.

> Atomic operations like clear_bit also will behave that way.

Are you certain about that?  I couldn't find any mention of it in
Documentation/atomic_ops.txt.

In theory, an architecture could implement atomic bit operations using 
a spinlock to insure atomicity.  I don't know if any architectures do 
this, but if they do then the scenario above could arise.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread David Miller
From: Eugene Shatokhin 
Date: Wed, 19 Aug 2015 14:59:01 +0300

> So the following might be possible, although unlikely:
> 
> CPU0 CPU1
>  clear_bit: read dev->flags
>  clear_bit: clear EVENT_RX_KILL in the read value
> 
> dev->flags=0;
> 
>  clear_bit: write updated dev->flags
> 
> As a result, dev->flags may become non-zero again.

Is this really possible?

Stores really are "atomic" in the sense that the do their update
in one indivisible operation.

Atomic operations like clear_bit also will behave that way.

If a clear_bit is in progress, the "dev->flags=0" store will not be
able to grab the cache line exclusively until the clear_bit is done.

So I think the above sequent of events is completely impossible.  Once
a clear_bit starts, a write by another foreign agent on the bus is
absolutely impossible to legally occur until the clear_bit completes.

I think this is a non-issue.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread Eugene Shatokhin

24.08.2015 16:29, Bjørn Mork пишет:

Eugene Shatokhin  writes:


19.08.2015 15:31, Bjørn Mork пишет:

Eugene Shatokhin  writes:


The problem is not in the reordering but rather in the fact that
"dev->flags = 0" is not necessarily atomic
w.r.t. "clear_bit(EVENT_RX_KILL, &dev->flags)", and vice versa.

So the following might be possible, although unlikely:

CPU0 CPU1
   clear_bit: read dev->flags
   clear_bit: clear EVENT_RX_KILL in the read value

dev->flags=0;

   clear_bit: write updated dev->flags

As a result, dev->flags may become non-zero again.


Ah, right.  Thanks for explaining.


I cannot prove yet that this is an impossible situation. If anyone
can, please explain. If so, this part of the patch will not be needed.


I wonder if we could simply move the dev->flags = 0 down a few lines to
fix both issues?  It doesn't seem to do anything useful except for
resetting the flags to a sane initial state after the device is down.

Stopping the tasklet rescheduling etc depends only on netif_running(),
which will be false when usbnet_stop is called.  There is no need to
touch dev->flags for this to happen.


That was one of the first ideas we discussed here. Unfortunately, it
is probably not so simple.

Setting dev->flags to 0 makes some delayed operations do nothing and,
among other things, not to reschedule usbnet_bh().


Yes, but I believe that is merely a side effect.  You should never need
to clear multiple flags to get the desired behaviour.


As you can see in drivers/net/usb/usbnet.c, usbnet_bh() can be called
as a tasklet function and as a timer function in a number of
situations (look for the usage of dev->bh and dev->delay there).

netif_running() is indeed false when usbnet_stop() runs, usbnet_stop()
also disables Tx. This seems to be enough for many cases where
usbnet_bh() is scheduled, but I am not so sure about the remaining
ones, namely:

1. A work function, usbnet_deferred_kevent(), may reschedule
usbnet_bh(). Looks like the workqueue is only stopped in
usbnet_disconnect(), so a work item might be processed while
usbnet_stop() works. Setting dev->flags to 0 makes the work function
do nothing, by the way. See also the comment in usbnet_stop() about
this.

A work item may be placed to this workqueue in a number of ways, by
both usbnet module and the mini-drivers. It is not too easy to track
all these situations.


That's an understatement :)




2. rx_complete() and tx_complete() may schedule execution of
usbnet_bh() as a tasklet or a timer function. These two are URB
completion callbacks.

It seems, new Rx and Tx URBs cannot be submitted when usbnet_stop()
clears dev->flags, indeed. But it does not prevent the completion
handlers for the previously submitted URBs from running concurrently
with usbnet_stop(). The latter waits for them to complete (via
usbnet_terminate_urbs(dev)) but only if FLAG_AVOID_UNLINK_URBS is not
set in info->flags. rndis_wlan, however, sets this flag for a few
hardware models. So - no guarantees here as well.


FLAG_AVOID_UNLINK_URBS looks like it should be replaced by the newer
ability to keep the status urb active. I believe that must have been the
real reason for adding it, based on the commit message and the effect
the flag will have:

  commit 1487cd5e76337555737cbc55d7d83f41460d198f
  Author: Jussi Kivilinna 
  Date:   Thu Jul 30 19:41:20 2009 +0300

 usbnet: allow "minidriver" to prevent urb unlinking on usbnet_stop

 rndis_wlan devices freeze after running usbnet_stop several times. It 
appears
 that firmware freezes in state where it does not respond to any RNDIS 
commands
 and device have to be physically unplugged/replugged. This patch lets
 minidrivers to disable unlink_urbs on usbnet_stop through new info flag.

 Signed-off-by: Jussi Kivilinna 
 Cc: David Brownell 
 Signed-off-by: John W. Linville 



The rx urbs will not be resubmitted in any case, and there are of course
no tx urbs being submitted.  So the only effect of this flag is on the
status/interrupt urb, which I can imagine some RNDIS devices wants
active all the time.

So FLAG_AVOID_UNLINK_URBS should probably be removed and replaced calls
to usbnet_status_start() and usbnet_status_stop().  This will require
testing on some of the devices with the original firmware problem
however.

In any case: I do not think this flag should be considered when trying
to make usbnet_stop behaviour saner.  It's only purpose is to
deliberately break usbnet_stop by not actually stopping.



If someone could list the particular bits of dev->flags that should be
cleared to make sure no deferred call could reschedule usbnet_bh(),
etc... Well, it would be enough to clear these first and use
dev->flags = 0 later, after tasklet_kill() and del_timer_sync(). I
cannot point out these particular bits now.



I don't think any of the flags must be cleared.  The sequence

 dev_close(dev->net);
usbnet_terminate_urbs(d

Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread Bjørn Mork
Eugene Shatokhin  writes:

> 19.08.2015 15:31, Bjørn Mork пишет:
>> Eugene Shatokhin  writes:
>>
>>> The problem is not in the reordering but rather in the fact that
>>> "dev->flags = 0" is not necessarily atomic
>>> w.r.t. "clear_bit(EVENT_RX_KILL, &dev->flags)", and vice versa.
>>>
>>> So the following might be possible, although unlikely:
>>>
>>> CPU0 CPU1
>>>   clear_bit: read dev->flags
>>>   clear_bit: clear EVENT_RX_KILL in the read value
>>>
>>> dev->flags=0;
>>>
>>>   clear_bit: write updated dev->flags
>>>
>>> As a result, dev->flags may become non-zero again.
>>
>> Ah, right.  Thanks for explaining.
>>
>>> I cannot prove yet that this is an impossible situation. If anyone
>>> can, please explain. If so, this part of the patch will not be needed.
>>
>> I wonder if we could simply move the dev->flags = 0 down a few lines to
>> fix both issues?  It doesn't seem to do anything useful except for
>> resetting the flags to a sane initial state after the device is down.
>>
>> Stopping the tasklet rescheduling etc depends only on netif_running(),
>> which will be false when usbnet_stop is called.  There is no need to
>> touch dev->flags for this to happen.
>
> That was one of the first ideas we discussed here. Unfortunately, it
> is probably not so simple.
>
> Setting dev->flags to 0 makes some delayed operations do nothing and,
> among other things, not to reschedule usbnet_bh().

Yes, but I believe that is merely a side effect.  You should never need
to clear multiple flags to get the desired behaviour.

> As you can see in drivers/net/usb/usbnet.c, usbnet_bh() can be called
> as a tasklet function and as a timer function in a number of
> situations (look for the usage of dev->bh and dev->delay there).
>
> netif_running() is indeed false when usbnet_stop() runs, usbnet_stop()
> also disables Tx. This seems to be enough for many cases where
> usbnet_bh() is scheduled, but I am not so sure about the remaining
> ones, namely:
>
> 1. A work function, usbnet_deferred_kevent(), may reschedule
> usbnet_bh(). Looks like the workqueue is only stopped in
> usbnet_disconnect(), so a work item might be processed while
> usbnet_stop() works. Setting dev->flags to 0 makes the work function
> do nothing, by the way. See also the comment in usbnet_stop() about
> this.
>
> A work item may be placed to this workqueue in a number of ways, by
> both usbnet module and the mini-drivers. It is not too easy to track
> all these situations.

That's an understatement :)



> 2. rx_complete() and tx_complete() may schedule execution of
> usbnet_bh() as a tasklet or a timer function. These two are URB
> completion callbacks.
>
> It seems, new Rx and Tx URBs cannot be submitted when usbnet_stop()
> clears dev->flags, indeed. But it does not prevent the completion
> handlers for the previously submitted URBs from running concurrently
> with usbnet_stop(). The latter waits for them to complete (via
> usbnet_terminate_urbs(dev)) but only if FLAG_AVOID_UNLINK_URBS is not
> set in info->flags. rndis_wlan, however, sets this flag for a few
> hardware models. So - no guarantees here as well.

FLAG_AVOID_UNLINK_URBS looks like it should be replaced by the newer
ability to keep the status urb active. I believe that must have been the
real reason for adding it, based on the commit message and the effect
the flag will have:

 commit 1487cd5e76337555737cbc55d7d83f41460d198f
 Author: Jussi Kivilinna 
 Date:   Thu Jul 30 19:41:20 2009 +0300

usbnet: allow "minidriver" to prevent urb unlinking on usbnet_stop

rndis_wlan devices freeze after running usbnet_stop several times. It 
appears
that firmware freezes in state where it does not respond to any RNDIS 
commands
and device have to be physically unplugged/replugged. This patch lets
minidrivers to disable unlink_urbs on usbnet_stop through new info flag.

Signed-off-by: Jussi Kivilinna 
Cc: David Brownell 
Signed-off-by: John W. Linville 



The rx urbs will not be resubmitted in any case, and there are of course
no tx urbs being submitted.  So the only effect of this flag is on the
status/interrupt urb, which I can imagine some RNDIS devices wants
active all the time. 

So FLAG_AVOID_UNLINK_URBS should probably be removed and replaced calls
to usbnet_status_start() and usbnet_status_stop().  This will require
testing on some of the devices with the original firmware problem
however.

In any case: I do not think this flag should be considered when trying
to make usbnet_stop behaviour saner.  It's only purpose is to
deliberately break usbnet_stop by not actually stopping.


> If someone could list the particular bits of dev->flags that should be
> cleared to make sure no deferred call could reschedule usbnet_bh(),
> etc... Well, it would be enough to clear these first and use
> dev->flags = 0 later, after tasklet_kill() and del_timer_sync(). I
> cannot point out these particular bits now.


I don'

Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread Eugene Shatokhin

19.08.2015 15:31, Bjørn Mork пишет:

Eugene Shatokhin  writes:


The problem is not in the reordering but rather in the fact that
"dev->flags = 0" is not necessarily atomic
w.r.t. "clear_bit(EVENT_RX_KILL, &dev->flags)", and vice versa.

So the following might be possible, although unlikely:

CPU0 CPU1
  clear_bit: read dev->flags
  clear_bit: clear EVENT_RX_KILL in the read value

dev->flags=0;

  clear_bit: write updated dev->flags

As a result, dev->flags may become non-zero again.


Ah, right.  Thanks for explaining.


I cannot prove yet that this is an impossible situation. If anyone
can, please explain. If so, this part of the patch will not be needed.


I wonder if we could simply move the dev->flags = 0 down a few lines to
fix both issues?  It doesn't seem to do anything useful except for
resetting the flags to a sane initial state after the device is down.

Stopping the tasklet rescheduling etc depends only on netif_running(),
which will be false when usbnet_stop is called.  There is no need to
touch dev->flags for this to happen.


That was one of the first ideas we discussed here. Unfortunately, it is 
probably not so simple.


Setting dev->flags to 0 makes some delayed operations do nothing and, 
among other things, not to reschedule usbnet_bh().


As you can see in drivers/net/usb/usbnet.c, usbnet_bh() can be called as 
a tasklet function and as a timer function in a number of situations 
(look for the usage of dev->bh and dev->delay there).


netif_running() is indeed false when usbnet_stop() runs, usbnet_stop() 
also disables Tx. This seems to be enough for many cases where 
usbnet_bh() is scheduled, but I am not so sure about the remaining ones, 
namely:


1. A work function, usbnet_deferred_kevent(), may reschedule 
usbnet_bh(). Looks like the workqueue is only stopped in 
usbnet_disconnect(), so a work item might be processed while 
usbnet_stop() works. Setting dev->flags to 0 makes the work function do 
nothing, by the way. See also the comment in usbnet_stop() about this.


A work item may be placed to this workqueue in a number of ways, by both 
usbnet module and the mini-drivers. It is not too easy to track all 
these situations.


2. rx_complete() and tx_complete() may schedule execution of usbnet_bh() 
as a tasklet or a timer function. These two are URB completion callbacks.


It seems, new Rx and Tx URBs cannot be submitted when usbnet_stop() 
clears dev->flags, indeed. But it does not prevent the completion 
handlers for the previously submitted URBs from running concurrently 
with usbnet_stop(). The latter waits for them to complete (via 
usbnet_terminate_urbs(dev)) but only if FLAG_AVOID_UNLINK_URBS is not 
set in info->flags. rndis_wlan, however, sets this flag for a few 
hardware models. So - no guarantees here as well.


If someone could list the particular bits of dev->flags that should be 
cleared to make sure no deferred call could reschedule usbnet_bh(), 
etc... Well, it would be enough to clear these first and use dev->flags 
= 0 later, after tasklet_kill() and del_timer_sync(). I cannot point out 
these particular bits now.


Besides, it is possible, although unlikely, that new event bits will be 
added to dev->flags in the future. And one will need to keep track of 
these to see if they should be cleared as well. I'd prever to play safer 
for now and clear them all.





The EVENT_NO_RUNTIME_PM bug should definitely be fixed.  Please split
that out as a separate fix.  It's a separate issue, and should be
backported to all maintained stable releases it applies to (anything
from v3.8 and newer)


Yes, that makes sense. However, this fix was originally provided by
Oliver Neukum rather than me, so I would like to hear his opinion as
well first.


If what I write above is correct (please help me verify...), then maybe
it does make sense to do these together anyway.


I think, your suggestion to make it a separate patch is right. Will do 
it in the next version of the patchset, hopefully soon.


Regards,
Eugene

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-19 Thread Bjørn Mork
Eugene Shatokhin  writes:

> The problem is not in the reordering but rather in the fact that
> "dev->flags = 0" is not necessarily atomic
> w.r.t. "clear_bit(EVENT_RX_KILL, &dev->flags)", and vice versa.
>
> So the following might be possible, although unlikely:
>
> CPU0 CPU1
>  clear_bit: read dev->flags
>  clear_bit: clear EVENT_RX_KILL in the read value
>
> dev->flags=0;
>
>  clear_bit: write updated dev->flags
>
> As a result, dev->flags may become non-zero again.

Ah, right.  Thanks for explaining.

> I cannot prove yet that this is an impossible situation. If anyone
> can, please explain. If so, this part of the patch will not be needed.

I wonder if we could simply move the dev->flags = 0 down a few lines to
fix both issues?  It doesn't seem to do anything useful except for
resetting the flags to a sane initial state after the device is down.

Stopping the tasklet rescheduling etc depends only on netif_running(),
which will be false when usbnet_stop is called.  There is no need to
touch dev->flags for this to happen.

>> The EVENT_NO_RUNTIME_PM bug should definitely be fixed.  Please split
>> that out as a separate fix.  It's a separate issue, and should be
>> backported to all maintained stable releases it applies to (anything
>> from v3.8 and newer)
>
> Yes, that makes sense. However, this fix was originally provided by
> Oliver Neukum rather than me, so I would like to hear his opinion as
> well first.

If what I write above is correct (please help me verify...), then maybe
it does make sense to do these together anyway.



Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-19 Thread Eugene Shatokhin

19.08.2015 13:54, Bjørn Mork пишет:

Eugene Shatokhin  writes:


19.08.2015 04:54, David Miller пишет:

From: Eugene Shatokhin 
Date: Fri, 14 Aug 2015 19:58:36 +0300


2. The second race is on dev->flags.

dev->flags is set to 0 here:
*0  usbnet_stop (usbnet.c:816)
  /* deferred work (task, timer, softirq) must also stop.
   * can't flush_scheduled_work() until we drop rtnl (later),
   * else workers could deadlock; so make workers a NOP.
   */
  dev->flags = 0;
  del_timer_sync (&dev->delay);
  tasklet_kill (&dev->bh);

And here, the code clears EVENT_RX_KILL bit in dev->flags, which may
execute concurrently with the above operation:
*0 clear_bit (bitops.h:113, inlined)
*1 usbnet_bh (usbnet.c:1475)
  /* restart RX again after disabling due to high error rate */
  clear_bit(EVENT_RX_KILL, &dev->flags);

It seems, setting dev->flags to 0 is not necessarily atomic w.r.t.
clear_bit() and other bit operations with dev->flags. It is safer to
make it atomic and this way, make the race harmless.

While at it, the checking of EVENT_NO_RUNTIME_PM bit of dev->flags in
usbnet_stop() was fixed too: the bit should be checked before dev->flags
is cleared.


The fix for this is excessive.

Instead of all of this madness, looping over expensive clear_bit()
atomics, just do whatever it takes to make sure that usbnet_bh() is
quiesced and cannot execute any more.  Then you can safely clear
dev->flags normally.



If I understand it correctly, it is to make sure usbnet_bh() is not
scheduled again that dev->flags should be set to 0 first, one way or
another. That is what this madness is for.


Assuming there is a race which may reorder these, exactly what
difference does it make wrt EVENT_RX_KILL if you do

a)  clear_bit(EVENT_RX_KILL, &dev->flags);
 dev->flags = 0;

or

b)  dev->flags = 0;
 clear_bit(EVENT_RX_KILL, &dev->flags);


AFAICS, the result will be a cleared EVENT_RX_KILL bit in either case.



Thanks for the review!

The problem is not in the reordering but rather in the fact that 
"dev->flags = 0" is not necessarily atomic w.r.t. 
"clear_bit(EVENT_RX_KILL, &dev->flags)", and vice versa.


So the following might be possible, although unlikely:

CPU0 CPU1
 clear_bit: read dev->flags
 clear_bit: clear EVENT_RX_KILL in the read value

dev->flags=0;

 clear_bit: write updated dev->flags

As a result, dev->flags may become non-zero again.

I cannot prove yet that this is an impossible situation. If anyone can, 
please explain. If so, this part of the patch will not be needed.




The EVENT_NO_RUNTIME_PM bug should definitely be fixed.  Please split
that out as a separate fix.  It's a separate issue, and should be
backported to all maintained stable releases it applies to (anything
from v3.8 and newer)


Yes, that makes sense. However, this fix was originally provided by 
Oliver Neukum rather than me, so I would like to hear his opinion as 
well first.



Bjørn



Regards,
Eugene
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-19 Thread Bjørn Mork
Eugene Shatokhin  writes:

> 19.08.2015 04:54, David Miller пишет:
>> From: Eugene Shatokhin 
>> Date: Fri, 14 Aug 2015 19:58:36 +0300
>>
>>> 2. The second race is on dev->flags.
>>>
>>> dev->flags is set to 0 here:
>>> *0  usbnet_stop (usbnet.c:816)
>>>  /* deferred work (task, timer, softirq) must also stop.
>>>   * can't flush_scheduled_work() until we drop rtnl (later),
>>>   * else workers could deadlock; so make workers a NOP.
>>>   */
>>>  dev->flags = 0;
>>>  del_timer_sync (&dev->delay);
>>>  tasklet_kill (&dev->bh);
>>>
>>> And here, the code clears EVENT_RX_KILL bit in dev->flags, which may
>>> execute concurrently with the above operation:
>>> *0 clear_bit (bitops.h:113, inlined)
>>> *1 usbnet_bh (usbnet.c:1475)
>>>  /* restart RX again after disabling due to high error rate */
>>>  clear_bit(EVENT_RX_KILL, &dev->flags);
>>>
>>> It seems, setting dev->flags to 0 is not necessarily atomic w.r.t.
>>> clear_bit() and other bit operations with dev->flags. It is safer to
>>> make it atomic and this way, make the race harmless.
>>>
>>> While at it, the checking of EVENT_NO_RUNTIME_PM bit of dev->flags in
>>> usbnet_stop() was fixed too: the bit should be checked before dev->flags
>>> is cleared.
>>
>> The fix for this is excessive.
>>
>> Instead of all of this madness, looping over expensive clear_bit()
>> atomics, just do whatever it takes to make sure that usbnet_bh() is
>> quiesced and cannot execute any more.  Then you can safely clear
>> dev->flags normally.
>>
>
> If I understand it correctly, it is to make sure usbnet_bh() is not
> scheduled again that dev->flags should be set to 0 first, one way or
> another. That is what this madness is for.

Assuming there is a race which may reorder these, exactly what
difference does it make wrt EVENT_RX_KILL if you do

a)  clear_bit(EVENT_RX_KILL, &dev->flags);
dev->flags = 0;

or

b)  dev->flags = 0;
clear_bit(EVENT_RX_KILL, &dev->flags);


AFAICS, the result will be a cleared EVENT_RX_KILL bit in either case.


The EVENT_NO_RUNTIME_PM bug should definitely be fixed.  Please split
that out as a separate fix.  It's a separate issue, and should be
backported to all maintained stable releases it applies to (anything
from v3.8 and newer)


Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-19 Thread Eugene Shatokhin

19.08.2015 04:54, David Miller пишет:

From: Eugene Shatokhin 
Date: Fri, 14 Aug 2015 19:58:36 +0300


2. The second race is on dev->flags.

dev->flags is set to 0 here:
*0  usbnet_stop (usbnet.c:816)
 /* deferred work (task, timer, softirq) must also stop.
  * can't flush_scheduled_work() until we drop rtnl (later),
  * else workers could deadlock; so make workers a NOP.
  */
 dev->flags = 0;
 del_timer_sync (&dev->delay);
 tasklet_kill (&dev->bh);

And here, the code clears EVENT_RX_KILL bit in dev->flags, which may
execute concurrently with the above operation:
*0 clear_bit (bitops.h:113, inlined)
*1 usbnet_bh (usbnet.c:1475)
 /* restart RX again after disabling due to high error rate */
 clear_bit(EVENT_RX_KILL, &dev->flags);

It seems, setting dev->flags to 0 is not necessarily atomic w.r.t.
clear_bit() and other bit operations with dev->flags. It is safer to
make it atomic and this way, make the race harmless.

While at it, the checking of EVENT_NO_RUNTIME_PM bit of dev->flags in
usbnet_stop() was fixed too: the bit should be checked before dev->flags
is cleared.


The fix for this is excessive.

Instead of all of this madness, looping over expensive clear_bit()
atomics, just do whatever it takes to make sure that usbnet_bh() is
quiesced and cannot execute any more.  Then you can safely clear
dev->flags normally.



If I understand it correctly, it is to make sure usbnet_bh() is not 
scheduled again that dev->flags should be set to 0 first, one way or 
another. That is what this madness is for.


tasklet_kill() will wait then for the already running instance of 
usbnet_bh() (if one is running). After that, it is guaranteed BH is not 
running and will not be re-scheduled.


As for the performance concerns, I doubt that usbnet_stop() is anywhere 
on the critical path. I have been testing this patch for some time and 
haven't seen any new performance issues with it yet.


If needed, it is possible to measure and compare the time needed for 
usbnet_stop() before and after this patch and try to estimate the impact 
of this on the overall performance.


Regards,
Eugene

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-18 Thread David Miller
From: Eugene Shatokhin 
Date: Fri, 14 Aug 2015 19:58:36 +0300

> 2. The second race is on dev->flags.
> 
> dev->flags is set to 0 here:
> *0  usbnet_stop (usbnet.c:816)
> /* deferred work (task, timer, softirq) must also stop.
>  * can't flush_scheduled_work() until we drop rtnl (later),
>  * else workers could deadlock; so make workers a NOP.
>  */
> dev->flags = 0;
> del_timer_sync (&dev->delay);
> tasklet_kill (&dev->bh);
> 
> And here, the code clears EVENT_RX_KILL bit in dev->flags, which may
> execute concurrently with the above operation:
> *0 clear_bit (bitops.h:113, inlined)
> *1 usbnet_bh (usbnet.c:1475)
> /* restart RX again after disabling due to high error rate */
> clear_bit(EVENT_RX_KILL, &dev->flags);
> 
> It seems, setting dev->flags to 0 is not necessarily atomic w.r.t.
> clear_bit() and other bit operations with dev->flags. It is safer to
> make it atomic and this way, make the race harmless.
> 
> While at it, the checking of EVENT_NO_RUNTIME_PM bit of dev->flags in
> usbnet_stop() was fixed too: the bit should be checked before dev->flags
> is cleared.

The fix for this is excessive.

Instead of all of this madness, looping over expensive clear_bit()
atomics, just do whatever it takes to make sure that usbnet_bh() is
quiesced and cannot execute any more.  Then you can safely clear
dev->flags normally.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/