Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-11 Thread Eric W. Biederman
David Miller [EMAIL PROTECTED] writes:

 From: [EMAIL PROTECTED] (Eric W. Biederman)
 Date: Fri, 28 Sep 2007 18:59:08 -0600

 
 Currently we have the call path:
 macvlan_open - dev_unicast_add - __dev_set_rx_mode -
  __dev_set_promiscuity - ASSERT_RTNL - mutex_trylock
 
 When mutex debugging is on taking a mutex complains if we are not
 allowed to sleep.  At that point we have called netif_tx_lock_bh
 so we are clearly not allowed to sleep.  Arguably this is not a
 problem for mutex_trylock.
 
 However we can avoid the complaint and make the ASSERT_RTNL code
 cheaper, faster and more obvious by simply calling mutex_is_locked.
 
 So this patch adds rtnl_is_locked (which does mutex_is_locked on
 the rtnl_mutex) and changes ASSERT_RTNL to use that.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

 There was a lot of discussion about how to do this right and
 therefore you'll need to resubmit all of this with this
 discussioned issues addressed.

Thanks for the update on where this patch sits.

My understanding is that the patch as I submitted it is correct.

The code path that sparked this patch has been seen to be extremely
convoluted from a locking perspective.  Herbert and Patrick have been
discussing that code path and it was my impression they were working
together to figure out how to refactor that code path to make the
locking simpler.  There are enough convoluted details that I
do not feel comfortable refactoring that code path.

There was a practical suggestion by Herbert that ASSERT_RTNL have a
might_sleep() added.  That suggestion will currently result in
ASSERT_RTNL firing unnecessarily from the macvlan_open code path.

So I do not feel comfortable adding a might_sleep() into ASSERT_RTNL,
as it appears that it will warn on currently correct code.  Even if
that code has code has nearly incomprehensible locking.

Eric
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-11 Thread Herbert Xu
On Thu, Oct 11, 2007 at 12:57:31AM -0600, Eric W. Biederman wrote:

 There was a practical suggestion by Herbert that ASSERT_RTNL have a
 might_sleep() added.  That suggestion will currently result in
 ASSERT_RTNL firing unnecessarily from the macvlan_open code path.

As I've already said we should change the macvlan and core
netdev code so that this doesn't happen in the first place.
In gernal checking for the RTNL while holding a spin lock is
a sign of a bug.

So I would object to a patch that caused the RTNL_ASSERT to not
warn about being called in an atomic context.

I don't have a problem with your patch per se, it's the fact
that the patch is removing the warning when it's called in an
atomic context that I don't like.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-11 Thread Eric W. Biederman
Herbert Xu [EMAIL PROTECTED] writes:

 On Thu, Oct 11, 2007 at 12:57:31AM -0600, Eric W. Biederman wrote:

 There was a practical suggestion by Herbert that ASSERT_RTNL have a
 might_sleep() added.  That suggestion will currently result in
 ASSERT_RTNL firing unnecessarily from the macvlan_open code path.

 As I've already said we should change the macvlan and core
 netdev code so that this doesn't happen in the first place.

Agreed.  Until that is done I am reluctant to add a warning
to ASSERT_RTNL.  Last I looked at that part of the thread
it looked like you and Patrick were making good progress
towards unraveling that, so I have no problem adding
an extra warning when we don't expect it to ever trigger.

 In gernal checking for the RTNL while holding a spin lock is
 a sign of a bug.

 So I would object to a patch that caused the RTNL_ASSERT to not
 warn about being called in an atomic context.

ASSERT_RTNL does not warn about being called in an atomic context
today!

 I don't have a problem with your patch per se, it's the fact
 that the patch is removing the warning when it's called in an
 atomic context that I don't like.

No my patch does not remove a warning.

Way way deep in mutex debugging on the slowpath there is a unreadable
and incomprehensible WARN_ON in muxtex_trylock that will trigger if
you have 10 tons of debugging turned on, and you are in,
interrupt context, and you manage to hit the slow path.  I think that
is a pretty unlikely scenario.

Eric
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-11 Thread Herbert Xu
On Thu, Oct 11, 2007 at 02:23:35AM -0600, Eric W. Biederman wrote:

  So I would object to a patch that caused the RTNL_ASSERT to not
  warn about being called in an atomic context.
 
 ASSERT_RTNL does not warn about being called in an atomic context
 today!

Well it did didn't it or we wouldn't be having this thread :)
 
 Way way deep in mutex debugging on the slowpath there is a unreadable
 and incomprehensible WARN_ON in muxtex_trylock that will trigger if
 you have 10 tons of debugging turned on, and you are in,
 interrupt context, and you manage to hit the slow path.  I think that
 is a pretty unlikely scenario.

Well thanks to that warning we're on our way of improving the
code that triggered it in such a way that this warning will soon
go silent.

That's precisely the reason why I object to having this warning
removed.  Now you have a good point that this warning doesn't
trigger all the time.  The fix to that is to *make* it trigger
always, not removing it.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-11 Thread Eric W. Biederman
Herbert Xu [EMAIL PROTECTED] writes:

 Well thanks to that warning we're on our way of improving the
 code that triggered it in such a way that this warning will soon
 go silent.

 That's precisely the reason why I object to having this warning
 removed.  Now you have a good point that this warning doesn't
 trigger all the time.  The fix to that is to *make* it trigger
 always, not removing it.

I'm almost convinced but.

Where people deliberately use convoluted locking is where we
most need things like ASSERT_RTNL.

Having ASSERT_RTNL warn if you were sleeping does not seem
intuitive from the name.

This instance of convoluted locking seems like a complete
one off to me, and if it will warn about other constructs
currently in the kernel it seems wrong.

Frankly I don't feel comfortable adding the check because I can't
defend the presence of might_sleep() in ASSERT_RTNL.  If I can't
understand a change well enough to defend it I will not take
responsibility for it, and I will not add my Signed-off-by to it.

The patch I wrote was trivial a trivial optimization and obviously
correct.  Adding the might_sleep() and the patch becomes the start
of a crusade for better code that I don't believe in.

So I would rather forget this patch then make that one line addition.

Thanks,
Eric



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-11 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 11 Oct 2007 10:33:39 -0600

 Having ASSERT_RTNL warn if you were sleeping does not seem
 intuitive from the name.
 
 This instance of convoluted locking seems like a complete
 one off to me, and if it will warn about other constructs
 currently in the kernel it seems wrong.

RTNL is a semaphore, therefore it sleeps.

Therefore anything that requires RTNL is held can also assume that it
can do things like GFP_KERNEL allocations and other sleeping actions.

This is why any code path that runs with spinlocks held or interrupts
disabled, and hits an RTNL assertion, is buggy.  It is the chain of
dependencies of what is allowed in such contexts.

ASSERT_RTNL();
page = alloc_page(GFP_KERNEL);

That sequence above is absolutely always legal.

The might_sleep() check is just letting us know the problem exists.

If spinlocks or interrupt disabling is needed to implement something
deeper in the call chain, that's fine, it just cannot call back into
the RTNL asserted domain with those spinlocks held or interrupts
disabled.

If the mac_vlan driver, or whichever one has the problem, does things
like this it must be fixed and putting or not putting a proper
might_sleep() check here doesn't change that.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-11 Thread Eric W. Biederman

David, all.  My apologies. I'm tired and I don't have the energy to
grok yet another part of the kernel right now.  So I can not
productively participate in this discussion.

I do agree that the locking below dev_unicast_add() that was exposed
by the macvlan driver is unmaintainable even if it is currently
correct.

I don't see any fundamental problems with adding a might_sleep(),
ASSERT_RTNL.

I just don't have the energy to audit everything I feel I would have
to audit to be comfortable taking responsibility for adding the
might_sleep() or fixing the locking in dev_unicast_add() (which seems
more important).

So again my apologies.  This is all beyond what I can deal with at the moment.

Eric
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-10 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Fri, 28 Sep 2007 18:59:08 -0600

 
 Currently we have the call path:
 macvlan_open - dev_unicast_add - __dev_set_rx_mode -
   __dev_set_promiscuity - ASSERT_RTNL - mutex_trylock
 
 When mutex debugging is on taking a mutex complains if we are not
 allowed to sleep.  At that point we have called netif_tx_lock_bh
 so we are clearly not allowed to sleep.  Arguably this is not a
 problem for mutex_trylock.
 
 However we can avoid the complaint and make the ASSERT_RTNL code
 cheaper, faster and more obvious by simply calling mutex_is_locked.
 
 So this patch adds rtnl_is_locked (which does mutex_is_locked on
 the rtnl_mutex) and changes ASSERT_RTNL to use that.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

There was a lot of discussion about how to do this right and
therefore you'll need to resubmit all of this with this
discussioned issues addressed.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-07 Thread Patrick McHardy
Herbert Xu wrote:
 On Tue, Oct 02, 2007 at 05:29:11PM +0200, Patrick McHardy wrote:
 
I think this doesn't completely fix it, when dev_unicast_add is
interrupted by dev_mc_add before the unicast changes are performed,
they will get committed in the dev_mc_add context, so we might still
call change_flags with BH disabled. Taking the TX lock around the
dev-uc_count and dev-uc_promisc checks and changes in __dev_set_rx_mode
should fix this.
 
 
 Good catch.  Digging back in history it seems that you added
 the change_rx_flags function so that the driver didn't have to
 do it under TX lock, right?


Yes, and to make sure the RTNL is held.

 The problem with this is that the stack can now call
 change_rx_flags and set_multicast_list simultaneously
 which presents a potential headache for the driver
 author (if they were to use change_rx_flags).


The change_rx_flags function was intended to be used by software
devices that want to synchronize flags to a different device,
in which case they shouldn't interfere since set_multicast_list
would only be used for the multicast list and not the flags.

 It seems to me what we could do is in fact separate out the
 part that adds the address and the part that syncs it with
 hardware.


That sounds like a really good idea to get rid of all the
confusion.

 That way we can call the hardware from a process context later
 and use the RTNL to guarantee that we only enter the driver
 once.
 
 So dev_mc_add would look like:
 
 1) Hold some form of lock L.
 2) Modify mc list A (a copy of the current mc list).
 3) Drop lock.
 4) Schedule an update to the hardware.
 
 The update to the hardware would look lie:
 
 1) Hold RTNL.
 2) Hold lock L.
 3) Copy list A to list B (B would be our current list).
 4) Drop lock L.
 5) Call the hardware.
 6) Drop RTNL.
 
 For compatibility, set_multicast_list would still be invoked
 under the TX lock while set_rx_mode would do exactly the same
 thing but would only hold the RTNL.
 
 What do you think about this approach?


Sounds great :)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-03 Thread Herbert Xu
On Tue, Oct 02, 2007 at 05:29:11PM +0200, Patrick McHardy wrote:

 I think this doesn't completely fix it, when dev_unicast_add is
 interrupted by dev_mc_add before the unicast changes are performed,
 they will get committed in the dev_mc_add context, so we might still
 call change_flags with BH disabled. Taking the TX lock around the
 dev-uc_count and dev-uc_promisc checks and changes in __dev_set_rx_mode
 should fix this.

Good catch.  Digging back in history it seems that you added
the change_rx_flags function so that the driver didn't have to
do it under TX lock, right?

The problem with this is that the stack can now call
change_rx_flags and set_multicast_list simultaneously
which presents a potential headache for the driver
author (if they were to use change_rx_flags).

It seems to me what we could do is in fact separate out the
part that adds the address and the part that syncs it with
hardware.

That way we can call the hardware from a process context later
and use the RTNL to guarantee that we only enter the driver
once.

So dev_mc_add would look like:

1) Hold some form of lock L.
2) Modify mc list A (a copy of the current mc list).
3) Drop lock.
4) Schedule an update to the hardware.

The update to the hardware would look lie:

1) Hold RTNL.
2) Hold lock L.
3) Copy list A to list B (B would be our current list).
4) Drop lock L.
5) Call the hardware.
6) Drop RTNL.

For compatibility, set_multicast_list would still be invoked
under the TX lock while set_rx_mode would do exactly the same
thing but would only hold the RTNL.

What do you think about this approach?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-02 Thread Herbert Xu
On Sun, Sep 30, 2007 at 05:47:30PM +0200, Patrick McHardy wrote:

 In the IPv6 case we're only changing the multicast list,
 so we're never calling into __dev_set_promiscuity.
 
 I actually even added a comment about this :)
 
 /* Unicast addresses changes may only happen under the rtnl,
  * therefore calling __dev_set_promiscuity here is safe.
  */

Good point.  I should stop mentally blocking out comments when
I read code :)

I'm a bit uncomfortable with having a function (change_flags)
that's sometimes in a sleepable context and sometimes running
with BH disabled.

So how about splitting up the unicast/multicast entry points
as follows?

[NET]: Restore multicast-only __dev_mc_upload

As it is dev_set_rx_mode needs to cope with being called with
or without the RTNL.  This is rather confusing when it comes
to understanding what locks are being held at a particular
point.

Since the only path that calls it without the RTNL is in the
multicast code, we could avoid the confusion by splitting that
out.

This patch restores the old __dev_mc_upload as a multicast-only
variant of dev_set_rx_mode.  It also gets rid of the now-unused
__dev_set_rx_mod and makes dev_set_rx_mode static since it's
only used in net/core/dev.c.

As a side-effect this fixes the warning triggered by the
RTNL_ASSERT in __dev_set_promiscuity.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 91cd3f3..e9a579e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1356,8 +1356,6 @@ extern struct net_device *alloc_netdev_mq(int 
sizeof_priv, const char *name,
 extern int register_netdev(struct net_device *dev);
 extern voidunregister_netdev(struct net_device *dev);
 /* Functions used for secondary unicast and multicast support */
-extern voiddev_set_rx_mode(struct net_device *dev);
-extern void__dev_set_rx_mode(struct net_device *dev);
 extern int dev_unicast_delete(struct net_device *dev, void *addr, 
int alen);
 extern int dev_unicast_add(struct net_device *dev, void *addr, int 
alen);
 extern int dev_mc_delete(struct net_device *dev, void *addr, int 
alen, int all);
diff --git a/net/core/dev.c b/net/core/dev.c
index 833f060..4cf985f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -967,6 +967,8 @@ void dev_load(struct net *net, const char *name)
request_module(%s, name);
 }
 
+static void dev_set_rx_mode(struct net_device *dev);
+
 /**
  * dev_open- prepare an interface for use.
  * @dev:   device to open
@@ -2775,7 +2777,7 @@ void dev_set_allmulti(struct net_device *dev, int inc)
  * filtering it is put in promiscous mode while unicast addresses
  * are present.
  */
-void __dev_set_rx_mode(struct net_device *dev)
+static void dev_set_rx_mode(struct net_device *dev)
 {
/* dev_open will call this function so the list will stay sane. */
if (!(dev-flagsIFF_UP))
@@ -2784,9 +2786,11 @@ void __dev_set_rx_mode(struct net_device *dev)
if (!netif_device_present(dev))
return;
 
-   if (dev-set_rx_mode)
+   if (dev-set_rx_mode) {
+   netif_tx_lock_bh(dev);
dev-set_rx_mode(dev);
-   else {
+   netif_tx_unlock_bh(dev);
+   } else {
/* Unicast addresses changes may only happen under the rtnl,
 * therefore calling __dev_set_promiscuity here is safe.
 */
@@ -2798,18 +2802,14 @@ void __dev_set_rx_mode(struct net_device *dev)
dev-uc_promisc = 0;
}
 
-   if (dev-set_multicast_list)
+   if (dev-set_multicast_list) {
+   netif_tx_lock_bh(dev);
dev-set_multicast_list(dev);
+   netif_tx_unlock_bh(dev);
+   }
}
 }
 
-void dev_set_rx_mode(struct net_device *dev)
-{
-   netif_tx_lock_bh(dev);
-   __dev_set_rx_mode(dev);
-   netif_tx_unlock_bh(dev);
-}
-
 int __dev_addr_delete(struct dev_addr_list **list, int *count,
  void *addr, int alen, int glbl)
 {
@@ -2885,11 +2885,9 @@ int dev_unicast_delete(struct net_device *dev, void 
*addr, int alen)
 
ASSERT_RTNL();
 
-   netif_tx_lock_bh(dev);
err = __dev_addr_delete(dev-uc_list, dev-uc_count, addr, alen, 0);
if (!err)
-   __dev_set_rx_mode(dev);
-   netif_tx_unlock_bh(dev);
+   dev_set_rx_mode(dev);
return err;
 }
 EXPORT_SYMBOL(dev_unicast_delete);
@@ -2911,11 +2909,9 @@ int dev_unicast_add(struct net_device *dev, void *addr, 
int alen)
 
ASSERT_RTNL();
 
-   netif_tx_lock_bh(dev);

Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-02 Thread Patrick McHardy

On Tue, 2 Oct 2007, Herbert Xu wrote:


On Sun, Sep 30, 2007 at 05:47:30PM +0200, Patrick McHardy wrote:



I'm a bit uncomfortable with having a function (change_flags)
that's sometimes in a sleepable context and sometimes running
with BH disabled.

So how about splitting up the unicast/multicast entry points
as follows?




[NET]: Restore multicast-only __dev_mc_upload

As it is dev_set_rx_mode needs to cope with being called with
or without the RTNL.  This is rather confusing when it comes
to understanding what locks are being held at a particular
point.

Since the only path that calls it without the RTNL is in the
multicast code, we could avoid the confusion by splitting that
out.

This patch restores the old __dev_mc_upload as a multicast-only
variant of dev_set_rx_mode.  It also gets rid of the now-unused
__dev_set_rx_mod and makes dev_set_rx_mode static since it's
only used in net/core/dev.c.

As a side-effect this fixes the warning triggered by the
RTNL_ASSERT in __dev_set_promiscuity.



I think this doesn't completely fix it, when dev_unicast_add is
interrupted by dev_mc_add before the unicast changes are performed,
they will get committed in the dev_mc_add context, so we might still
call change_flags with BH disabled. Taking the TX lock around the
dev-uc_count and dev-uc_promisc checks and changes in __dev_set_rx_mode
should fix this.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-09-30 Thread Patrick McHardy
Herbert Xu wrote:
 On Sat, Sep 29, 2007 at 05:32:41PM +0200, Patrick McHardy wrote:
 
For unicast addresses its not strictly necessary since they may
only be changed under the RTNL anyway. The reason why it takes
the tx_lock is for consistency with multicast address handling,
which can't rely on the RTNL since IPv6 changes them from
BH context. The idea was that the -set_rx_mode function should
handle both secondary unicast and multicast addresses for
simplicity.
 
 
 In any case, coming back to the original question, the RTNL
 assertion is simply wrong in this case because if we're being
 called from IPv6 then the RTNL won't even be held.
 
 So I think we need to
 
 1) Move the assert into dev_set_promiscuity.
 2) Take the TX lock in dev_set_promiscuity.


In the IPv6 case we're only changing the multicast list,
so we're never calling into __dev_set_promiscuity.

I actually even added a comment about this :)

/* Unicast addresses changes may only happen under the rtnl,
 * therefore calling __dev_set_promiscuity here is safe.
 */

I would prefer to keep the ASSERT_RTNL in __dev_set_promiscuity
since it also covers the __dev_set_rx_mode path. How about
adding an ASSERT_RTNL_ATOMIC without the might_sleep or simply
open coding it?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-09-29 Thread Patrick McHardy
Herbert Xu wrote:
 Eric W. Biederman [EMAIL PROTECTED] wrote:
 
Currently we have the call path:
macvlan_open - dev_unicast_add - __dev_set_rx_mode -
   __dev_set_promiscuity - ASSERT_RTNL - mutex_trylock

When mutex debugging is on taking a mutex complains if we are not
allowed to sleep.  At that point we have called netif_tx_lock_bh
so we are clearly not allowed to sleep.  Arguably this is not a
problem for mutex_trylock.
 
 
 Actually holding the TX lock here is a bug.  We're going to
 call down into the hardware with __dev_set_promiscuity, which
 may sleep (think USB NICs), so we definitely shouldn't be holding
 any spin locks.
 
 Patrick, could we avoid taking the TX lock here somehow?


For unicast addresses its not strictly necessary since they may
only be changed under the RTNL anyway. The reason why it takes
the tx_lock is for consistency with multicast address handling,
which can't rely on the RTNL since IPv6 changes them from
BH context. The idea was that the -set_rx_mode function should
handle both secondary unicast and multicast addresses for
simplicity.

But I don't understand the problem, if this doesn't work for
unicast addresses, why does it work for multicast addresses
and -set_multicast_list? I had a quick look at some USB
network drivers but couldn't spot anything special ..
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-09-29 Thread Eric W. Biederman
Herbert Xu [EMAIL PROTECTED] writes:

 Eric W. Biederman [EMAIL PROTECTED] wrote:
 
 Currently we have the call path:
 macvlan_open - dev_unicast_add - __dev_set_rx_mode -
__dev_set_promiscuity - ASSERT_RTNL - mutex_trylock
 
 When mutex debugging is on taking a mutex complains if we are not
 allowed to sleep.  At that point we have called netif_tx_lock_bh
 so we are clearly not allowed to sleep.  Arguably this is not a
 problem for mutex_trylock.

 Actually holding the TX lock here is a bug.  We're going to
 call down into the hardware with __dev_set_promiscuity, which
 may sleep (think USB NICs), so we definitely shouldn't be holding
 any spin locks.

Regardless of the correctness of where we have ASSERT_RTNL.
I think not actually taking the mutex on the assertion failure path
(just so we can release it), is still a good deal regardless.

For this particular call site clearly we need to look at what
is happening a little more.  The obvious thing would be to add
an explicit might_sleep if we are calling code that can sleep.

Eric
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-09-29 Thread Patrick McHardy
Eric W. Biederman wrote:
 Regardless of the correctness of where we have ASSERT_RTNL.
 I think not actually taking the mutex on the assertion failure path
 (just so we can release it), is still a good deal regardless.


Agreed. I actually have an identical patch somewhere that
I wanted to submit soon.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-09-29 Thread Herbert Xu
On Sat, Sep 29, 2007 at 05:32:41PM +0200, Patrick McHardy wrote:

 For unicast addresses its not strictly necessary since they may
 only be changed under the RTNL anyway. The reason why it takes
 the tx_lock is for consistency with multicast address handling,
 which can't rely on the RTNL since IPv6 changes them from
 BH context. The idea was that the -set_rx_mode function should
 handle both secondary unicast and multicast addresses for
 simplicity.

OK, the documentation (which predates USB NIC drivers) disagrees
with me and dev-set_multicast will always operate with the xmit
lock.

This isn't very nice for USB drivers since they have to leave
the change request hanging after dev-set_multicast returns.
I suppose the one thing that guarantees this will work is the
S in USB :)

However, at least one USB driver is buggy (kaweth) in not
allowing another set_multicast until the previous one is
done, which can lead to silent losses of set_multicast calls
since it's a void.

In any case, coming back to the original question, the RTNL
assertion is simply wrong in this case because if we're being
called from IPv6 then the RTNL won't even be held.

So I think we need to

1) Move the assert into dev_set_promiscuity.
2) Take the TX lock in dev_set_promiscuity.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-09-29 Thread Herbert Xu
On Sat, Sep 29, 2007 at 11:18:18AM -0600, Eric W. Biederman wrote:

 Regardless of the correctness of where we have ASSERT_RTNL.
 I think not actually taking the mutex on the assertion failure path
 (just so we can release it), is still a good deal regardless.

Provided that you add a might_sleep call in there so that if
somebody does this under locks it'll complain then I agree.

Checking RTNL under spin locks is almost certainly the sign
of a bug.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] rtnl: Simplify ASSERT_RTNL

2007-09-28 Thread Eric W. Biederman

Currently we have the call path:
macvlan_open - dev_unicast_add - __dev_set_rx_mode -
__dev_set_promiscuity - ASSERT_RTNL - mutex_trylock

When mutex debugging is on taking a mutex complains if we are not
allowed to sleep.  At that point we have called netif_tx_lock_bh
so we are clearly not allowed to sleep.  Arguably this is not a
problem for mutex_trylock.

However we can avoid the complaint and make the ASSERT_RTNL code
cheaper, faster and more obvious by simply calling mutex_is_locked.

So this patch adds rtnl_is_locked (which does mutex_is_locked on
the rtnl_mutex) and changes ASSERT_RTNL to use that.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/linux/rtnetlink.h |4 ++--
 net/core/rtnetlink.c  |5 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index dff3192..9c21e45 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -714,13 +714,13 @@ extern void rtmsg_ifinfo(int type, struct net_device 
*dev, unsigned change);
 extern void rtnl_lock(void);
 extern void rtnl_unlock(void);
 extern int rtnl_trylock(void);
+extern int rtnl_is_locked(void);
 
 extern void rtnetlink_init(void);
 extern void __rtnl_unlock(void);
 
 #define ASSERT_RTNL() do { \
-   if (unlikely(rtnl_trylock())) { \
-   rtnl_unlock(); \
+   if (unlikely(!rtnl_is_locked())) { \
printk(KERN_ERR RTNL: assertion failed at %s (%d)\n, \
   __FILE__,  __LINE__); \
dump_stack(); \
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 739fbad..8bc68e6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -85,6 +85,11 @@ int rtnl_trylock(void)
return mutex_trylock(rtnl_mutex);
 }
 
+int rtnl_is_locked(void)
+{
+   return mutex_is_locked(rtnl_mutex);
+}
+
 int rtattr_parse(struct rtattr *tb[], int maxattr, struct rtattr *rta, int len)
 {
memset(tb, 0, sizeof(struct rtattr*)*maxattr);
-- 
1.5.3.rc6.17.g1911

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-09-28 Thread Herbert Xu
Eric W. Biederman [EMAIL PROTECTED] wrote:
 
 Currently we have the call path:
 macvlan_open - dev_unicast_add - __dev_set_rx_mode -
__dev_set_promiscuity - ASSERT_RTNL - mutex_trylock
 
 When mutex debugging is on taking a mutex complains if we are not
 allowed to sleep.  At that point we have called netif_tx_lock_bh
 so we are clearly not allowed to sleep.  Arguably this is not a
 problem for mutex_trylock.

Actually holding the TX lock here is a bug.  We're going to
call down into the hardware with __dev_set_promiscuity, which
may sleep (think USB NICs), so we definitely shouldn't be holding
any spin locks.

Patrick, could we avoid taking the TX lock here somehow?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html