Re: [PATCH] rtnl: Simplify ASSERT_RTNL
David Miller [EMAIL PROTECTED] writes: From: [EMAIL PROTECTED] (Eric W. Biederman) Date: Fri, 28 Sep 2007 18:59:08 -0600 Currently we have the call path: macvlan_open - dev_unicast_add - __dev_set_rx_mode - __dev_set_promiscuity - ASSERT_RTNL - mutex_trylock When mutex debugging is on taking a mutex complains if we are not allowed to sleep. At that point we have called netif_tx_lock_bh so we are clearly not allowed to sleep. Arguably this is not a problem for mutex_trylock. However we can avoid the complaint and make the ASSERT_RTNL code cheaper, faster and more obvious by simply calling mutex_is_locked. So this patch adds rtnl_is_locked (which does mutex_is_locked on the rtnl_mutex) and changes ASSERT_RTNL to use that. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] There was a lot of discussion about how to do this right and therefore you'll need to resubmit all of this with this discussioned issues addressed. Thanks for the update on where this patch sits. My understanding is that the patch as I submitted it is correct. The code path that sparked this patch has been seen to be extremely convoluted from a locking perspective. Herbert and Patrick have been discussing that code path and it was my impression they were working together to figure out how to refactor that code path to make the locking simpler. There are enough convoluted details that I do not feel comfortable refactoring that code path. There was a practical suggestion by Herbert that ASSERT_RTNL have a might_sleep() added. That suggestion will currently result in ASSERT_RTNL firing unnecessarily from the macvlan_open code path. So I do not feel comfortable adding a might_sleep() into ASSERT_RTNL, as it appears that it will warn on currently correct code. Even if that code has code has nearly incomprehensible locking. Eric - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
On Thu, Oct 11, 2007 at 12:57:31AM -0600, Eric W. Biederman wrote: There was a practical suggestion by Herbert that ASSERT_RTNL have a might_sleep() added. That suggestion will currently result in ASSERT_RTNL firing unnecessarily from the macvlan_open code path. As I've already said we should change the macvlan and core netdev code so that this doesn't happen in the first place. In gernal checking for the RTNL while holding a spin lock is a sign of a bug. So I would object to a patch that caused the RTNL_ASSERT to not warn about being called in an atomic context. I don't have a problem with your patch per se, it's the fact that the patch is removing the warning when it's called in an atomic context that I don't like. Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
Herbert Xu [EMAIL PROTECTED] writes: On Thu, Oct 11, 2007 at 12:57:31AM -0600, Eric W. Biederman wrote: There was a practical suggestion by Herbert that ASSERT_RTNL have a might_sleep() added. That suggestion will currently result in ASSERT_RTNL firing unnecessarily from the macvlan_open code path. As I've already said we should change the macvlan and core netdev code so that this doesn't happen in the first place. Agreed. Until that is done I am reluctant to add a warning to ASSERT_RTNL. Last I looked at that part of the thread it looked like you and Patrick were making good progress towards unraveling that, so I have no problem adding an extra warning when we don't expect it to ever trigger. In gernal checking for the RTNL while holding a spin lock is a sign of a bug. So I would object to a patch that caused the RTNL_ASSERT to not warn about being called in an atomic context. ASSERT_RTNL does not warn about being called in an atomic context today! I don't have a problem with your patch per se, it's the fact that the patch is removing the warning when it's called in an atomic context that I don't like. No my patch does not remove a warning. Way way deep in mutex debugging on the slowpath there is a unreadable and incomprehensible WARN_ON in muxtex_trylock that will trigger if you have 10 tons of debugging turned on, and you are in, interrupt context, and you manage to hit the slow path. I think that is a pretty unlikely scenario. Eric - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
On Thu, Oct 11, 2007 at 02:23:35AM -0600, Eric W. Biederman wrote: So I would object to a patch that caused the RTNL_ASSERT to not warn about being called in an atomic context. ASSERT_RTNL does not warn about being called in an atomic context today! Well it did didn't it or we wouldn't be having this thread :) Way way deep in mutex debugging on the slowpath there is a unreadable and incomprehensible WARN_ON in muxtex_trylock that will trigger if you have 10 tons of debugging turned on, and you are in, interrupt context, and you manage to hit the slow path. I think that is a pretty unlikely scenario. Well thanks to that warning we're on our way of improving the code that triggered it in such a way that this warning will soon go silent. That's precisely the reason why I object to having this warning removed. Now you have a good point that this warning doesn't trigger all the time. The fix to that is to *make* it trigger always, not removing it. Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
Herbert Xu [EMAIL PROTECTED] writes: Well thanks to that warning we're on our way of improving the code that triggered it in such a way that this warning will soon go silent. That's precisely the reason why I object to having this warning removed. Now you have a good point that this warning doesn't trigger all the time. The fix to that is to *make* it trigger always, not removing it. I'm almost convinced but. Where people deliberately use convoluted locking is where we most need things like ASSERT_RTNL. Having ASSERT_RTNL warn if you were sleeping does not seem intuitive from the name. This instance of convoluted locking seems like a complete one off to me, and if it will warn about other constructs currently in the kernel it seems wrong. Frankly I don't feel comfortable adding the check because I can't defend the presence of might_sleep() in ASSERT_RTNL. If I can't understand a change well enough to defend it I will not take responsibility for it, and I will not add my Signed-off-by to it. The patch I wrote was trivial a trivial optimization and obviously correct. Adding the might_sleep() and the patch becomes the start of a crusade for better code that I don't believe in. So I would rather forget this patch then make that one line addition. Thanks, Eric - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
From: [EMAIL PROTECTED] (Eric W. Biederman) Date: Thu, 11 Oct 2007 10:33:39 -0600 Having ASSERT_RTNL warn if you were sleeping does not seem intuitive from the name. This instance of convoluted locking seems like a complete one off to me, and if it will warn about other constructs currently in the kernel it seems wrong. RTNL is a semaphore, therefore it sleeps. Therefore anything that requires RTNL is held can also assume that it can do things like GFP_KERNEL allocations and other sleeping actions. This is why any code path that runs with spinlocks held or interrupts disabled, and hits an RTNL assertion, is buggy. It is the chain of dependencies of what is allowed in such contexts. ASSERT_RTNL(); page = alloc_page(GFP_KERNEL); That sequence above is absolutely always legal. The might_sleep() check is just letting us know the problem exists. If spinlocks or interrupt disabling is needed to implement something deeper in the call chain, that's fine, it just cannot call back into the RTNL asserted domain with those spinlocks held or interrupts disabled. If the mac_vlan driver, or whichever one has the problem, does things like this it must be fixed and putting or not putting a proper might_sleep() check here doesn't change that. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
David, all. My apologies. I'm tired and I don't have the energy to grok yet another part of the kernel right now. So I can not productively participate in this discussion. I do agree that the locking below dev_unicast_add() that was exposed by the macvlan driver is unmaintainable even if it is currently correct. I don't see any fundamental problems with adding a might_sleep(), ASSERT_RTNL. I just don't have the energy to audit everything I feel I would have to audit to be comfortable taking responsibility for adding the might_sleep() or fixing the locking in dev_unicast_add() (which seems more important). So again my apologies. This is all beyond what I can deal with at the moment. Eric - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
From: [EMAIL PROTECTED] (Eric W. Biederman) Date: Fri, 28 Sep 2007 18:59:08 -0600 Currently we have the call path: macvlan_open - dev_unicast_add - __dev_set_rx_mode - __dev_set_promiscuity - ASSERT_RTNL - mutex_trylock When mutex debugging is on taking a mutex complains if we are not allowed to sleep. At that point we have called netif_tx_lock_bh so we are clearly not allowed to sleep. Arguably this is not a problem for mutex_trylock. However we can avoid the complaint and make the ASSERT_RTNL code cheaper, faster and more obvious by simply calling mutex_is_locked. So this patch adds rtnl_is_locked (which does mutex_is_locked on the rtnl_mutex) and changes ASSERT_RTNL to use that. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] There was a lot of discussion about how to do this right and therefore you'll need to resubmit all of this with this discussioned issues addressed. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
Herbert Xu wrote: On Tue, Oct 02, 2007 at 05:29:11PM +0200, Patrick McHardy wrote: I think this doesn't completely fix it, when dev_unicast_add is interrupted by dev_mc_add before the unicast changes are performed, they will get committed in the dev_mc_add context, so we might still call change_flags with BH disabled. Taking the TX lock around the dev-uc_count and dev-uc_promisc checks and changes in __dev_set_rx_mode should fix this. Good catch. Digging back in history it seems that you added the change_rx_flags function so that the driver didn't have to do it under TX lock, right? Yes, and to make sure the RTNL is held. The problem with this is that the stack can now call change_rx_flags and set_multicast_list simultaneously which presents a potential headache for the driver author (if they were to use change_rx_flags). The change_rx_flags function was intended to be used by software devices that want to synchronize flags to a different device, in which case they shouldn't interfere since set_multicast_list would only be used for the multicast list and not the flags. It seems to me what we could do is in fact separate out the part that adds the address and the part that syncs it with hardware. That sounds like a really good idea to get rid of all the confusion. That way we can call the hardware from a process context later and use the RTNL to guarantee that we only enter the driver once. So dev_mc_add would look like: 1) Hold some form of lock L. 2) Modify mc list A (a copy of the current mc list). 3) Drop lock. 4) Schedule an update to the hardware. The update to the hardware would look lie: 1) Hold RTNL. 2) Hold lock L. 3) Copy list A to list B (B would be our current list). 4) Drop lock L. 5) Call the hardware. 6) Drop RTNL. For compatibility, set_multicast_list would still be invoked under the TX lock while set_rx_mode would do exactly the same thing but would only hold the RTNL. What do you think about this approach? Sounds great :) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
On Tue, Oct 02, 2007 at 05:29:11PM +0200, Patrick McHardy wrote: I think this doesn't completely fix it, when dev_unicast_add is interrupted by dev_mc_add before the unicast changes are performed, they will get committed in the dev_mc_add context, so we might still call change_flags with BH disabled. Taking the TX lock around the dev-uc_count and dev-uc_promisc checks and changes in __dev_set_rx_mode should fix this. Good catch. Digging back in history it seems that you added the change_rx_flags function so that the driver didn't have to do it under TX lock, right? The problem with this is that the stack can now call change_rx_flags and set_multicast_list simultaneously which presents a potential headache for the driver author (if they were to use change_rx_flags). It seems to me what we could do is in fact separate out the part that adds the address and the part that syncs it with hardware. That way we can call the hardware from a process context later and use the RTNL to guarantee that we only enter the driver once. So dev_mc_add would look like: 1) Hold some form of lock L. 2) Modify mc list A (a copy of the current mc list). 3) Drop lock. 4) Schedule an update to the hardware. The update to the hardware would look lie: 1) Hold RTNL. 2) Hold lock L. 3) Copy list A to list B (B would be our current list). 4) Drop lock L. 5) Call the hardware. 6) Drop RTNL. For compatibility, set_multicast_list would still be invoked under the TX lock while set_rx_mode would do exactly the same thing but would only hold the RTNL. What do you think about this approach? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
On Sun, Sep 30, 2007 at 05:47:30PM +0200, Patrick McHardy wrote: In the IPv6 case we're only changing the multicast list, so we're never calling into __dev_set_promiscuity. I actually even added a comment about this :) /* Unicast addresses changes may only happen under the rtnl, * therefore calling __dev_set_promiscuity here is safe. */ Good point. I should stop mentally blocking out comments when I read code :) I'm a bit uncomfortable with having a function (change_flags) that's sometimes in a sleepable context and sometimes running with BH disabled. So how about splitting up the unicast/multicast entry points as follows? [NET]: Restore multicast-only __dev_mc_upload As it is dev_set_rx_mode needs to cope with being called with or without the RTNL. This is rather confusing when it comes to understanding what locks are being held at a particular point. Since the only path that calls it without the RTNL is in the multicast code, we could avoid the confusion by splitting that out. This patch restores the old __dev_mc_upload as a multicast-only variant of dev_set_rx_mode. It also gets rid of the now-unused __dev_set_rx_mod and makes dev_set_rx_mode static since it's only used in net/core/dev.c. As a side-effect this fixes the warning triggered by the RTNL_ASSERT in __dev_set_promiscuity. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 91cd3f3..e9a579e 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1356,8 +1356,6 @@ extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name, extern int register_netdev(struct net_device *dev); extern voidunregister_netdev(struct net_device *dev); /* Functions used for secondary unicast and multicast support */ -extern voiddev_set_rx_mode(struct net_device *dev); -extern void__dev_set_rx_mode(struct net_device *dev); extern int dev_unicast_delete(struct net_device *dev, void *addr, int alen); extern int dev_unicast_add(struct net_device *dev, void *addr, int alen); extern int dev_mc_delete(struct net_device *dev, void *addr, int alen, int all); diff --git a/net/core/dev.c b/net/core/dev.c index 833f060..4cf985f 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -967,6 +967,8 @@ void dev_load(struct net *net, const char *name) request_module(%s, name); } +static void dev_set_rx_mode(struct net_device *dev); + /** * dev_open- prepare an interface for use. * @dev: device to open @@ -2775,7 +2777,7 @@ void dev_set_allmulti(struct net_device *dev, int inc) * filtering it is put in promiscous mode while unicast addresses * are present. */ -void __dev_set_rx_mode(struct net_device *dev) +static void dev_set_rx_mode(struct net_device *dev) { /* dev_open will call this function so the list will stay sane. */ if (!(dev-flagsIFF_UP)) @@ -2784,9 +2786,11 @@ void __dev_set_rx_mode(struct net_device *dev) if (!netif_device_present(dev)) return; - if (dev-set_rx_mode) + if (dev-set_rx_mode) { + netif_tx_lock_bh(dev); dev-set_rx_mode(dev); - else { + netif_tx_unlock_bh(dev); + } else { /* Unicast addresses changes may only happen under the rtnl, * therefore calling __dev_set_promiscuity here is safe. */ @@ -2798,18 +2802,14 @@ void __dev_set_rx_mode(struct net_device *dev) dev-uc_promisc = 0; } - if (dev-set_multicast_list) + if (dev-set_multicast_list) { + netif_tx_lock_bh(dev); dev-set_multicast_list(dev); + netif_tx_unlock_bh(dev); + } } } -void dev_set_rx_mode(struct net_device *dev) -{ - netif_tx_lock_bh(dev); - __dev_set_rx_mode(dev); - netif_tx_unlock_bh(dev); -} - int __dev_addr_delete(struct dev_addr_list **list, int *count, void *addr, int alen, int glbl) { @@ -2885,11 +2885,9 @@ int dev_unicast_delete(struct net_device *dev, void *addr, int alen) ASSERT_RTNL(); - netif_tx_lock_bh(dev); err = __dev_addr_delete(dev-uc_list, dev-uc_count, addr, alen, 0); if (!err) - __dev_set_rx_mode(dev); - netif_tx_unlock_bh(dev); + dev_set_rx_mode(dev); return err; } EXPORT_SYMBOL(dev_unicast_delete); @@ -2911,11 +2909,9 @@ int dev_unicast_add(struct net_device *dev, void *addr, int alen) ASSERT_RTNL(); - netif_tx_lock_bh(dev);
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
On Tue, 2 Oct 2007, Herbert Xu wrote: On Sun, Sep 30, 2007 at 05:47:30PM +0200, Patrick McHardy wrote: I'm a bit uncomfortable with having a function (change_flags) that's sometimes in a sleepable context and sometimes running with BH disabled. So how about splitting up the unicast/multicast entry points as follows? [NET]: Restore multicast-only __dev_mc_upload As it is dev_set_rx_mode needs to cope with being called with or without the RTNL. This is rather confusing when it comes to understanding what locks are being held at a particular point. Since the only path that calls it without the RTNL is in the multicast code, we could avoid the confusion by splitting that out. This patch restores the old __dev_mc_upload as a multicast-only variant of dev_set_rx_mode. It also gets rid of the now-unused __dev_set_rx_mod and makes dev_set_rx_mode static since it's only used in net/core/dev.c. As a side-effect this fixes the warning triggered by the RTNL_ASSERT in __dev_set_promiscuity. I think this doesn't completely fix it, when dev_unicast_add is interrupted by dev_mc_add before the unicast changes are performed, they will get committed in the dev_mc_add context, so we might still call change_flags with BH disabled. Taking the TX lock around the dev-uc_count and dev-uc_promisc checks and changes in __dev_set_rx_mode should fix this. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
Herbert Xu wrote: On Sat, Sep 29, 2007 at 05:32:41PM +0200, Patrick McHardy wrote: For unicast addresses its not strictly necessary since they may only be changed under the RTNL anyway. The reason why it takes the tx_lock is for consistency with multicast address handling, which can't rely on the RTNL since IPv6 changes them from BH context. The idea was that the -set_rx_mode function should handle both secondary unicast and multicast addresses for simplicity. In any case, coming back to the original question, the RTNL assertion is simply wrong in this case because if we're being called from IPv6 then the RTNL won't even be held. So I think we need to 1) Move the assert into dev_set_promiscuity. 2) Take the TX lock in dev_set_promiscuity. In the IPv6 case we're only changing the multicast list, so we're never calling into __dev_set_promiscuity. I actually even added a comment about this :) /* Unicast addresses changes may only happen under the rtnl, * therefore calling __dev_set_promiscuity here is safe. */ I would prefer to keep the ASSERT_RTNL in __dev_set_promiscuity since it also covers the __dev_set_rx_mode path. How about adding an ASSERT_RTNL_ATOMIC without the might_sleep or simply open coding it? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
Herbert Xu wrote: Eric W. Biederman [EMAIL PROTECTED] wrote: Currently we have the call path: macvlan_open - dev_unicast_add - __dev_set_rx_mode - __dev_set_promiscuity - ASSERT_RTNL - mutex_trylock When mutex debugging is on taking a mutex complains if we are not allowed to sleep. At that point we have called netif_tx_lock_bh so we are clearly not allowed to sleep. Arguably this is not a problem for mutex_trylock. Actually holding the TX lock here is a bug. We're going to call down into the hardware with __dev_set_promiscuity, which may sleep (think USB NICs), so we definitely shouldn't be holding any spin locks. Patrick, could we avoid taking the TX lock here somehow? For unicast addresses its not strictly necessary since they may only be changed under the RTNL anyway. The reason why it takes the tx_lock is for consistency with multicast address handling, which can't rely on the RTNL since IPv6 changes them from BH context. The idea was that the -set_rx_mode function should handle both secondary unicast and multicast addresses for simplicity. But I don't understand the problem, if this doesn't work for unicast addresses, why does it work for multicast addresses and -set_multicast_list? I had a quick look at some USB network drivers but couldn't spot anything special .. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
Herbert Xu [EMAIL PROTECTED] writes: Eric W. Biederman [EMAIL PROTECTED] wrote: Currently we have the call path: macvlan_open - dev_unicast_add - __dev_set_rx_mode - __dev_set_promiscuity - ASSERT_RTNL - mutex_trylock When mutex debugging is on taking a mutex complains if we are not allowed to sleep. At that point we have called netif_tx_lock_bh so we are clearly not allowed to sleep. Arguably this is not a problem for mutex_trylock. Actually holding the TX lock here is a bug. We're going to call down into the hardware with __dev_set_promiscuity, which may sleep (think USB NICs), so we definitely shouldn't be holding any spin locks. Regardless of the correctness of where we have ASSERT_RTNL. I think not actually taking the mutex on the assertion failure path (just so we can release it), is still a good deal regardless. For this particular call site clearly we need to look at what is happening a little more. The obvious thing would be to add an explicit might_sleep if we are calling code that can sleep. Eric - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
Eric W. Biederman wrote: Regardless of the correctness of where we have ASSERT_RTNL. I think not actually taking the mutex on the assertion failure path (just so we can release it), is still a good deal regardless. Agreed. I actually have an identical patch somewhere that I wanted to submit soon. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
On Sat, Sep 29, 2007 at 05:32:41PM +0200, Patrick McHardy wrote: For unicast addresses its not strictly necessary since they may only be changed under the RTNL anyway. The reason why it takes the tx_lock is for consistency with multicast address handling, which can't rely on the RTNL since IPv6 changes them from BH context. The idea was that the -set_rx_mode function should handle both secondary unicast and multicast addresses for simplicity. OK, the documentation (which predates USB NIC drivers) disagrees with me and dev-set_multicast will always operate with the xmit lock. This isn't very nice for USB drivers since they have to leave the change request hanging after dev-set_multicast returns. I suppose the one thing that guarantees this will work is the S in USB :) However, at least one USB driver is buggy (kaweth) in not allowing another set_multicast until the previous one is done, which can lead to silent losses of set_multicast calls since it's a void. In any case, coming back to the original question, the RTNL assertion is simply wrong in this case because if we're being called from IPv6 then the RTNL won't even be held. So I think we need to 1) Move the assert into dev_set_promiscuity. 2) Take the TX lock in dev_set_promiscuity. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
On Sat, Sep 29, 2007 at 11:18:18AM -0600, Eric W. Biederman wrote: Regardless of the correctness of where we have ASSERT_RTNL. I think not actually taking the mutex on the assertion failure path (just so we can release it), is still a good deal regardless. Provided that you add a might_sleep call in there so that if somebody does this under locks it'll complain then I agree. Checking RTNL under spin locks is almost certainly the sign of a bug. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] rtnl: Simplify ASSERT_RTNL
Currently we have the call path: macvlan_open - dev_unicast_add - __dev_set_rx_mode - __dev_set_promiscuity - ASSERT_RTNL - mutex_trylock When mutex debugging is on taking a mutex complains if we are not allowed to sleep. At that point we have called netif_tx_lock_bh so we are clearly not allowed to sleep. Arguably this is not a problem for mutex_trylock. However we can avoid the complaint and make the ASSERT_RTNL code cheaper, faster and more obvious by simply calling mutex_is_locked. So this patch adds rtnl_is_locked (which does mutex_is_locked on the rtnl_mutex) and changes ASSERT_RTNL to use that. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- include/linux/rtnetlink.h |4 ++-- net/core/rtnetlink.c |5 + 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index dff3192..9c21e45 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -714,13 +714,13 @@ extern void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change); extern void rtnl_lock(void); extern void rtnl_unlock(void); extern int rtnl_trylock(void); +extern int rtnl_is_locked(void); extern void rtnetlink_init(void); extern void __rtnl_unlock(void); #define ASSERT_RTNL() do { \ - if (unlikely(rtnl_trylock())) { \ - rtnl_unlock(); \ + if (unlikely(!rtnl_is_locked())) { \ printk(KERN_ERR RTNL: assertion failed at %s (%d)\n, \ __FILE__, __LINE__); \ dump_stack(); \ diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 739fbad..8bc68e6 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -85,6 +85,11 @@ int rtnl_trylock(void) return mutex_trylock(rtnl_mutex); } +int rtnl_is_locked(void) +{ + return mutex_is_locked(rtnl_mutex); +} + int rtattr_parse(struct rtattr *tb[], int maxattr, struct rtattr *rta, int len) { memset(tb, 0, sizeof(struct rtattr*)*maxattr); -- 1.5.3.rc6.17.g1911 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
Eric W. Biederman [EMAIL PROTECTED] wrote: Currently we have the call path: macvlan_open - dev_unicast_add - __dev_set_rx_mode - __dev_set_promiscuity - ASSERT_RTNL - mutex_trylock When mutex debugging is on taking a mutex complains if we are not allowed to sleep. At that point we have called netif_tx_lock_bh so we are clearly not allowed to sleep. Arguably this is not a problem for mutex_trylock. Actually holding the TX lock here is a bug. We're going to call down into the hardware with __dev_set_promiscuity, which may sleep (think USB NICs), so we definitely shouldn't be holding any spin locks. Patrick, could we avoid taking the TX lock here somehow? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html