[PATCH 1/3] Bridge: RCU barriers
Use RCU macro's to ensure barrier safety on the bridge receive path. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- br-fix.orig/net/bridge/br_if.c +++ br-fix/net/bridge/br_if.c @@ -124,7 +124,7 @@ static void del_nbp(struct net_bridge_po struct net_bridge *br = p->br; struct net_device *dev = p->dev; - dev->br_port = NULL; + rcu_assign_pointer(dev->br_port, NULL); dev_set_promiscuity(dev, -1); cancel_delayed_work(&p->carrier_check); --- br-fix.orig/net/bridge/br_input.c +++ br-fix/net/bridge/br_input.c @@ -45,13 +45,20 @@ static void br_pass_frame_up(struct net_ int br_handle_frame_finish(struct sk_buff *skb) { const unsigned char *dest = eth_hdr(skb)->h_dest; - struct net_bridge_port *p = skb->dev->br_port; - struct net_bridge *br = p->br; + struct net_bridge_port *p; + struct net_bridge *br; struct net_bridge_fdb_entry *dst; int passedup = 0; /* insert into forwarding database after filtering to avoid spoofing */ - br_fdb_update(p->br, p, eth_hdr(skb)->h_source); + p = rcu_dereference(skb->dev->br_port); + if (unlikely(!p)) { + kfree_skb(skb); + goto out; + } + + br = p->br; + br_fdb_update(br, p, eth_hdr(skb)->h_source); if (p->state == BR_STATE_LEARNING) { kfree_skb(skb); -- Stephen Hemminger <[EMAIL PROTECTED]> OSDL http://developer.osdl.org/~shemminger - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Bridge: RCU barriers
On Mon, Jan 23, 2006 at 12:05:26PM -0800, Stephen Hemminger wrote: > Use RCU macro's to ensure barrier safety on the bridge > receive path. Thanks for working on this Stephen. > --- br-fix.orig/net/bridge/br_if.c > +++ br-fix/net/bridge/br_if.c > @@ -124,7 +124,7 @@ static void del_nbp(struct net_bridge_po > struct net_bridge *br = p->br; > struct net_device *dev = p->dev; > > - dev->br_port = NULL; > + rcu_assign_pointer(dev->br_port, NULL); I think this barrier is in the wrong place. On the deletion path what we want to achieve is separate the zeroing of dev->br_port and the actual freeing of the br_port object through a quiescent state. This is already achieved by the RCU call further down. On the other hand, when we set dev->br_port in the first instance, we do need the barrier. So we need to turn the br_port assignment in new_nbp into an rcu_assign_pointer. In fact, we also need to rearrange the code there so that we're 100% sure that the bridge port is fully initialised before we assign dev->br_port. This is because as soon as that assignment is made packets may start getting delivered to the bridge via this device. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Bridge: RCU barriers
On Thu, 26 Jan 2006 19:45:06 +1100 Herbert Xu <[EMAIL PROTECTED]> wrote: > On Mon, Jan 23, 2006 at 12:05:26PM -0800, Stephen Hemminger wrote: > > Use RCU macro's to ensure barrier safety on the bridge > > receive path. > > Thanks for working on this Stephen. > > > --- br-fix.orig/net/bridge/br_if.c > > +++ br-fix/net/bridge/br_if.c > > @@ -124,7 +124,7 @@ static void del_nbp(struct net_bridge_po > > struct net_bridge *br = p->br; > > struct net_device *dev = p->dev; > > > > - dev->br_port = NULL; > > + rcu_assign_pointer(dev->br_port, NULL); > > I think this barrier is in the wrong place. On the deletion path what > we want to achieve is separate the zeroing of dev->br_port and the > actual freeing of the br_port object through a quiescent state. This > is already achieved by the RCU call further down. That breaks because of the race (found by Xen) where an interface is being deleted from a bridge and the device is being removed. br_del_if rmmod device netlink event br_device_event ... > On the other hand, when we set dev->br_port in the first instance, > we do need the barrier. So we need to turn the br_port assignment > in new_nbp into an rcu_assign_pointer. In fact, we also need to > rearrange the code there so that we're 100% sure that the bridge > port is fully initialised before we assign dev->br_port. This is > because as soon as that assignment is made packets may start getting > delivered to the bridge via this device. > > Cheers, -- Stephen Hemminger <[EMAIL PROTECTED]> OSDL http://developer.osdl.org/~shemminger - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Bridge: RCU barriers
On Thu, Jan 26, 2006 at 09:41:00AM -0800, Stephen Hemminger wrote: > > > > --- br-fix.orig/net/bridge/br_if.c > > > +++ br-fix/net/bridge/br_if.c > > > @@ -124,7 +124,7 @@ static void del_nbp(struct net_bridge_po > > > struct net_bridge *br = p->br; > > > struct net_device *dev = p->dev; > > > > > > - dev->br_port = NULL; > > > + rcu_assign_pointer(dev->br_port, NULL); > > > > I think this barrier is in the wrong place. On the deletion path what > > we want to achieve is separate the zeroing of dev->br_port and the > > actual freeing of the br_port object through a quiescent state. This > > is already achieved by the RCU call further down. > > That breaks because of the race (found by Xen) where an interface > is being deleted from a bridge and the device is being removed. > > br_del_if > rmmod device > netlink event > br_device_event > ... Sorry, I don't get it. How does adding a barrier in del_nbp fix this bug? In fact, as far as I can see, you need to add a pair of rcu_read_lock/rcu_read_unlock in br_device_event to make it work. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Bridge: RCU barriers
On Fri, Jan 27, 2006 at 07:33:24AM +1100, herbert wrote: > > > That breaks because of the race (found by Xen) where an interface > > is being deleted from a bridge and the device is being removed. > > > > br_del_if > > rmmod device > > netlink event > > br_device_event > > ... > > Sorry, I don't get it. How does adding a barrier in del_nbp > fix this bug? > > In fact, as far as I can see, you need to add a pair of > rcu_read_lock/rcu_read_unlock in br_device_event to make it > work. Actually, this race shouldn't exist anyway because both br_del_if and br_device_event are meant to be called under the RTNL. So what's going on here? -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Bridge: RCU barriers
On Fri, 27 Jan 2006 09:29:12 +1100 Herbert Xu <[EMAIL PROTECTED]> wrote: > On Fri, Jan 27, 2006 at 07:33:24AM +1100, herbert wrote: > > > > > That breaks because of the race (found by Xen) where an interface > > > is being deleted from a bridge and the device is being removed. > > > > > > br_del_if > > > rmmod device > > > netlink event > > > br_device_event > > > ... > > > > Sorry, I don't get it. How does adding a barrier in del_nbp > > fix this bug? > > > > In fact, as far as I can see, you need to add a pair of > > rcu_read_lock/rcu_read_unlock in br_device_event to make it > > work. > > Actually, this race shouldn't exist anyway because both br_del_if > and br_device_event are meant to be called under the RTNL. So > what's going on here? Well, it was before I changed del_nbp to set dev->br_port to NULL. So what br_del_if would get called twice for same port. -- Stephen Hemminger <[EMAIL PROTECTED]> OSDL http://developer.osdl.org/~shemminger - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Bridge: RCU barriers
Plan B: 1. Use p->br (the back pointer) as the RCU sentinel 2. Keep dev->br_port set until after the RCU transition 3. Order operations so we don't have to worry about p->br being NULL in the case of STP timers. Untested. Probably excessive use of rcu() macros. Since it is only needed in cases where no locks held (ie. !(RTNL | br_lock)) --- br-fix.orig/net/bridge/br_fdb.c +++ br-fix/net/bridge/br_fdb.c @@ -67,9 +67,9 @@ static __inline__ void fdb_delete(struct br_fdb_put(f); } -void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr) +void br_fdb_changeaddr(struct net_bridge *br, struct net_bridge_port *p, + const unsigned char *newaddr) { - struct net_bridge *br = p->br; int i; spin_lock_bh(&br->hash_lock); --- br-fix.orig/net/bridge/br_if.c +++ br-fix/net/bridge/br_if.c @@ -79,24 +79,31 @@ static int port_cost(struct net_device * */ static void port_carrier_check(void *arg) { - struct net_bridge_port *p = arg; + struct net_device *dev = arg; + struct net_bridge_port *p; + struct net_bridge *br; rtnl_lock(); - if (netif_carrier_ok(p->dev)) { - u32 cost = port_cost(p->dev); + if ((p = dev->br_port) == NULL || + (br = rcu_dereference(p->br)) == NULL) + goto done; - spin_lock_bh(&p->br->lock); + if (netif_carrier_ok(dev)) { + u32 cost = port_cost(dev); + + spin_lock_bh(&br->lock); if (p->state == BR_STATE_DISABLED) { p->path_cost = cost; br_stp_enable_port(p); } - spin_unlock_bh(&p->br->lock); + spin_unlock_bh(&br->lock); } else { - spin_lock_bh(&p->br->lock); + spin_lock_bh(&br->lock); if (p->state != BR_STATE_DISABLED) br_stp_disable_port(p); - spin_unlock_bh(&p->br->lock); + spin_unlock_bh(&br->lock); } +done: rtnl_unlock(); } @@ -104,7 +111,7 @@ static void destroy_nbp(struct net_bridg { struct net_device *dev = p->dev; - p->br = NULL; + dev->br_port = NULL; p->dev = NULL; dev_put(dev); @@ -119,29 +126,25 @@ static void destroy_nbp_rcu(struct rcu_h } /* called with RTNL */ -static void del_nbp(struct net_bridge_port *p) +static void del_nbp(struct net_bridge *br, struct net_bridge_port *p) { - struct net_bridge *br = p->br; - struct net_device *dev = p->dev; - - dev->br_port = NULL; - dev_set_promiscuity(dev, -1); + dev_set_promiscuity(p->dev, -1); cancel_delayed_work(&p->carrier_check); - flush_scheduled_work(); spin_lock_bh(&br->lock); br_stp_disable_port(p); + list_del_rcu(&p->list); spin_unlock_bh(&br->lock); br_fdb_delete_by_port(br, p); - list_del_rcu(&p->list); - del_timer_sync(&p->message_age_timer); del_timer_sync(&p->forward_delay_timer); del_timer_sync(&p->hold_timer); + rcu_assign_pointer(p->br, NULL); + call_rcu(&p->rcu, destroy_nbp_rcu); } @@ -152,7 +155,7 @@ static void del_br(struct net_bridge *br list_for_each_entry_safe(p, n, &br->port_list, list) { br_sysfs_removeif(p); - del_nbp(p); + del_nbp(br, p); } del_timer_sync(&br->gc_timer); @@ -249,7 +252,7 @@ static struct net_bridge_port *new_nbp(s p->port_no = index; br_init_port(p); p->state = BR_STATE_DISABLED; - INIT_WORK(&p->carrier_check, port_carrier_check, p); + INIT_WORK(&p->carrier_check, port_carrier_check, dev); kobject_init(&p->kobj); return p; @@ -386,7 +389,7 @@ int br_add_if(struct net_bridge *br, str destroy_nbp(p); else if ((err = br_sysfs_addif(p))) - del_nbp(p); + del_nbp(br, p); else { dev_set_promiscuity(dev, 1); @@ -415,7 +418,7 @@ int br_del_if(struct net_bridge *br, str return -EINVAL; br_sysfs_removeif(p); - del_nbp(p); + del_nbp(br, p); spin_lock_bh(&br->lock); br_stp_recalculate_bridge_id(br); --- br-fix.orig/net/bridge/br_input.c +++ br-fix/net/bridge/br_input.c @@ -46,17 +46,19 @@ int br_handle_frame_finish(struct sk_buf { const unsigned char *dest = eth_hdr(skb)->h_dest; struct net_bridge_port *p = skb->dev->br_port; - struct net_bridge *br = p->br; + struct net_bridge *br; struct net_bridge_fdb_entry *dst; int passedup = 0; + br = rcu_dereference(p->br); + if (!br) + goto drop; + /* insert into forwarding database after filtering to avoid spoofing */ - br_fdb_update(p->br, p, eth_hdr(skb)->h_source); + br_fdb_update(b
Re: [PATCH 1/3] Bridge: RCU barriers
On Thu, Jan 26, 2006 at 03:01:00PM -0800, Stephen Hemminger wrote: > > Well, it was before I changed del_nbp to set dev->br_port to NULL. > So what br_del_if would get called twice for same port. Right. I wasn't questioning the zeroing of dev->br_port. I was just saying that placing a barrier using rcu_assign_pointer when you're setting it to NULL is useless because the mechanism that is required here is the quiescent state. On the other hand, when we set dev->br_port to a non-NULL value we do need the barrier. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Bridge: RCU barriers
On Thu, Jan 26, 2006 at 03:05:54PM -0800, Stephen Hemminger wrote: > Plan B: > 1. Use p->br (the back pointer) as the RCU sentinel > 2. Keep dev->br_port set until after the RCU transition > 3. Order operations so we don't have to worry about p->br being NULL >in the case of STP timers. This sounds like a good plan. I'll go over it later today. Thanks Stephen, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Bridge: RCU barriers
On Thu, Jan 26, 2006 at 03:05:54PM -0800, Stephen Hemminger wrote: > Plan B: > 1. Use p->br (the back pointer) as the RCU sentinel A thought just came to mind. What about adding a dead flag (p->dead)? That means you won't have to carry around all those extra pointers. So on the deleting path, you set p->dead, wait for the state to roll over, and proceed with the deletion. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Bridge: RCU barriers
On Tue, 31 Jan 2006 10:11:10 +1100 Herbert Xu <[EMAIL PROTECTED]> wrote: > On Thu, Jan 26, 2006 at 03:05:54PM -0800, Stephen Hemminger wrote: > > Plan B: > > 1. Use p->br (the back pointer) as the RCU sentinel > > A thought just came to mind. What about adding a dead flag (p->dead)? > That means you won't have to carry around all those extra pointers. > > So on the deleting path, you set p->dead, wait for the state to roll > over, and proceed with the deletion. > > Cheers, Or could use STP state, but the extra pointers aren't that much anyway. -- Stephen Hemminger <[EMAIL PROTECTED]> OSDL http://developer.osdl.org/~shemminger - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Bridge: RCU barriers
On Mon, Jan 30, 2006 at 03:44:15PM -0800, Stephen Hemminger wrote: > Or could use STP state, but the extra pointers aren't that much > anyway. Well it'll make the patch smaller and easier to read at least :) -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html