Re: [PATCH] net bridge: add null pointer check, fix panic
On Thu, Jun 20, 2013 at 11:29 AM, Eric Dumazet wrote: > On Thu, 2013-06-20 at 15:00 +0800, xiaoming gao wrote: > >> HI Eric >> the problem is as follow: >> br_del_if()-->del_nbp(): >> >> list_del_rcu(&p->list); >> dev->priv_flags &= ~IFF_BRIDGE_PORT; >> >> -->at this point, the nic be deleting still have rx_handler , so , may >> in br_handle_frame() >> -->br_port_exists() will return false,so br_get_port_rcu() will return >> NULL >> -->so in br_handle_frame , there will be a null panic. >> >> netdev_rx_handler_unregister(dev); >> synchronize_net(); > > This code is no longer like that in current tree. > Check commit 4a0b5ec12f0ffc3024616e6dc62cf8a04c54edcd > ("bridge: remove a redundant synchronize_net()") > >> >> >> i have checked commit 00cfec37484761a44, i think it didn't fix this bug.. > > I claim adding NULL tests is not needed in the fast path, it was clearly > stated in the changelog. > > I would change the dismantle path to make sure br_get_port_rcu() does > not return NULL in the fast path, and remove the test on FF_BRIDGE_PORT > Try something like that : > > diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c > index 1b8b8b8..2edfb80 100644 > --- a/net/bridge/br_input.c > +++ b/net/bridge/br_input.c > @@ -60,7 +60,7 @@ static int br_pass_frame_up(struct sk_buff *skb) > int br_handle_frame_finish(struct sk_buff *skb) > { > const unsigned char *dest = eth_hdr(skb)->h_dest; > - struct net_bridge_port *p = br_port_get_rcu(skb->dev); > + struct net_bridge_port *p = __br_port_get_rcu(skb->dev); > struct net_bridge *br; > struct net_bridge_fdb_entry *dst; > struct net_bridge_mdb_entry *mdst; > @@ -68,7 +68,7 @@ int br_handle_frame_finish(struct sk_buff *skb) > bool unicast = true; > u16 vid = 0; > > - if (!p || p->state == BR_STATE_DISABLED) > + if (p->state == BR_STATE_DISABLED) > goto drop; > > if (!br_allowed_ingress(p->br, nbp_get_vlan_info(p), skb, &vid)) > @@ -142,7 +142,7 @@ drop: > /* note: already called with rcu_read_lock */ > static int br_handle_local_finish(struct sk_buff *skb) > { > - struct net_bridge_port *p = br_port_get_rcu(skb->dev); > + struct net_bridge_port *p = __br_port_get_rcu(skb->dev); > u16 vid = 0; > > br_vlan_get_tag(skb, &vid); > @@ -172,7 +172,7 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb) > if (!skb) > return RX_HANDLER_CONSUMED; > > - p = br_port_get_rcu(skb->dev); > + p = __br_port_get_rcu(skb->dev); > > if (unlikely(is_link_local_ether_addr(dest))) { > /* > diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h > index 3be89b3..9fdd467 100644 > --- a/net/bridge/br_private.h > +++ b/net/bridge/br_private.h > @@ -184,6 +184,11 @@ struct net_bridge_port > > #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT) > > +static inline struct net_bridge_port *__br_port_get_rcu(const struct > net_device *dev) > +{ > + return rcu_dereference(dev->rx_handler_data); > +} > + > static inline struct net_bridge_port *br_port_get_rcu(const struct > net_device *dev) > { > struct net_bridge_port *port = > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > I claim adding NULL tests is not needed in the fast path, it was clearly > stated in the changelog. Hello, This commit makes trouble for current STP. Two days ago i tried to switch to 3.10.18 and i've caught "bad magic" on uninitialized br->lock: ./net/bridge/br_stp_bpdu.c +158 in br_stp_rcv (trace attached): p = br_port_get_rcu(dev); br = p->br; spin_lock(&br->lock); <- here --- Bisect pointed to this commit (linux-stable: 960b8e5018a552f62cfbc0dfe94be7b6ba178f13) (mainline 716ec052d2280d511e10e90ad54a86f5b5d4dcc2) As far as i can see this happens when: a) bridge module had been loaded but there was no bridge interface, br->lock had not been initialized. b) interface had been in promiscuous mod (tcpdump) c) stp broadcasts 01:80:c2:00:00:00 coming to this iface (llc_rcv drops PACKET_OTHERHOST to protect us in promiscuous mode but seems like not a broadcasts) d) and finally rx_handler_data had been initialised for this interface ( by macvlan in my case) It seems like STP needs its own IFF_BRIDGE_PORT check. probably an easiest option to check it in br_stp_rcv as before (or probably in llc_rcv)... -- Best regards. Alexander Y. Fomichev br_stp_rcv-spin_lock.trace Description: Binary data fix_stp_bridge_uninitialized.patch Description: application/download
Re: [PATCH] net bridge: add null pointer check, fix panic
On Thu, 2013-06-20 at 15:47 +0800, xiaoming gao wrote: > > if you remove the test on FF_BRIDGE_PORT, and br_port_get_rcu never returns > NULL, the problem is fixed. > but the codes in mainline is still bugged, am i right?? > by the way, kernel-stable 3.0 and 3.4 tree also have this bug, and is very > easily to reproduce. Commit 00cfec37484761a4 has to be backported anyway, if not already backported. I would rather fix things properly, instead of adding defensive code. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] net bridge: add null pointer check, fix panic
Eric Dumazet said, at 2013-6-20 15:29: > On Thu, 2013-06-20 at 15:00 +0800, xiaoming gao wrote: > >> HI Eric >> the problem is as follow: >> br_del_if()-->del_nbp(): >> >> list_del_rcu(&p->list); >> dev->priv_flags &= ~IFF_BRIDGE_PORT; >> >> -->at this point, the nic be deleting still have rx_handler , so , may >> in br_handle_frame() >> -->br_port_exists() will return false,so br_get_port_rcu() will return >> NULL >> -->so in br_handle_frame , there will be a null panic. >> >> netdev_rx_handler_unregister(dev); >> synchronize_net(); > > This code is no longer like that in current tree. > Check commit 4a0b5ec12f0ffc3024616e6dc62cf8a04c54edcd > ("bridge: remove a redundant synchronize_net()") > >> >> >> i have checked commit 00cfec37484761a44, i think it didn't fix this bug.. > > I claim adding NULL tests is not needed in the fast path, it was clearly > stated in the changelog. > > I would change the dismantle path to make sure br_get_port_rcu() does > not return NULL in the fast path, and remove the test on FF_BRIDGE_PORT > > Try something like that : > > diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c > index 1b8b8b8..2edfb80 100644 > --- a/net/bridge/br_input.c > +++ b/net/bridge/br_input.c > @@ -60,7 +60,7 @@ static int br_pass_frame_up(struct sk_buff *skb) > int br_handle_frame_finish(struct sk_buff *skb) > { > const unsigned char *dest = eth_hdr(skb)->h_dest; > - struct net_bridge_port *p = br_port_get_rcu(skb->dev); > + struct net_bridge_port *p = __br_port_get_rcu(skb->dev); > struct net_bridge *br; > struct net_bridge_fdb_entry *dst; > struct net_bridge_mdb_entry *mdst; > @@ -68,7 +68,7 @@ int br_handle_frame_finish(struct sk_buff *skb) > bool unicast = true; > u16 vid = 0; > > - if (!p || p->state == BR_STATE_DISABLED) > + if (p->state == BR_STATE_DISABLED) > goto drop; > > if (!br_allowed_ingress(p->br, nbp_get_vlan_info(p), skb, &vid)) > @@ -142,7 +142,7 @@ drop: > /* note: already called with rcu_read_lock */ > static int br_handle_local_finish(struct sk_buff *skb) > { > - struct net_bridge_port *p = br_port_get_rcu(skb->dev); > + struct net_bridge_port *p = __br_port_get_rcu(skb->dev); > u16 vid = 0; > > br_vlan_get_tag(skb, &vid); > @@ -172,7 +172,7 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb) > if (!skb) > return RX_HANDLER_CONSUMED; > > - p = br_port_get_rcu(skb->dev); > + p = __br_port_get_rcu(skb->dev); > > if (unlikely(is_link_local_ether_addr(dest))) { > /* > diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h > index 3be89b3..9fdd467 100644 > --- a/net/bridge/br_private.h > +++ b/net/bridge/br_private.h > @@ -184,6 +184,11 @@ struct net_bridge_port > > #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT) > > +static inline struct net_bridge_port *__br_port_get_rcu(const struct > net_device *dev) > +{ > + return rcu_dereference(dev->rx_handler_data); > +} > + > static inline struct net_bridge_port *br_port_get_rcu(const struct > net_device *dev) > { > struct net_bridge_port *port = > > if you remove the test on FF_BRIDGE_PORT, and br_port_get_rcu never returns NULL, the problem is fixed. but the codes in mainline is still bugged, am i right?? by the way, kernel-stable 3.0 and 3.4 tree also have this bug, and is very easily to reproduce. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] net bridge: add null pointer check, fix panic
On Thu, 2013-06-20 at 15:00 +0800, xiaoming gao wrote: > HI Eric > the problem is as follow: > br_del_if()-->del_nbp(): > > list_del_rcu(&p->list); > dev->priv_flags &= ~IFF_BRIDGE_PORT; > > -->at this point, the nic be deleting still have rx_handler , so , may in > br_handle_frame() > -->br_port_exists() will return false,so br_get_port_rcu() will return > NULL > -->so in br_handle_frame , there will be a null panic. > > netdev_rx_handler_unregister(dev); > synchronize_net(); This code is no longer like that in current tree. Check commit 4a0b5ec12f0ffc3024616e6dc62cf8a04c54edcd ("bridge: remove a redundant synchronize_net()") > > > i have checked commit 00cfec37484761a44, i think it didn't fix this bug.. I claim adding NULL tests is not needed in the fast path, it was clearly stated in the changelog. I would change the dismantle path to make sure br_get_port_rcu() does not return NULL in the fast path, and remove the test on FF_BRIDGE_PORT Try something like that : diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index 1b8b8b8..2edfb80 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -60,7 +60,7 @@ static int br_pass_frame_up(struct sk_buff *skb) int br_handle_frame_finish(struct sk_buff *skb) { const unsigned char *dest = eth_hdr(skb)->h_dest; - struct net_bridge_port *p = br_port_get_rcu(skb->dev); + struct net_bridge_port *p = __br_port_get_rcu(skb->dev); struct net_bridge *br; struct net_bridge_fdb_entry *dst; struct net_bridge_mdb_entry *mdst; @@ -68,7 +68,7 @@ int br_handle_frame_finish(struct sk_buff *skb) bool unicast = true; u16 vid = 0; - if (!p || p->state == BR_STATE_DISABLED) + if (p->state == BR_STATE_DISABLED) goto drop; if (!br_allowed_ingress(p->br, nbp_get_vlan_info(p), skb, &vid)) @@ -142,7 +142,7 @@ drop: /* note: already called with rcu_read_lock */ static int br_handle_local_finish(struct sk_buff *skb) { - struct net_bridge_port *p = br_port_get_rcu(skb->dev); + struct net_bridge_port *p = __br_port_get_rcu(skb->dev); u16 vid = 0; br_vlan_get_tag(skb, &vid); @@ -172,7 +172,7 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb) if (!skb) return RX_HANDLER_CONSUMED; - p = br_port_get_rcu(skb->dev); + p = __br_port_get_rcu(skb->dev); if (unlikely(is_link_local_ether_addr(dest))) { /* diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index 3be89b3..9fdd467 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -184,6 +184,11 @@ struct net_bridge_port #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT) +static inline struct net_bridge_port *__br_port_get_rcu(const struct net_device *dev) +{ + return rcu_dereference(dev->rx_handler_data); +} + static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev) { struct net_bridge_port *port = -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] net bridge: add null pointer check, fix panic
Eric Dumazet said, at 2013-6-20 12:55: > On Thu, 2013-06-20 at 11:08 +0800, xiaoming gao wrote: >> From: newtongao >> Date: Wed, 19 Jun 2013 14:58:33 +0800 >> Subject: [PATCH] net bridge: add null pointer check,fix panic >> >> in kernel 3.0, br_port_get_rcu() may return NULL when network interface be >> deleting from bridge, >> but in function br_handle_frame and br_handle_local_finish, the pointer >> didn't be checked before using, >> so all br_port_get_rcu callers must do null check,or there occurs the null >> pointer panic. >> >> kernel 3.4 also has this bug,i have verified. >> mainline kernel still did not check br_port_get_rcu()'s NULL pointer, but i >> have not tested it yet. > > Please check current version before sending a patch. > > This was most probably fixed in commit 00cfec37484761a44 > ("net: add a synchronize_net() in netdev_rx_handler_unregister()") > > Thanks > > HI Eric the problem is as follow: br_del_if()-->del_nbp(): list_del_rcu(&p->list); dev->priv_flags &= ~IFF_BRIDGE_PORT; -->at this point, the nic be deleting still have rx_handler , so , may in br_handle_frame() -->br_port_exists() will return false,so br_get_port_rcu() will return NULL -->so in br_handle_frame , there will be a null panic. netdev_rx_handler_unregister(dev); synchronize_net(); i have checked commit 00cfec37484761a44, i think it didn't fix this bug.. thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] net bridge: add null pointer check, fix panic
On Thu, 2013-06-20 at 11:08 +0800, xiaoming gao wrote: > From: newtongao > Date: Wed, 19 Jun 2013 14:58:33 +0800 > Subject: [PATCH] net bridge: add null pointer check,fix panic > > in kernel 3.0, br_port_get_rcu() may return NULL when network interface be > deleting from bridge, > but in function br_handle_frame and br_handle_local_finish, the pointer > didn't be checked before using, > so all br_port_get_rcu callers must do null check,or there occurs the null > pointer panic. > > kernel 3.4 also has this bug,i have verified. > mainline kernel still did not check br_port_get_rcu()'s NULL pointer, but i > have not tested it yet. Please check current version before sending a patch. This was most probably fixed in commit 00cfec37484761a44 ("net: add a synchronize_net() in netdev_rx_handler_unregister()") Thanks -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] net bridge: add null pointer check, fix panic
From: newtongao Date: Wed, 19 Jun 2013 14:58:33 +0800 Subject: [PATCH] net bridge: add null pointer check,fix panic in kernel 3.0, br_port_get_rcu() may return NULL when network interface be deleting from bridge, but in function br_handle_frame and br_handle_local_finish, the pointer didn't be checked before using, so all br_port_get_rcu callers must do null check,or there occurs the null pointer panic. kernel 3.4 also has this bug,i have verified. mainline kernel still did not check br_port_get_rcu()'s NULL pointer, but i have not tested it yet. method to reproduce null pointer panic: 1. in function del_nbp,amplify the gap between "dev->priv_flags &= ~IFF_BRIDGE_PORT;" and "netdev_rx_handler_unregister(dev);" , such as add new line "while(1);". 2. create net bridge testbr and bind some network interface(e.g. testif) on it. 3. send packets to testif frequetly,such as ping testif. 4. brctl delif testbr testif. backtrace: [1002130.426411] FS: 7f2235153700() GS:88007b02() knlGS: [1002130.524656] CS: e033 DS: ES: CR0: 8005003b [1002130.594931] CR2: 0021 CR3: 01dc7000 CR4: 2660 [1002130.681798] DR0: DR1: DR2: [1002130.768654] DR3: DR6: 0ff0 DR7: 0400 [1002130.855525] Process ksoftirqd/1 (pid: 9, threadinfo 880062642000, task 880062640800) [1002130.957868] Stack: [1002130.983758] 8000 88007bc06a40 88005e3c6c80 81901d10 [1002131.073887] 88005efc2000 0001 880062643bf0 818497d9 [1002131.163945] 88006147e090 0020 88007bc06a40 88005e3c6c80 [1002131.254023] Call Trace: [1002131.285086] [] ? br_handle_frame_finish+0x2b0/0x2b0 [1002131.364677] [] __netif_receive_skb+0x109/0x4a0 [1002131.439081] [] __netif_receive_skb+0x442/0x4a0 [1002131.513493] [] ? __kmalloc_node+0x3e/0x50 [1002131.582744] [] netif_receive_skb+0x78/0x80 [1002131.653053] [] napi_skb_finish+0x48/0x60 [1002131.721269] [] napi_gro_receive+0xfd/0x130 [1002131.791551] [] vlan_gro_receive+0x16/0x20 [1002131.860796] [] igb_poll+0x6f1/0xdc0 [1002131.923871] [] ? __schedule+0x2e5/0x810 [1002131.991071] [] ? _raw_spin_lock_irq+0xb/0x30 [1002132.063421] [] net_rx_action+0xa1/0x1d0 [1002132.130619] [] __do_softirq+0x99/0x130 [1002132.196803] [] run_ksoftirqd+0xba/0x170 [1002132.264000] [] ? __do_softirq+0x130/0x130 [1002132.333258] [] kthread+0x96/0xa0 [1002132.393224] [] kernel_thread_helper+0x4/0x10 [1002132.465580] [] ? int_ret_from_sys_call+0x7/0x1b [1002132.541016] [] ? retint_restore_args+0x5/0x6 [1002132.613368] [] ? gs_change+0x13/0x13 [1002132.677466] Code: 02 f6 81 a5 01 00 00 40 48 8b 81 e0 02 00 00 89 d6 4c 0f 45 f0 48 8b 05 50 76 20 00 66 33 70 04 66 f7 c6 ff f0 0f 84 d0 00 00 00 [1002132.835500] 0f b6 46 21 3c 02 74 50 3c 03 0f 85 60 ff ff ff 48 8b 05 e1 [1002132.920864] RIP [] br_handle_frame+0x107/0x260 [1002132.995296] RSP [1002133.038735] CR2: 0021 Signed-off-by: Xiaoming Gao --- net/bridge/br_input.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index f06ee39..c8b365f 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -122,7 +122,8 @@ static int br_handle_local_finish(struct sk_buff *skb) { struct net_bridge_port *p = br_port_get_rcu(skb->dev); - br_fdb_update(p->br, p, eth_hdr(skb)->h_source); + if (p) + br_fdb_update(p->br, p, eth_hdr(skb)->h_source); return 0;/* process further */ } @@ -160,6 +161,8 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb) return RX_HANDLER_CONSUMED; p = br_port_get_rcu(skb->dev); + if (!p) + goto drop; if (unlikely(is_link_local(dest))) { /* Pause frames shouldn't be passed up by driver anyway */ -- 1.7.1 From 2b39c4ccd812723891117184b297f6886dcfd976 Mon Sep 17 00:00:00 2001 From: newtongao Date: Wed, 19 Jun 2013 14:58:33 +0800 Subject: [PATCH] net bridge: add null pointer check,fix panic in kernel 3.0, br_port_get_rcu() may return NULL when network interface be deleting from bridge, but in function br_handle_frame and br_handle_local_finish, the pointer didn't be checked before using, so all br_port_get_rcu callers must do null check,or there occurs the null pointer panic. kernel 3.4 also has this bug,i have verified. mainline kernel still did not check br_port_get_rcu()'s NULL pointer, but i have not tested it yet. method to reproduce null pointer panic: 1. in function del_nbp,amplify the gap between "dev->priv_flags &= ~IFF_BRIDGE_PORT;" and "netdev_rx_handler_unregister(dev);" , such as add new line "while(1);". 2. create n