Re: [PATCH] net bridge: add null pointer check, fix panic

2013-11-11 Thread Alexander Y. Fomichev
On Thu, Jun 20, 2013 at 11:29 AM, Eric Dumazet  wrote:
> On Thu, 2013-06-20 at 15:00 +0800, xiaoming gao wrote:
>
>> HI Eric
>> the problem is as follow:
>> br_del_if()-->del_nbp():
>>
>> list_del_rcu(&p->list);
>> dev->priv_flags &= ~IFF_BRIDGE_PORT;
>>
>> -->at this point, the nic be deleting still have rx_handler , so , may 
>> in br_handle_frame()
>> -->br_port_exists() will return false,so br_get_port_rcu() will return 
>> NULL
>> -->so in br_handle_frame , there will be a null panic.
>>
>> netdev_rx_handler_unregister(dev);
>> synchronize_net();
>
> This code is no longer like that in current tree.
> Check commit 4a0b5ec12f0ffc3024616e6dc62cf8a04c54edcd
> ("bridge: remove a redundant synchronize_net()")
>
>>
>>
>> i have checked commit 00cfec37484761a44, i think it didn't fix this bug..
>
> I claim adding NULL tests is not needed in the fast path, it was clearly
> stated in the changelog.
>
> I would change the dismantle path to make sure br_get_port_rcu() does
> not return NULL in the fast path, and remove the test on FF_BRIDGE_PORT
> Try something like that :
>
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 1b8b8b8..2edfb80 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -60,7 +60,7 @@ static int br_pass_frame_up(struct sk_buff *skb)
>  int br_handle_frame_finish(struct sk_buff *skb)
>  {
> const unsigned char *dest = eth_hdr(skb)->h_dest;
> -   struct net_bridge_port *p = br_port_get_rcu(skb->dev);
> +   struct net_bridge_port *p = __br_port_get_rcu(skb->dev);
> struct net_bridge *br;
> struct net_bridge_fdb_entry *dst;
> struct net_bridge_mdb_entry *mdst;
> @@ -68,7 +68,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
> bool unicast = true;
> u16 vid = 0;
>
> -   if (!p || p->state == BR_STATE_DISABLED)
> +   if (p->state == BR_STATE_DISABLED)
> goto drop;
>
> if (!br_allowed_ingress(p->br, nbp_get_vlan_info(p), skb, &vid))
> @@ -142,7 +142,7 @@ drop:
>  /* note: already called with rcu_read_lock */
>  static int br_handle_local_finish(struct sk_buff *skb)
>  {
> -   struct net_bridge_port *p = br_port_get_rcu(skb->dev);
> +   struct net_bridge_port *p = __br_port_get_rcu(skb->dev);
> u16 vid = 0;
>
> br_vlan_get_tag(skb, &vid);
> @@ -172,7 +172,7 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
> if (!skb)
> return RX_HANDLER_CONSUMED;
>
> -   p = br_port_get_rcu(skb->dev);
> +   p = __br_port_get_rcu(skb->dev);
>
> if (unlikely(is_link_local_ether_addr(dest))) {
> /*
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 3be89b3..9fdd467 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -184,6 +184,11 @@ struct net_bridge_port
>
>  #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT)
>
> +static inline struct net_bridge_port *__br_port_get_rcu(const struct 
> net_device *dev)
> +{
> +   return rcu_dereference(dev->rx_handler_data);
> +}
> +
>  static inline struct net_bridge_port *br_port_get_rcu(const struct 
> net_device *dev)
>  {
> struct net_bridge_port *port =
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

> I claim adding NULL tests is not needed in the fast path, it was clearly
> stated in the changelog.

Hello,

This commit makes trouble for current STP.
Two days ago i tried to switch to 3.10.18 and i've caught "bad magic"
on uninitialized br->lock:
./net/bridge/br_stp_bpdu.c +158 in br_stp_rcv (trace attached):

p = br_port_get_rcu(dev);

br = p->br;
spin_lock(&br->lock); <- here
---

 Bisect pointed to this commit
(linux-stable: 960b8e5018a552f62cfbc0dfe94be7b6ba178f13)
(mainline 716ec052d2280d511e10e90ad54a86f5b5d4dcc2)

As far as i can see this happens when:

a) bridge module had been loaded but there was no bridge interface,
br->lock had not been initialized.
b) interface had been in promiscuous mod (tcpdump)
c) stp broadcasts 01:80:c2:00:00:00 coming to this iface
   (llc_rcv drops PACKET_OTHERHOST to protect us in promiscuous mode
but seems like not a broadcasts)
d) and finally rx_handler_data had been initialised for this interface
( by macvlan in my case)

It seems like STP needs its own IFF_BRIDGE_PORT check.
probably an easiest option to check it in br_stp_rcv as before (or
probably in llc_rcv)...

-- 
Best regards.
   Alexander Y. Fomichev 


br_stp_rcv-spin_lock.trace
Description: Binary data


fix_stp_bridge_uninitialized.patch
Description: application/download


Re: [PATCH] net bridge: add null pointer check, fix panic

2013-06-20 Thread Eric Dumazet
On Thu, 2013-06-20 at 15:47 +0800, xiaoming gao wrote:

> 
> if you remove the test on FF_BRIDGE_PORT, and br_port_get_rcu never returns 
> NULL, the problem is fixed.
> but the codes in mainline is still bugged, am i right??
> by the way, kernel-stable 3.0 and 3.4 tree also have this bug, and is very 
> easily to reproduce.

Commit 00cfec37484761a4 has to be backported anyway, if not already
backported.

I would rather fix things properly, instead of adding defensive code.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net bridge: add null pointer check, fix panic

2013-06-20 Thread xiaoming gao
Eric Dumazet said, at 2013-6-20 15:29:
> On Thu, 2013-06-20 at 15:00 +0800, xiaoming gao wrote:
> 
>> HI Eric
>> the problem is as follow:
>> br_del_if()-->del_nbp():
>>
>> list_del_rcu(&p->list);
>> dev->priv_flags &= ~IFF_BRIDGE_PORT;
>>
>> -->at this point, the nic be deleting still have rx_handler , so , may 
>> in br_handle_frame()
>> -->br_port_exists() will return false,so br_get_port_rcu() will return 
>> NULL
>> -->so in br_handle_frame , there will be a null panic.
>>
>> netdev_rx_handler_unregister(dev);
>> synchronize_net();
> 
> This code is no longer like that in current tree.
> Check commit 4a0b5ec12f0ffc3024616e6dc62cf8a04c54edcd
> ("bridge: remove a redundant synchronize_net()")
> 
>>
>>
>> i have checked commit 00cfec37484761a44, i think it didn't fix this bug..
> 
> I claim adding NULL tests is not needed in the fast path, it was clearly
> stated in the changelog.
> 
> I would change the dismantle path to make sure br_get_port_rcu() does
> not return NULL in the fast path, and remove the test on FF_BRIDGE_PORT
> 
> Try something like that :
> 
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 1b8b8b8..2edfb80 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -60,7 +60,7 @@ static int br_pass_frame_up(struct sk_buff *skb)
>  int br_handle_frame_finish(struct sk_buff *skb)
>  {
>   const unsigned char *dest = eth_hdr(skb)->h_dest;
> - struct net_bridge_port *p = br_port_get_rcu(skb->dev);
> + struct net_bridge_port *p = __br_port_get_rcu(skb->dev);
>   struct net_bridge *br;
>   struct net_bridge_fdb_entry *dst;
>   struct net_bridge_mdb_entry *mdst;
> @@ -68,7 +68,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
>   bool unicast = true;
>   u16 vid = 0;
>  
> - if (!p || p->state == BR_STATE_DISABLED)
> + if (p->state == BR_STATE_DISABLED)
>   goto drop;
>  
>   if (!br_allowed_ingress(p->br, nbp_get_vlan_info(p), skb, &vid))
> @@ -142,7 +142,7 @@ drop:
>  /* note: already called with rcu_read_lock */
>  static int br_handle_local_finish(struct sk_buff *skb)
>  {
> - struct net_bridge_port *p = br_port_get_rcu(skb->dev);
> + struct net_bridge_port *p = __br_port_get_rcu(skb->dev);
>   u16 vid = 0;
>  
>   br_vlan_get_tag(skb, &vid);
> @@ -172,7 +172,7 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
>   if (!skb)
>   return RX_HANDLER_CONSUMED;
>  
> - p = br_port_get_rcu(skb->dev);
> + p = __br_port_get_rcu(skb->dev);
>  
>   if (unlikely(is_link_local_ether_addr(dest))) {
>   /*
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 3be89b3..9fdd467 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -184,6 +184,11 @@ struct net_bridge_port
>  
>  #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT)
>  
> +static inline struct net_bridge_port *__br_port_get_rcu(const struct 
> net_device *dev)
> +{
> + return rcu_dereference(dev->rx_handler_data);
> +}
> +
>  static inline struct net_bridge_port *br_port_get_rcu(const struct 
> net_device *dev)
>  {
>   struct net_bridge_port *port =
> 
> 

if you remove the test on FF_BRIDGE_PORT, and br_port_get_rcu never returns 
NULL, the problem is fixed.
but the codes in mainline is still bugged, am i right??
by the way, kernel-stable 3.0 and 3.4 tree also have this bug, and is very 
easily to reproduce.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net bridge: add null pointer check, fix panic

2013-06-20 Thread Eric Dumazet
On Thu, 2013-06-20 at 15:00 +0800, xiaoming gao wrote:

> HI Eric
> the problem is as follow:
> br_del_if()-->del_nbp():
> 
> list_del_rcu(&p->list);
> dev->priv_flags &= ~IFF_BRIDGE_PORT;
> 
> -->at this point, the nic be deleting still have rx_handler , so , may in 
> br_handle_frame()
> -->br_port_exists() will return false,so br_get_port_rcu() will return 
> NULL
> -->so in br_handle_frame , there will be a null panic.
> 
> netdev_rx_handler_unregister(dev);
> synchronize_net();

This code is no longer like that in current tree.
Check commit 4a0b5ec12f0ffc3024616e6dc62cf8a04c54edcd
("bridge: remove a redundant synchronize_net()")

> 
> 
> i have checked commit 00cfec37484761a44, i think it didn't fix this bug..

I claim adding NULL tests is not needed in the fast path, it was clearly
stated in the changelog.

I would change the dismantle path to make sure br_get_port_rcu() does
not return NULL in the fast path, and remove the test on FF_BRIDGE_PORT

Try something like that :

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 1b8b8b8..2edfb80 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -60,7 +60,7 @@ static int br_pass_frame_up(struct sk_buff *skb)
 int br_handle_frame_finish(struct sk_buff *skb)
 {
const unsigned char *dest = eth_hdr(skb)->h_dest;
-   struct net_bridge_port *p = br_port_get_rcu(skb->dev);
+   struct net_bridge_port *p = __br_port_get_rcu(skb->dev);
struct net_bridge *br;
struct net_bridge_fdb_entry *dst;
struct net_bridge_mdb_entry *mdst;
@@ -68,7 +68,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
bool unicast = true;
u16 vid = 0;
 
-   if (!p || p->state == BR_STATE_DISABLED)
+   if (p->state == BR_STATE_DISABLED)
goto drop;
 
if (!br_allowed_ingress(p->br, nbp_get_vlan_info(p), skb, &vid))
@@ -142,7 +142,7 @@ drop:
 /* note: already called with rcu_read_lock */
 static int br_handle_local_finish(struct sk_buff *skb)
 {
-   struct net_bridge_port *p = br_port_get_rcu(skb->dev);
+   struct net_bridge_port *p = __br_port_get_rcu(skb->dev);
u16 vid = 0;
 
br_vlan_get_tag(skb, &vid);
@@ -172,7 +172,7 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
if (!skb)
return RX_HANDLER_CONSUMED;
 
-   p = br_port_get_rcu(skb->dev);
+   p = __br_port_get_rcu(skb->dev);
 
if (unlikely(is_link_local_ether_addr(dest))) {
/*
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 3be89b3..9fdd467 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -184,6 +184,11 @@ struct net_bridge_port
 
 #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT)
 
+static inline struct net_bridge_port *__br_port_get_rcu(const struct 
net_device *dev)
+{
+   return rcu_dereference(dev->rx_handler_data);
+}
+
 static inline struct net_bridge_port *br_port_get_rcu(const struct net_device 
*dev)
 {
struct net_bridge_port *port =


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net bridge: add null pointer check, fix panic

2013-06-20 Thread xiaoming gao
Eric Dumazet said, at 2013-6-20 12:55:
> On Thu, 2013-06-20 at 11:08 +0800, xiaoming gao wrote:
>> From: newtongao 
>> Date: Wed, 19 Jun 2013 14:58:33 +0800
>> Subject: [PATCH] net bridge: add null pointer check,fix panic
>>
>> in kernel 3.0, br_port_get_rcu() may return NULL when network interface be 
>> deleting from bridge,
>> but in function br_handle_frame and br_handle_local_finish, the pointer 
>> didn't be checked before using,
>> so all br_port_get_rcu callers must do null check,or there occurs the null 
>> pointer panic.
>>
>> kernel 3.4 also has this bug,i have verified.
>> mainline kernel still did not  check br_port_get_rcu()'s NULL pointer, but i 
>> have not tested it yet.
> 
> Please check current version before sending a patch.
> 
> This was most probably fixed in commit 00cfec37484761a44
> ("net: add a synchronize_net() in netdev_rx_handler_unregister()")
> 
> Thanks
> 
> 
HI Eric
the problem is as follow:
br_del_if()-->del_nbp():

list_del_rcu(&p->list);
dev->priv_flags &= ~IFF_BRIDGE_PORT;

-->at this point, the nic be deleting still have rx_handler , so , may in 
br_handle_frame()
-->br_port_exists() will return false,so br_get_port_rcu() will return NULL
-->so in br_handle_frame , there will be a null panic.

netdev_rx_handler_unregister(dev);
synchronize_net();


i have checked commit 00cfec37484761a44, i think it didn't fix this bug..

thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net bridge: add null pointer check, fix panic

2013-06-19 Thread Eric Dumazet
On Thu, 2013-06-20 at 11:08 +0800, xiaoming gao wrote:
> From: newtongao 
> Date: Wed, 19 Jun 2013 14:58:33 +0800
> Subject: [PATCH] net bridge: add null pointer check,fix panic
> 
> in kernel 3.0, br_port_get_rcu() may return NULL when network interface be 
> deleting from bridge,
> but in function br_handle_frame and br_handle_local_finish, the pointer 
> didn't be checked before using,
> so all br_port_get_rcu callers must do null check,or there occurs the null 
> pointer panic.
> 
> kernel 3.4 also has this bug,i have verified.
> mainline kernel still did not  check br_port_get_rcu()'s NULL pointer, but i 
> have not tested it yet.

Please check current version before sending a patch.

This was most probably fixed in commit 00cfec37484761a44
("net: add a synchronize_net() in netdev_rx_handler_unregister()")

Thanks


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] net bridge: add null pointer check, fix panic

2013-06-19 Thread xiaoming gao
From: newtongao 
Date: Wed, 19 Jun 2013 14:58:33 +0800
Subject: [PATCH] net bridge: add null pointer check,fix panic

in kernel 3.0, br_port_get_rcu() may return NULL when network interface be 
deleting from bridge,
but in function br_handle_frame and br_handle_local_finish, the pointer didn't 
be checked before using,
so all br_port_get_rcu callers must do null check,or there occurs the null 
pointer panic.

kernel 3.4 also has this bug,i have verified.
mainline kernel still did not  check br_port_get_rcu()'s NULL pointer, but i 
have not tested it yet.

method to reproduce null pointer panic:
1. in function del_nbp,amplify the gap between "dev->priv_flags &= 
~IFF_BRIDGE_PORT;" and "netdev_rx_handler_unregister(dev);" , such as add new 
line "while(1);".
2. create net bridge testbr and bind some network interface(e.g. testif) on it.
3. send packets to testif frequetly,such as ping testif.
4. brctl delif testbr testif.

backtrace:
[1002130.426411] FS:  7f2235153700() GS:88007b02() 
knlGS:
[1002130.524656] CS:  e033 DS:  ES:  CR0: 8005003b
[1002130.594931] CR2: 0021 CR3: 01dc7000 CR4: 
2660
[1002130.681798] DR0:  DR1:  DR2: 

[1002130.768654] DR3:  DR6: 0ff0 DR7: 
0400
[1002130.855525] Process ksoftirqd/1 (pid: 9, threadinfo 880062642000, task 
880062640800)
[1002130.957868] Stack:
[1002130.983758]  8000 88007bc06a40 88005e3c6c80 
81901d10
[1002131.073887]  88005efc2000 0001 880062643bf0 
818497d9
[1002131.163945]  88006147e090 0020 88007bc06a40 
88005e3c6c80
[1002131.254023] Call Trace:
[1002131.285086]  [] ? br_handle_frame_finish+0x2b0/0x2b0
[1002131.364677]  [] __netif_receive_skb+0x109/0x4a0
[1002131.439081]  [] __netif_receive_skb+0x442/0x4a0
[1002131.513493]  [] ? __kmalloc_node+0x3e/0x50
[1002131.582744]  [] netif_receive_skb+0x78/0x80
[1002131.653053]  [] napi_skb_finish+0x48/0x60
[1002131.721269]  [] napi_gro_receive+0xfd/0x130
[1002131.791551]  [] vlan_gro_receive+0x16/0x20
[1002131.860796]  [] igb_poll+0x6f1/0xdc0
[1002131.923871]  [] ? __schedule+0x2e5/0x810
[1002131.991071]  [] ? _raw_spin_lock_irq+0xb/0x30
[1002132.063421]  [] net_rx_action+0xa1/0x1d0
[1002132.130619]  [] __do_softirq+0x99/0x130
[1002132.196803]  [] run_ksoftirqd+0xba/0x170
[1002132.264000]  [] ? __do_softirq+0x130/0x130
[1002132.333258]  [] kthread+0x96/0xa0
[1002132.393224]  [] kernel_thread_helper+0x4/0x10
[1002132.465580]  [] ? int_ret_from_sys_call+0x7/0x1b
[1002132.541016]  [] ? retint_restore_args+0x5/0x6
[1002132.613368]  [] ? gs_change+0x13/0x13
[1002132.677466] Code: 02 f6 81 a5 01 00 00 40 48 8b 81 e0 02 00 00 89 d6 4c 0f 
45 f0 48 8b 05 50 76 20 00 66 33 70 04 66 f7 c6 ff f0 0f 84 d0 00 00 00
[1002132.835500]  0f b6 46 21 3c 02 74 50 3c 03 0f 85 60 ff ff ff 48 8b 05 e1
[1002132.920864] RIP  [] br_handle_frame+0x107/0x260
[1002132.995296]  RSP 
[1002133.038735] CR2: 0021

Signed-off-by: Xiaoming Gao 
---
 net/bridge/br_input.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index f06ee39..c8b365f 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -122,7 +122,8 @@ static int br_handle_local_finish(struct sk_buff *skb)
 {
struct net_bridge_port *p = br_port_get_rcu(skb->dev);
 
-   br_fdb_update(p->br, p, eth_hdr(skb)->h_source);
+   if (p)
+   br_fdb_update(p->br, p, eth_hdr(skb)->h_source);
return 0;/* process further */
 }
 
@@ -160,6 +161,8 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
return RX_HANDLER_CONSUMED;
 
p = br_port_get_rcu(skb->dev);
+   if (!p)
+   goto drop;
 
if (unlikely(is_link_local(dest))) {
/* Pause frames shouldn't be passed up by driver anyway */
-- 
1.7.1



From 2b39c4ccd812723891117184b297f6886dcfd976 Mon Sep 17 00:00:00 2001
From: newtongao 
Date: Wed, 19 Jun 2013 14:58:33 +0800
Subject: [PATCH] net bridge: add null pointer check,fix panic

in kernel 3.0, br_port_get_rcu() may return NULL when network interface be 
deleting from bridge,
but in function br_handle_frame and br_handle_local_finish, the pointer didn't 
be checked before using,
so all br_port_get_rcu callers must do null check,or there occurs the null 
pointer panic.

kernel 3.4 also has this bug,i have verified.
mainline kernel still did not  check br_port_get_rcu()'s NULL pointer, but i 
have not tested it yet.

method to reproduce null pointer panic:
1. in function del_nbp,amplify the gap between "dev->priv_flags &= 
~IFF_BRIDGE_PORT;" and "netdev_rx_handler_unregister(dev);" , such as add new 
line "while(1);".
2. create n