Re: [PATCH] Fix race condition about network device name allocation
On Wed, Jun 13, 2007 at 09:36:31AM -0700, Stephen Hemminger wrote: > On Wed, 13 Jun 2007 12:45:21 +0300 > Dan Aloni <[EMAIL PROTECTED]> wrote: > > > On Mon, May 14, 2007 at 08:58:40AM -0700, Stephen Hemminger wrote: > > > Kenji Kaneshige found this race between device removal and > > > registration. On unregister it is possible for the old device to > > > exist, because sysfs file is still open. A new device with 'eth%d' > > > will select the same name, but sysfs kobject register will fial. > > > > > > The following changes the shutdown order slightly. It hold a removes the > > > sysfs > > > entries earlier (on unregister_netdevice), but holds a kobject reference. > > > Then when todo runs the actual last put free happens. > > > > > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> > > > > That patch breaks the bonding driver. After reverting it I avoid this crash: > > >[..] > > > > I assume this happens when bonded slave device is removed? Yes, it's just a simple removal via sysfs. > Which kernel version? 2.6.21.5 -- Dan Aloni XIV LTD, http://www.xivstorage.com da-x (at) monatomic.org, dan (at) xiv.co.il - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix race condition about network device name allocation
The following patch (based on a patch from Stephen Hemminger <[EMAIL PROTECTED]>) removes use after free conditions in the unregister path for the bonding master. Without this patch, an operation of the form "echo -bond0 > /sys/class/net/bonding_masters" would trigger a NULL pointer dereference in sysfs. I was not able to induce the failure with the non-sysfs code path, but for consistency I updated that code as well. I also did some testing of the bonding /proc file being open while the bond is being deleted, and didn't see any problems there. Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 223517d..6287ffb 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4345,8 +4345,8 @@ static void bond_free_all(void) bond_mc_list_destroy(bond); /* Release the bonded slaves */ bond_release_all(bond_dev); - unregister_netdevice(bond_dev); bond_deinit(bond_dev); + unregister_netdevice(bond_dev); } #ifdef CONFIG_PROC_FS diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c index a122baa..60cccf2 100644 --- a/drivers/net/bonding/bond_sysfs.c +++ b/drivers/net/bonding/bond_sysfs.c @@ -164,9 +164,9 @@ static ssize_t bonding_store_bonds(struct class *cls, const char *buffer, size_t printk(KERN_INFO DRV_NAME ": %s is being deleted...\n", bond->dev->name); - unregister_netdevice(bond->dev); bond_deinit(bond->dev); bond_destroy_sysfs_entry(bond); + unregister_netdevice(bond->dev); rtnl_unlock(); goto out; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix race condition about network device name allocation
Bonding refers to device after unregistering. This has always been a dangerous thing. The following UNTESTED should fix the problem. --- a/drivers/net/bonding/bond_sysfs.c 2007-06-13 15:48:37.0 -0700 +++ b/drivers/net/bonding/bond_sysfs.c 2007-06-13 15:49:17.0 -0700 @@ -164,9 +164,10 @@ static ssize_t bonding_store_bonds(struc printk(KERN_INFO DRV_NAME ": %s is being deleted...\n", bond->dev->name); - unregister_netdevice(bond->dev); + bond_deinit(bond->dev); bond_destroy_sysfs_entry(bond); + unregister_netdevice(bond->dev); rtnl_unlock(); goto out; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix race condition about network device name allocation
On Wed, 13 Jun 2007 12:45:21 +0300 Dan Aloni <[EMAIL PROTECTED]> wrote: > On Mon, May 14, 2007 at 08:58:40AM -0700, Stephen Hemminger wrote: > > Kenji Kaneshige found this race between device removal and > > registration. On unregister it is possible for the old device to > > exist, because sysfs file is still open. A new device with 'eth%d' > > will select the same name, but sysfs kobject register will fial. > > > > The following changes the shutdown order slightly. It hold a removes the > > sysfs > > entries earlier (on unregister_netdevice), but holds a kobject reference. > > Then when todo runs the actual last put free happens. > > > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> > > That patch breaks the bonding driver. After reverting it I avoid this crash: > > <6>[1115260.637351] Ethernet Channel Bonding Driver: v3.1.2 (January 20, 2007) > <6>[1115260.637358] bonding: MII link monitoring set to 100 ms > <6>[1115260.637767] bonding: bond0 is being deleted... > <1>[1115260.695812] Unable to handle kernel NULL pointer dereference at > 0020 RIP: > <1>[1115260.701504] [] __lookup_hash+0x22/0x150 > <6>[1115260.709754] PGD 798bf067 PUD 798c4067 PMD 0 > <0>[1115260.714394] Oops: [1] SMP > [1]kdb> > [1]kdb> btc > btc: cpu status: Currently on cpu 1 > Available cpus: 0(I), 1 > Stack traceback for pid 0 > 0x8058542000 10 I 0x80585710 swapper > rspripFunction (args) > 0x8066feb0 0x8046783f thread_return+0x5e > 0x8066fec0 0x80469df9 _spin_unlock_irq+0x9 > 0x8066ff28 0x80209176 mwait_idle+0x46 > 0x8066ff50 0x802088e2 enter_idle+0x22 > 0x8066ff60 0x802090bc cpu_idle+0x5c > 0x8066ff80 0x80207146 rest_init+0x26 > 0x8066ff90 0x80678c1a start_kernel+0x2ea > 0x8066ffc0 0x8067815f _sinittext+0x15f > Stack traceback for pid 66 > 0x81006a411850 661 11 R 0x81006a411b40 > *platform_node > rspripFunction (args) > 0x81006a46dd98 0x802805e2 __lookup_hash+0x22 > 0x81006a46de00 0x80280cba lookup_one_len+0x9a > 0x81006a46de20 0x802be671 sysfs_remove_group+0x31 > 0x81006a46de50 0x8800fe4a [bonding]bond_destroy_sysfs_entry+0x1a > 0x81006a46de60 0x88011974 [bonding]bonding_store_bonds+0x214 > 0x81006a46deb0 0x8037c9d4 class_attr_store+0x24 > [1]more> > 0x81006a46dec0 0x802bbe30 sysfs_write_file+0x100 > 0x81006a46df10 0x80277d7e vfs_write+0xbe > 0x81006a46df40 0x80278400 sys_write+0x50 > 0x81006a46df80 0x80209e6e system_call+0x7e > I assume this happens when bonded slave device is removed? Which kernel version? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix race condition about network device name allocation
On Mon, May 14, 2007 at 08:58:40AM -0700, Stephen Hemminger wrote: > Kenji Kaneshige found this race between device removal and > registration. On unregister it is possible for the old device to > exist, because sysfs file is still open. A new device with 'eth%d' > will select the same name, but sysfs kobject register will fial. > > The following changes the shutdown order slightly. It hold a removes the sysfs > entries earlier (on unregister_netdevice), but holds a kobject reference. > Then when todo runs the actual last put free happens. > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> That patch breaks the bonding driver. After reverting it I avoid this crash: <6>[1115260.637351] Ethernet Channel Bonding Driver: v3.1.2 (January 20, 2007) <6>[1115260.637358] bonding: MII link monitoring set to 100 ms <6>[1115260.637767] bonding: bond0 is being deleted... <1>[1115260.695812] Unable to handle kernel NULL pointer dereference at 0020 RIP: <1>[1115260.701504] [] __lookup_hash+0x22/0x150 <6>[1115260.709754] PGD 798bf067 PUD 798c4067 PMD 0 <0>[1115260.714394] Oops: [1] SMP [1]kdb> [1]kdb> btc btc: cpu status: Currently on cpu 1 Available cpus: 0(I), 1 Stack traceback for pid 0 0x8058542000 10 I 0x80585710 swapper rspripFunction (args) 0x8066feb0 0x8046783f thread_return+0x5e 0x8066fec0 0x80469df9 _spin_unlock_irq+0x9 0x8066ff28 0x80209176 mwait_idle+0x46 0x8066ff50 0x802088e2 enter_idle+0x22 0x8066ff60 0x802090bc cpu_idle+0x5c 0x8066ff80 0x80207146 rest_init+0x26 0x8066ff90 0x80678c1a start_kernel+0x2ea 0x8066ffc0 0x8067815f _sinittext+0x15f Stack traceback for pid 66 0x81006a411850 661 11 R 0x81006a411b40 *platform_node rspripFunction (args) 0x81006a46dd98 0x802805e2 __lookup_hash+0x22 0x81006a46de00 0x80280cba lookup_one_len+0x9a 0x81006a46de20 0x802be671 sysfs_remove_group+0x31 0x81006a46de50 0x8800fe4a [bonding]bond_destroy_sysfs_entry+0x1a 0x81006a46de60 0x88011974 [bonding]bonding_store_bonds+0x214 0x81006a46deb0 0x8037c9d4 class_attr_store+0x24 [1]more> 0x81006a46dec0 0x802bbe30 sysfs_write_file+0x100 0x81006a46df10 0x80277d7e vfs_write+0xbe 0x81006a46df40 0x80278400 sys_write+0x50 0x81006a46df80 0x80209e6e system_call+0x7e -- Dan Aloni XIV LTD, http://www.xivstorage.com da-x (at) monatomic.org, dan (at) xiv.co.il - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Fix race condition about network device name allocation
Kenji Kaneshige found this race between device removal and registration. On unregister it is possible for the old device to exist, because sysfs file is still open. A new device with 'eth%d' will select the same name, but sysfs kobject register will fial. The following changes the shutdown order slightly. It hold a removes the sysfs entries earlier (on unregister_netdevice), but holds a kobject reference. Then when todo runs the actual last put free happens. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- net/core/dev.c | 10 ++ net/core/net-sysfs.c |8 +++- 2 files changed, 13 insertions(+), 5 deletions(-) --- 2.6.21-rc1.orig/net/core/dev.c 2007-05-11 11:02:55.0 -0700 +++ 2.6.21-rc1/net/core/dev.c 2007-05-14 08:44:52.0 -0700 @@ -3245,7 +3245,6 @@ void netdev_run_todo(void) continue; } - netdev_unregister_sysfs(dev); dev->reg_state = NETREG_UNREGISTERED; netdev_wait_allrefs(dev); @@ -3256,11 +3255,11 @@ void netdev_run_todo(void) BUG_TRAP(!dev->ip6_ptr); BUG_TRAP(!dev->dn_ptr); - /* It must be the very last action, -* after this 'dev' may point to freed up memory. -*/ if (dev->destructor) dev->destructor(dev); + + /* Free network device */ + kobject_put(&dev->dev.kobj); } out: @@ -3411,6 +3410,9 @@ void unregister_netdevice(struct net_dev /* Notifier chain MUST detach us from master device. */ BUG_TRAP(!dev->master); + /* Remove entries from sysfs */ + netdev_unregister_sysfs(dev); + /* Finish processing unregister after unlock */ net_set_todo(dev); --- 2.6.21-rc1.orig/net/core/net-sysfs.c2007-05-11 11:02:55.0 -0700 +++ 2.6.21-rc1/net/core/net-sysfs.c 2007-05-14 08:44:52.0 -0700 @@ -456,9 +456,15 @@ static struct class net_class = { #endif }; +/* Delete sysfs entries but hold kobject reference until after all + * netdev references are gone. + */ void netdev_unregister_sysfs(struct net_device * net) { - device_del(&(net->dev)); + struct device *dev = &(net->dev); + + kobject_get(&dev->kobj); + device_del(dev); } /* Create sysfs entries for network device. */ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG][PATCH] Fix race condition about network device name allocation
On Mon, 14 May 2007 10:33:01 +0900 Kenji Kaneshige <[EMAIL PROTECTED]> wrote: > Hi Stephen, > > Thank you for comments. I'll try your patch. > > I have one concern about your patch, though I don't know very much > about netdev codes. > > @@ -3411,6 +3410,9 @@ void unregister_netdevice(struct net_dev > > /* Notifier chain MUST detach us from master device. */ > > BUG_TRAP(!dev->master); > > > > + /* Remove entries from sysfs */ > > + netdev_unregister_sysfs(dev); > > + > > /* Finish processing unregister after unlock */ > > net_set_todo(dev); > > > > With your patch, the netdev_unregister_sysfs() will be called > earlier than now. Does it have no side effect? I'm worrying > about if there are somebody that depend on sysfs entry until > net_run_todo() is called. The unregister_sysfs() removes the entry in /sys/class/net and then decrements the reference count. When ref count goes to zero, the kobject_release callback gets called and which does kfree(). Changing the shutdown order makes the /sys/class/net entry disappear sooner. In order to prevent the free from happening until after all the dev_put() calls have happened, the reference count for the kobject needs to be increased. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG][PATCH] Fix race condition about network device name allocation
Hi, > How about the following (untested), instead. It hold a removes the sysfs > entries earlier (on unregister_netdevice), but holds a kobject reference. > Then when todo runs the actual last put free happens. I've confirmed the problem is fixed by Stephen's patch on my reproduction environment. Thanks, Kenji Kaneshige 2007-05-11 (金) の 09:25 -0700 に Stephen Hemminger さんは書きました: > On Fri, 11 May 2007 14:40:45 +0900 > Kenji Kaneshige <[EMAIL PROTECTED]> wrote: > > > Hi, > > > > I encountered the following error when I was hot-plugging network card > > using pci hotplug driver. > > > > kobject_add failed for eth8 with -EEXIST, don't try to register things with > > the same name in the same directory. > > > > Call Trace: > > [] show_stack+0x40/0xa0 > > sp=e000652cfbf0 bsp=e000652c9450 > > [] dump_stack+0x30/0x60 > > sp=e000652cfdc0 bsp=e000652c9438 > > [] kobject_shadow_add+0x3b0/0x440 > > sp=e000652cfdc0 bsp=e000652c93f0 > > [] kobject_add+0x30/0x60 > > sp=e000652cfdc0 bsp=e000652c93d0 > > [] device_add+0x160/0xf40 > > sp=e000652cfdc0 bsp=e000652c9358 > > [] netdev_register_sysfs+0xc0/0xe0 > > sp=e000652cfdc0 bsp=e000652c9328 > > [] register_netdevice+0x600/0x7e0 > > sp=e000652cfdc0 bsp=e000652c92e8 > > [] register_netdev+0x70/0xc0 > > sp=e000652cfdd0 bsp=e000652c92c0 > > [] e1000_probe+0x1710/0x1a00 [e1000] > > sp=e000652cfdd0 bsp=e000652c9258 > > [] pci_device_probe+0xc0/0x120 > > sp=e000652cfde0 bsp=e000652c9218 > > [] really_probe+0x1c0/0x300 > > sp=e000652cfde0 bsp=e000652c91c8 > > [] driver_probe_device+0x1a0/0x1c0 > > sp=e000652cfde0 bsp=e000652c9198 > > [] __device_attach+0x30/0x60 > > sp=e000652cfde0 bsp=e000652c9170 > > [] bus_for_each_drv+0x80/0x120 > > sp=e000652cfde0 bsp=e000652c9138 > > [] device_attach+0xa0/0x100 > > sp=e000652cfe00 bsp=e000652c9110 > > [] bus_attach_device+0x60/0xe0 > > sp=e000652cfe00 bsp=e000652c90e0 > > [] device_add+0x950/0xf40 > > sp=e000652cfe00 bsp=e000652c9068 > > [] pci_bus_add_device+0x20/0x100 > > sp=e000652cfe00 bsp=e000652c9040 > > [] pci_bus_add_devices+0x50/0x320 > > sp=e000652cfe00 bsp=e000652c9010 > > [] shpchp_configure_device+0x460/0x4a0 [shpchp] > > sp=e000652cfe00 bsp=e000652c8fc0 > > [] board_added+0x720/0x8c0 [shpchp] > > sp=e000652cfe00 bsp=e000652c8f80 > > [] shpchp_enable_slot+0x920/0xa40 [shpchp] > > sp=e000652cfe10 bsp=e000652c8f30 > > [] shpchp_sysfs_enable_slot+0x120/0x220 [shpchp] > > sp=e000652cfe20 bsp=e000652c8ef8 > > [] enable_slot+0x90/0xc0 [shpchp] > > sp=e000652cfe20 bsp=e000652c8ed0 > > [] power_write_file+0x1e0/0x2a0 [pci_hotplug] > > sp=e000652cfe20 bsp=e000652c8ea0 > > [] hotplug_slot_attr_store+0x60/0xa0 [pci_hotplug] > > sp=e000652cfe20 bsp=e000652c8e68 > > [] sysfs_write_file+0x270/0x300 > > sp=e000652cfe20 bsp=e000652c8e18 > > [] vfs_write+0x1b0/0x300 > > sp=e000652cfe20 bsp=e000652c8dc0 > > [] sys_write+0x70/0xe0 > > sp=e000652cfe20 bsp=e000652c8d48 > > [] ia64_ret_from_syscall+0x0/0x20 > > sp=e000652cfe30 bsp=e000652c8d48 > > [] __kernel_syscall_via_break+0x0/0x20 > > sp=e000652d bsp=e000652c8d48 > > > > This error is caused by the race condition between dev_alloc_name() > > and unregistering net device. The dev_alloc_name() function checks > > dev_base list to find a suitable id. The net device is removed from > > the dev_base list when net device is unregistered. Those are done with > > rtnl lock held, so there is no problem. However, removing sysfs entry > > is delayed until netdev_run_todo() which is outside rtnl lock. Because > > of this, dev_alloc_name() can create a device name which is duplicated > > with the existing sysfs entry. > > > > The following patch fixes this by changing dev_alloc_name() function > > to check not only dev_base list but also todo list
Re: [BUG][PATCH] Fix race condition about network device name allocation
Hi Stephen, Thank you for comments. I'll try your patch. I have one concern about your patch, though I don't know very much about netdev codes. @@ -3411,6 +3410,9 @@ void unregister_netdevice(struct net_dev > /* Notifier chain MUST detach us from master device. */ > BUG_TRAP(!dev->master); > > + /* Remove entries from sysfs */ > + netdev_unregister_sysfs(dev); > + > /* Finish processing unregister after unlock */ > net_set_todo(dev); > With your patch, the netdev_unregister_sysfs() will be called earlier than now. Does it have no side effect? I'm worrying about if there are somebody that depend on sysfs entry until net_run_todo() is called. Anyway, I'll try your patch. Thanks, Kenji Kaneshige 2007-05-11 (金) の 09:25 -0700 に Stephen Hemminger さんは書きました: > On Fri, 11 May 2007 14:40:45 +0900 > Kenji Kaneshige <[EMAIL PROTECTED]> wrote: > > > Hi, > > > > I encountered the following error when I was hot-plugging network card > > using pci hotplug driver. > > > > kobject_add failed for eth8 with -EEXIST, don't try to register things with > > the same name in the same directory. > > > > Call Trace: > > [] show_stack+0x40/0xa0 > > sp=e000652cfbf0 bsp=e000652c9450 > > [] dump_stack+0x30/0x60 > > sp=e000652cfdc0 bsp=e000652c9438 > > [] kobject_shadow_add+0x3b0/0x440 > > sp=e000652cfdc0 bsp=e000652c93f0 > > [] kobject_add+0x30/0x60 > > sp=e000652cfdc0 bsp=e000652c93d0 > > [] device_add+0x160/0xf40 > > sp=e000652cfdc0 bsp=e000652c9358 > > [] netdev_register_sysfs+0xc0/0xe0 > > sp=e000652cfdc0 bsp=e000652c9328 > > [] register_netdevice+0x600/0x7e0 > > sp=e000652cfdc0 bsp=e000652c92e8 > > [] register_netdev+0x70/0xc0 > > sp=e000652cfdd0 bsp=e000652c92c0 > > [] e1000_probe+0x1710/0x1a00 [e1000] > > sp=e000652cfdd0 bsp=e000652c9258 > > [] pci_device_probe+0xc0/0x120 > > sp=e000652cfde0 bsp=e000652c9218 > > [] really_probe+0x1c0/0x300 > > sp=e000652cfde0 bsp=e000652c91c8 > > [] driver_probe_device+0x1a0/0x1c0 > > sp=e000652cfde0 bsp=e000652c9198 > > [] __device_attach+0x30/0x60 > > sp=e000652cfde0 bsp=e000652c9170 > > [] bus_for_each_drv+0x80/0x120 > > sp=e000652cfde0 bsp=e000652c9138 > > [] device_attach+0xa0/0x100 > > sp=e000652cfe00 bsp=e000652c9110 > > [] bus_attach_device+0x60/0xe0 > > sp=e000652cfe00 bsp=e000652c90e0 > > [] device_add+0x950/0xf40 > > sp=e000652cfe00 bsp=e000652c9068 > > [] pci_bus_add_device+0x20/0x100 > > sp=e000652cfe00 bsp=e000652c9040 > > [] pci_bus_add_devices+0x50/0x320 > > sp=e000652cfe00 bsp=e000652c9010 > > [] shpchp_configure_device+0x460/0x4a0 [shpchp] > > sp=e000652cfe00 bsp=e000652c8fc0 > > [] board_added+0x720/0x8c0 [shpchp] > > sp=e000652cfe00 bsp=e000652c8f80 > > [] shpchp_enable_slot+0x920/0xa40 [shpchp] > > sp=e000652cfe10 bsp=e000652c8f30 > > [] shpchp_sysfs_enable_slot+0x120/0x220 [shpchp] > > sp=e000652cfe20 bsp=e000652c8ef8 > > [] enable_slot+0x90/0xc0 [shpchp] > > sp=e000652cfe20 bsp=e000652c8ed0 > > [] power_write_file+0x1e0/0x2a0 [pci_hotplug] > > sp=e000652cfe20 bsp=e000652c8ea0 > > [] hotplug_slot_attr_store+0x60/0xa0 [pci_hotplug] > > sp=e000652cfe20 bsp=e000652c8e68 > > [] sysfs_write_file+0x270/0x300 > > sp=e000652cfe20 bsp=e000652c8e18 > > [] vfs_write+0x1b0/0x300 > > sp=e000652cfe20 bsp=e000652c8dc0 > > [] sys_write+0x70/0xe0 > > sp=e000652cfe20 bsp=e000652c8d48 > > [] ia64_ret_from_syscall+0x0/0x20 > > sp=e000652cfe30 bsp=e000652c8d48 > > [] __kernel_syscall_via_break+0x0/0x20 > > sp=e000652d bsp=e000652c8d48 > > > > This error is caused by the race condition between dev_alloc_name() > > and unregistering net device. The dev_alloc_name() function checks > > dev_base list to find a suitable id. The net device is removed from > > the dev_base list when net device is unre
Re: [BUG][PATCH] Fix race condition about network device name allocation
On Fri, 11 May 2007 14:40:45 +0900 Kenji Kaneshige <[EMAIL PROTECTED]> wrote: > Hi, > > I encountered the following error when I was hot-plugging network card > using pci hotplug driver. > > kobject_add failed for eth8 with -EEXIST, don't try to register things with > the same name in the same directory. > > Call Trace: > [] show_stack+0x40/0xa0 > sp=e000652cfbf0 bsp=e000652c9450 > [] dump_stack+0x30/0x60 > sp=e000652cfdc0 bsp=e000652c9438 > [] kobject_shadow_add+0x3b0/0x440 > sp=e000652cfdc0 bsp=e000652c93f0 > [] kobject_add+0x30/0x60 > sp=e000652cfdc0 bsp=e000652c93d0 > [] device_add+0x160/0xf40 > sp=e000652cfdc0 bsp=e000652c9358 > [] netdev_register_sysfs+0xc0/0xe0 > sp=e000652cfdc0 bsp=e000652c9328 > [] register_netdevice+0x600/0x7e0 > sp=e000652cfdc0 bsp=e000652c92e8 > [] register_netdev+0x70/0xc0 > sp=e000652cfdd0 bsp=e000652c92c0 > [] e1000_probe+0x1710/0x1a00 [e1000] > sp=e000652cfdd0 bsp=e000652c9258 > [] pci_device_probe+0xc0/0x120 > sp=e000652cfde0 bsp=e000652c9218 > [] really_probe+0x1c0/0x300 > sp=e000652cfde0 bsp=e000652c91c8 > [] driver_probe_device+0x1a0/0x1c0 > sp=e000652cfde0 bsp=e000652c9198 > [] __device_attach+0x30/0x60 > sp=e000652cfde0 bsp=e000652c9170 > [] bus_for_each_drv+0x80/0x120 > sp=e000652cfde0 bsp=e000652c9138 > [] device_attach+0xa0/0x100 > sp=e000652cfe00 bsp=e000652c9110 > [] bus_attach_device+0x60/0xe0 > sp=e000652cfe00 bsp=e000652c90e0 > [] device_add+0x950/0xf40 > sp=e000652cfe00 bsp=e000652c9068 > [] pci_bus_add_device+0x20/0x100 > sp=e000652cfe00 bsp=e000652c9040 > [] pci_bus_add_devices+0x50/0x320 > sp=e000652cfe00 bsp=e000652c9010 > [] shpchp_configure_device+0x460/0x4a0 [shpchp] > sp=e000652cfe00 bsp=e000652c8fc0 > [] board_added+0x720/0x8c0 [shpchp] > sp=e000652cfe00 bsp=e000652c8f80 > [] shpchp_enable_slot+0x920/0xa40 [shpchp] > sp=e000652cfe10 bsp=e000652c8f30 > [] shpchp_sysfs_enable_slot+0x120/0x220 [shpchp] > sp=e000652cfe20 bsp=e000652c8ef8 > [] enable_slot+0x90/0xc0 [shpchp] > sp=e000652cfe20 bsp=e000652c8ed0 > [] power_write_file+0x1e0/0x2a0 [pci_hotplug] > sp=e000652cfe20 bsp=e000652c8ea0 > [] hotplug_slot_attr_store+0x60/0xa0 [pci_hotplug] > sp=e000652cfe20 bsp=e000652c8e68 > [] sysfs_write_file+0x270/0x300 > sp=e000652cfe20 bsp=e000652c8e18 > [] vfs_write+0x1b0/0x300 > sp=e000652cfe20 bsp=e000652c8dc0 > [] sys_write+0x70/0xe0 > sp=e000652cfe20 bsp=e000652c8d48 > [] ia64_ret_from_syscall+0x0/0x20 > sp=e000652cfe30 bsp=e000652c8d48 > [] __kernel_syscall_via_break+0x0/0x20 > sp=e000652d bsp=e000652c8d48 > > This error is caused by the race condition between dev_alloc_name() > and unregistering net device. The dev_alloc_name() function checks > dev_base list to find a suitable id. The net device is removed from > the dev_base list when net device is unregistered. Those are done with > rtnl lock held, so there is no problem. However, removing sysfs entry > is delayed until netdev_run_todo() which is outside rtnl lock. Because > of this, dev_alloc_name() can create a device name which is duplicated > with the existing sysfs entry. > > The following patch fixes this by changing dev_alloc_name() function > to check not only dev_base list but also todo list and snapshot list > to find a suitable id. If some devices exists on the todo list or > snapshot list, sysfs entries corresponding to them are not deleted > yet. > > Thanks, > Kenji Kaneshige How about the following (untested), instead. It hold a removes the sysfs entries earlier (on unregister_netdevice), but holds a kobject reference. Then when todo runs the actual last put free happens. --- net/core/dev.c | 10 ++ net/core/net-sysfs.c |8 +++- 2 files changed, 13 insertions(+), 5 deletions(-) --- net-2.6.orig/net/core/d
Re: [BUG][PATCH] Fix race condition about network device name allocation
On Fri, 11 May 2007 14:40:45 +0900 Kenji Kaneshige <[EMAIL PROTECTED]> wrote: > Hi, > > I encountered the following error when I was hot-plugging network card > using pci hotplug driver. > > kobject_add failed for eth8 with -EEXIST, don't try to register things with > the same name in the same directory. > > Call Trace: > [] show_stack+0x40/0xa0 > sp=e000652cfbf0 bsp=e000652c9450 > [] dump_stack+0x30/0x60 > sp=e000652cfdc0 bsp=e000652c9438 > [] kobject_shadow_add+0x3b0/0x440 > sp=e000652cfdc0 bsp=e000652c93f0 > [] kobject_add+0x30/0x60 > sp=e000652cfdc0 bsp=e000652c93d0 > [] device_add+0x160/0xf40 > sp=e000652cfdc0 bsp=e000652c9358 > [] netdev_register_sysfs+0xc0/0xe0 > sp=e000652cfdc0 bsp=e000652c9328 > [] register_netdevice+0x600/0x7e0 > sp=e000652cfdc0 bsp=e000652c92e8 > [] register_netdev+0x70/0xc0 > sp=e000652cfdd0 bsp=e000652c92c0 > [] e1000_probe+0x1710/0x1a00 [e1000] > sp=e000652cfdd0 bsp=e000652c9258 > [] pci_device_probe+0xc0/0x120 > sp=e000652cfde0 bsp=e000652c9218 > [] really_probe+0x1c0/0x300 > sp=e000652cfde0 bsp=e000652c91c8 > [] driver_probe_device+0x1a0/0x1c0 > sp=e000652cfde0 bsp=e000652c9198 > [] __device_attach+0x30/0x60 > sp=e000652cfde0 bsp=e000652c9170 > [] bus_for_each_drv+0x80/0x120 > sp=e000652cfde0 bsp=e000652c9138 > [] device_attach+0xa0/0x100 > sp=e000652cfe00 bsp=e000652c9110 > [] bus_attach_device+0x60/0xe0 > sp=e000652cfe00 bsp=e000652c90e0 > [] device_add+0x950/0xf40 > sp=e000652cfe00 bsp=e000652c9068 > [] pci_bus_add_device+0x20/0x100 > sp=e000652cfe00 bsp=e000652c9040 > [] pci_bus_add_devices+0x50/0x320 > sp=e000652cfe00 bsp=e000652c9010 > [] shpchp_configure_device+0x460/0x4a0 [shpchp] > sp=e000652cfe00 bsp=e000652c8fc0 > [] board_added+0x720/0x8c0 [shpchp] > sp=e000652cfe00 bsp=e000652c8f80 > [] shpchp_enable_slot+0x920/0xa40 [shpchp] > sp=e000652cfe10 bsp=e000652c8f30 > [] shpchp_sysfs_enable_slot+0x120/0x220 [shpchp] > sp=e000652cfe20 bsp=e000652c8ef8 > [] enable_slot+0x90/0xc0 [shpchp] > sp=e000652cfe20 bsp=e000652c8ed0 > [] power_write_file+0x1e0/0x2a0 [pci_hotplug] > sp=e000652cfe20 bsp=e000652c8ea0 > [] hotplug_slot_attr_store+0x60/0xa0 [pci_hotplug] > sp=e000652cfe20 bsp=e000652c8e68 > [] sysfs_write_file+0x270/0x300 > sp=e000652cfe20 bsp=e000652c8e18 > [] vfs_write+0x1b0/0x300 > sp=e000652cfe20 bsp=e000652c8dc0 > [] sys_write+0x70/0xe0 > sp=e000652cfe20 bsp=e000652c8d48 > [] ia64_ret_from_syscall+0x0/0x20 > sp=e000652cfe30 bsp=e000652c8d48 > [] __kernel_syscall_via_break+0x0/0x20 > sp=e000652d bsp=e000652c8d48 > > This error is caused by the race condition between dev_alloc_name() > and unregistering net device. The dev_alloc_name() function checks > dev_base list to find a suitable id. The net device is removed from > the dev_base list when net device is unregistered. Those are done with > rtnl lock held, so there is no problem. However, removing sysfs entry > is delayed until netdev_run_todo() which is outside rtnl lock. Because > of this, dev_alloc_name() can create a device name which is duplicated > with the existing sysfs entry. > > The following patch fixes this by changing dev_alloc_name() function > to check not only dev_base list but also todo list and snapshot list > to find a suitable id. If some devices exists on the todo list or > snapshot list, sysfs entries corresponding to them are not deleted > yet. > > Thanks, > Kenji Kaneshige > > --- > Fix the race condition between dev_alloc_name() and net device > unregistration. > > dev_alloc_name checks `dev_base' list to find a suitable ID. On the > other hand, net devices are removed from that list on net device > unregistration. Both of them are done with holding rtnl lock. However, > removing sysfs entry is delayed until netdev_run_todo(),
[BUG][PATCH] Fix race condition about network device name allocation
Hi, I encountered the following error when I was hot-plugging network card using pci hotplug driver. kobject_add failed for eth8 with -EEXIST, don't try to register things with the same name in the same directory. Call Trace: [] show_stack+0x40/0xa0 sp=e000652cfbf0 bsp=e000652c9450 [] dump_stack+0x30/0x60 sp=e000652cfdc0 bsp=e000652c9438 [] kobject_shadow_add+0x3b0/0x440 sp=e000652cfdc0 bsp=e000652c93f0 [] kobject_add+0x30/0x60 sp=e000652cfdc0 bsp=e000652c93d0 [] device_add+0x160/0xf40 sp=e000652cfdc0 bsp=e000652c9358 [] netdev_register_sysfs+0xc0/0xe0 sp=e000652cfdc0 bsp=e000652c9328 [] register_netdevice+0x600/0x7e0 sp=e000652cfdc0 bsp=e000652c92e8 [] register_netdev+0x70/0xc0 sp=e000652cfdd0 bsp=e000652c92c0 [] e1000_probe+0x1710/0x1a00 [e1000] sp=e000652cfdd0 bsp=e000652c9258 [] pci_device_probe+0xc0/0x120 sp=e000652cfde0 bsp=e000652c9218 [] really_probe+0x1c0/0x300 sp=e000652cfde0 bsp=e000652c91c8 [] driver_probe_device+0x1a0/0x1c0 sp=e000652cfde0 bsp=e000652c9198 [] __device_attach+0x30/0x60 sp=e000652cfde0 bsp=e000652c9170 [] bus_for_each_drv+0x80/0x120 sp=e000652cfde0 bsp=e000652c9138 [] device_attach+0xa0/0x100 sp=e000652cfe00 bsp=e000652c9110 [] bus_attach_device+0x60/0xe0 sp=e000652cfe00 bsp=e000652c90e0 [] device_add+0x950/0xf40 sp=e000652cfe00 bsp=e000652c9068 [] pci_bus_add_device+0x20/0x100 sp=e000652cfe00 bsp=e000652c9040 [] pci_bus_add_devices+0x50/0x320 sp=e000652cfe00 bsp=e000652c9010 [] shpchp_configure_device+0x460/0x4a0 [shpchp] sp=e000652cfe00 bsp=e000652c8fc0 [] board_added+0x720/0x8c0 [shpchp] sp=e000652cfe00 bsp=e000652c8f80 [] shpchp_enable_slot+0x920/0xa40 [shpchp] sp=e000652cfe10 bsp=e000652c8f30 [] shpchp_sysfs_enable_slot+0x120/0x220 [shpchp] sp=e000652cfe20 bsp=e000652c8ef8 [] enable_slot+0x90/0xc0 [shpchp] sp=e000652cfe20 bsp=e000652c8ed0 [] power_write_file+0x1e0/0x2a0 [pci_hotplug] sp=e000652cfe20 bsp=e000652c8ea0 [] hotplug_slot_attr_store+0x60/0xa0 [pci_hotplug] sp=e000652cfe20 bsp=e000652c8e68 [] sysfs_write_file+0x270/0x300 sp=e000652cfe20 bsp=e000652c8e18 [] vfs_write+0x1b0/0x300 sp=e000652cfe20 bsp=e000652c8dc0 [] sys_write+0x70/0xe0 sp=e000652cfe20 bsp=e000652c8d48 [] ia64_ret_from_syscall+0x0/0x20 sp=e000652cfe30 bsp=e000652c8d48 [] __kernel_syscall_via_break+0x0/0x20 sp=e000652d bsp=e000652c8d48 This error is caused by the race condition between dev_alloc_name() and unregistering net device. The dev_alloc_name() function checks dev_base list to find a suitable id. The net device is removed from the dev_base list when net device is unregistered. Those are done with rtnl lock held, so there is no problem. However, removing sysfs entry is delayed until netdev_run_todo() which is outside rtnl lock. Because of this, dev_alloc_name() can create a device name which is duplicated with the existing sysfs entry. The following patch fixes this by changing dev_alloc_name() function to check not only dev_base list but also todo list and snapshot list to find a suitable id. If some devices exists on the todo list or snapshot list, sysfs entries corresponding to them are not deleted yet. Thanks, Kenji Kaneshige --- Fix the race condition between dev_alloc_name() and net device unregistration. dev_alloc_name checks `dev_base' list to find a suitable ID. On the other hand, net devices are removed from that list on net device unregistration. Both of them are done with holding rtnl lock. However, removing sysfs entry is delayed until netdev_run_todo(), which doesn't acquire rtnl lock. Therefore, on dev_alloc_name(), device name could conflict with the device, which is unregistered, but its sysfs entry still exist. To fix this bug, change dev_alloc_name to check not only `dev_base' list but also todo list and snapshot list