Re: Locking Between User Context and Soft IRQs in 2.4.0

2000-11-07 Thread Rusty Russell

In message <[EMAIL PROTECTED]> you write:
> Paul Gortmaker wrote:
> > - extern void ether_setup(struct net_device *dev);
> > + extern void __ether_setup(struct net_device *dev);
> > + static inline void ether_setup(struct net_device *dev){
> > +   dev->owner = THIS_MODULE;
> > +   __ether_setup(dev);
> > + }
> > 
> > Ugh. Probably should just add it to each probe and be done with it...
> 
> mm..  Seeing as failure to set dev->owner is a fatal mistake,
> it would be good to enforce this via the compiler type system.
> 
> How about making THIS_MODULE an argument to register_netdevice()
> and, hence, register_netdev() and init_etherdev()?

Bear in mind that in 2.5, the THIS_MODULE registration cancer
infesting the kernel[1] will vanish with two-stage module delete[2].

http://www.wcug.wwu.edu/lists/netdev/26/msg00250.html

Rusty.

[1] And getting worse.
[2] Which was the correct solution for 2.4, only I was all out of
`get out of code freeze free' cards.
--
Hacking time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Locking Between User Context and Soft IRQs in 2.4.0

2000-11-06 Thread Andrew Morton

Paul Gortmaker wrote:
> 
> Assuming that nobody has all the MOD_..._USE_COUNT things culled
> from a tree somewhere already, I quickly hacked up the following
> script for drivers/net:

Looks good.  There's also drivers/isdn and possibly other places.

> ...
> We might want to filter the file list created by grep against VERSION_CODE
> as people with that in their driver(s) probably don't want the wholesale
> deletion of MOD_*_COUNT. (OTOH, drivers that have VERSION_CODE that
> supports 2.0.38 or oddball 2.3.x versions could probably be pruned...)

I think you're right.  eepro100 and acenic seriously care about 2.2-compatibility
but AFAICT the others just pretend to.

> That still leaves the addition of dev->owner = THIS_MODULE into
> each device probe.  One *hackish* way to do this without having to
> deal with each driver could be something like this in netdevice.h
> 
> - extern void ether_setup(struct net_device *dev);
> + extern void __ether_setup(struct net_device *dev);
> + static inline void ether_setup(struct net_device *dev){
> +   dev->owner = THIS_MODULE;
> +   __ether_setup(dev);
> + }
> 
> Ugh. Probably should just add it to each probe and be done with it...

mm..  Seeing as failure to set dev->owner is a fatal mistake,
it would be good to enforce this via the compiler type system.

How about making THIS_MODULE an argument to register_netdevice()
and, hence, register_netdev() and init_etherdev()?

> Paul. (aka. monkey #937)

:)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Locking Between User Context and Soft IRQs in 2.4.0

2000-11-05 Thread Paul Gortmaker

Andrew Morton wrote:

> y'know, if we keep working this patch for about a year we
> might end up getting it right.  Thousand monkeys and all that.

Yeah, probably still a year until the release of 2.4.0.  8)
Now where did I put those darn bananas...

> - With this patch applied, the module refcounts for netdevices
>   will show doubled values in `lsmod', unless those drivers
>   have been changed to remove the now unneeded MOD_INC/DEC_USE_COUNT
>   macros (perhaps with the above 2.2-compatibility thing).

Assuming that nobody has all the MOD_..._USE_COUNT things culled
from a tree somewhere already, I quickly hacked up the following
script for drivers/net:


#!/bin/bash
for i in `grep -l 'MOD_..._USE_COUNT;' *.c */*.c`
do
  mv $i $i~
  cat $i~ | \
  sed '/^$/{N;s/.*MOD.*COUNT;//;tz;b;:z;N;s/^\n$//;};/.*MOD.*COUNT;/d' > $i
done


It looks ugly but it zaps out the extra blank line when MOD_.*COUNT
had a blank line above and below.  I had a quick look at the resulting
diff (4200 lines!) and it looks like the post-script hand editing will
be minimal (e.g. a few arcnet drivers have MOD_*_COUNT as the only code
in an if block).

We might want to filter the file list created by grep against VERSION_CODE
as people with that in their driver(s) probably don't want the wholesale
deletion of MOD_*_COUNT. (OTOH, drivers that have VERSION_CODE that
supports 2.0.38 or oddball 2.3.x versions could probably be pruned...)

That still leaves the addition of dev->owner = THIS_MODULE into
each device probe.  One *hackish* way to do this without having to
deal with each driver could be something like this in netdevice.h

- extern void ether_setup(struct net_device *dev);
+ extern void __ether_setup(struct net_device *dev);
+ static inline void ether_setup(struct net_device *dev){
+   dev->owner = THIS_MODULE;   
+   __ether_setup(dev);
+ }

Ugh. Probably should just add it to each probe and be done with it...

Paul. (aka. monkey #937)



_
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Locking Between User Context and Soft IRQs in 2.4.0

2000-11-05 Thread Andrew Morton

Keith Owens wrote:
> 
> Pressed enter too soon.
> 
> /*
>  *  Call device private open method
>  */
> 
> ret = -ENODEV;
> if (dev->open && try_inc_mod_count(dev->owner)) {
> if ((ret = dev->open(dev)) && dev->owner)
> __MOD_DEC_USE_COUNT(dev->owner);
> }

y'know, if we keep working this patch for about a year we
might end up getting it right.  Thousand monkeys and all that.

The above assumes that (dev->open == 0) is an error, but
netdevices are in fact allowed to have a NULL open() method.

So hereunder is about the seventieth version of the netdevice
modules safety patch.  Go wild.

Some notes:

- I've created the generic macro SET_OWNER_MODULE and put it
  into modules.h.  It may perhaps be useful for things other
  than netdevices?  The intent here is that 2.4-only drivers
  can use:

SET_OWNER_MODULE(dev);

  in their init routines.
  
  Drivers which retain 2.2 compatibility should use:

  #if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0)
  #define SET_OWNER_MODULE(d)
  #else
  #undef MOD_INC_USE_COUNT
  #undef MOD_DEC_USE_COUNT
  #endif

xxx_probe()
{
...
SET_MODULE_OWNER(dev);  /* 2.4 */
...
MOD_INC_USE_COUNT;  /* 2.2 */
}

  And everything should work.

- With this patch applied, the module refcounts for netdevices
  will show doubled values in `lsmod', unless those drivers
  have been changed to remove the now unneeded MOD_INC/DEC_USE_COUNT
  macros (perhaps with the above 2.2-compatibility thing).



--- linux-2.4.0-test10/include/linux/netdevice.hSat Nov  4 16:22:49 2000
+++ linux-akpm/include/linux/netdevice.hSun Nov  5 22:15:48 2000
@@ -383,6 +383,9 @@
int (*neigh_setup)(struct net_device *dev, struct 
neigh_parms *);
int (*accept_fastpath)(struct net_device *, struct 
dst_entry*);
 
+   /* open/release and usage marking */
+   struct module *owner;
+
/* bridge stuff */
struct net_bridge_port  *br_port;
 
--- linux-2.4.0-test10/include/linux/module.h   Sat Nov  4 16:22:49 2000
+++ linux-akpm/include/linux/module.h   Sun Nov  5 22:18:16 2000
@@ -313,4 +313,10 @@
 #define EXPORT_NO_SYMBOLS
 #endif /* MODULE */
 
+#ifdef CONFIG_MODULES
+#define SET_OWNER_MODULE(some_struct) do { some_struct->owner = THIS_MODULE; } while 
+(0)
+#else
+#define SET_OWNER_MODULE(some_struct) do { } while (0)
+#endif
+
 #endif /* _LINUX_MODULE_H */
--- linux-2.4.0-test10/net/core/dev.c   Sat Nov  4 16:22:50 2000
+++ linux-akpm/net/core/dev.c   Sun Nov  5 22:31:57 2000
@@ -93,6 +93,7 @@
 #include 
 #include 
 #include 
+#include 
 #if defined(CONFIG_NET_RADIO) || defined(CONFIG_NET_PCMCIA_RADIO)
 #include /* Note : will define WIRELESS_EXT */
 #endif /* CONFIG_NET_RADIO || CONFIG_NET_PCMCIA_RADIO */
@@ -666,9 +667,15 @@
/*
 *  Call device private open method
 */
-
-   if (dev->open) 
-   ret = dev->open(dev);
+   if (try_inc_mod_count(dev->owner)) {
+   if (dev->open) {
+   ret = dev->open(dev);
+   if (ret != 0 && dev->owner)
+   __MOD_DEC_USE_COUNT(dev->owner);
+   }
+   } else {
+   ret = -ENODEV;
+   }
 
/*
 *  If it went open OK then:
@@ -783,6 +790,12 @@
 *  Tell people we are down
 */
notifier_call_chain(&netdev_chain, NETDEV_DOWN, dev);
+
+   /*
+* Drop the module refcount
+*/
+   if (dev->owner)
+   __MOD_DEC_USE_COUNT(dev->owner);
 
return(0);
 }
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Locking Between User Context and Soft IRQs in 2.4.0

2000-11-04 Thread Keith Owens

Pressed enter too soon.

/*
 *  Call device private open method
 */

ret = -ENODEV;
if (dev->open && try_inc_mod_count(dev->owner)) {
if ((ret = dev->open(dev)) && dev->owner)
__MOD_DEC_USE_COUNT(dev->owner);
}


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Locking Between User Context and Soft IRQs in 2.4.0

2000-11-04 Thread Keith Owens

On Sun, 05 Nov 2000 12:28:55 +1100, 
Andrew Morton <[EMAIL PROTECTED]> wrote:
>  CPU0  CPU1
>
>rtnl_lock()
> dev_ifsioc()
>  dev_change_flags()
>   dev_open();
>dev->open();
>vortex_open()
>sys_delete_module()
>if (!__MOD_IN_USE)
> free_module()

If dev->open() calls try_inc_use_count() before it enters the module
you will lock out concurrent module unload via module unload_lock.
There is no need for another module semaphore.

Also there is no need to test dev->owner, try_inc_use_count() already
does that.

/*
 *  Call device private open method
 */

ret = -ENODEV;
if (try_inc_mod_count(dev->owner)) {
if (dev->open && (ret = dev->open(dev)) && dev->owner)
__MOD_DEC_USE_COUNT(dev->owner);
}

In dev_close()

/*
 *  Call device private close method
 */

if (dev->owner)
__MOD_DEC_USE_COUNT(dev->owner);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Locking Between User Context and Soft IRQs in 2.4.0

2000-11-04 Thread Andi Kleen

On Sun, Nov 05, 2000 at 12:28:55PM +1100, Andrew Morton wrote:
> Andi Kleen wrote:
> > 
> > On Sat, Nov 04, 2000 at 12:07:34PM -0500, Jeff Garzik wrote:
> > > Andi Kleen wrote:
> > > > All the MOD_INC/DEC_USE_COUNT are done inside the modules themselves. There
> > > > is nothing that would a driver prevent from being unloaded on a different
> > > > CPU while it is already executing in ->open but has not yet executed the add
> > > > yet or after it has executed the _DEC but it is still running in module code
> > > > Normally the windows are pretty small, but very long running interrupt
> > > > on one CPU hitting exactly in the wrong moment can change that.
> > >
> > > Module unload calls unregister_netdev, which grabs rtnl_lock.
> > > dev->open runs under rtnl_lock.
> > >
> > > Given this, how can the driver be unloaded if dev->open is running?
> > 
> > It does not help, because when the semaphore synchronizes it is already
> > too late -- free_module already did the zero module count check and
> > nothing is going to stop it from unloading.
> 
> aaarrrggh!!!

A simple fix at least for this case would be a recheck of module count
after free_module() executed the cleanup function.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Locking Between User Context and Soft IRQs in 2.4.0

2000-11-04 Thread Andrew Morton

Andrew Morton wrote:
> 
> Perhaps the best thing to do here is to create a system-wide
> semaphore for module unloading. So we do a down()/up()
> in sys_delete_module() and do this in dev_open:

Yep.  I think this is right.  Jeff, this supersedes the
patch you sent to devem yesterday.


--- linux-2.4.0-test10/include/linux/netdevice.hSat Nov  4 16:22:49 2000
+++ linux-akpm/include/linux/netdevice.hSun Nov  5 12:40:00 2000
@@ -146,6 +146,11 @@
 struct neigh_parms;
 struct sk_buff;
 
+/* Centralised module refcounting for netdevices */
+struct module;
+#define SET_NETDEVICE_OWNER(dev)   \
+   do { dev->owner = THIS_MODULE; } while (0)
+
 struct netif_rx_stats
 {
unsigned total;
@@ -382,6 +387,9 @@
 unsigned char *haddr);
int (*neigh_setup)(struct net_device *dev, struct 
neigh_parms *);
int (*accept_fastpath)(struct net_device *, struct 
dst_entry*);
+
+   /* open/release and usage marking */
+   struct module *owner;
 
/* bridge stuff */
struct net_bridge_port  *br_port;
--- linux-2.4.0-test10/include/linux/module.h   Sat Nov  4 16:22:49 2000
+++ linux-akpm/include/linux/module.h   Sun Nov  5 12:37:54 2000
@@ -146,10 +146,16 @@
 #ifdef CONFIG_MODULES
 extern unsigned long get_module_symbol(char *, char *);
 extern void put_module_symbol(unsigned long);
+struct semaphore;
+extern struct semaphore mod_unload_sem;
+#define DOWN_MOD_UNLOAD_SEM() down(&mod_unload_sem)
+#define UP_MOD_UNLOAD_SEM() up(&mod_unload_sem)
 #else
 static inline unsigned long get_module_symbol(char *unused1, char *unused2) { return 
0; };
 static inline void put_module_symbol(unsigned long unused) { };
-#endif
+#define DOWN_MOD_UNLOAD_SEM() do { } while (0)
+#define UP_MOD_UNLOAD_SEM() do { } while (0)
+#endif 
 
 extern int try_inc_mod_count(struct module *mod);
 
--- linux-2.4.0-test10/net/core/dev.c   Sat Nov  4 16:22:50 2000
+++ linux-akpm/net/core/dev.c   Sun Nov  5 12:39:23 2000
@@ -93,6 +93,7 @@
 #include 
 #include 
 #include 
+#include 
 #if defined(CONFIG_NET_RADIO) || defined(CONFIG_NET_PCMCIA_RADIO)
 #include /* Note : will define WIRELESS_EXT */
 #endif /* CONFIG_NET_RADIO || CONFIG_NET_PCMCIA_RADIO */
@@ -666,9 +667,20 @@
/*
 *  Call device private open method
 */
-
-   if (dev->open) 
-   ret = dev->open(dev);
+   DOWN_MOD_UNLOAD_SEM();  /* Synchronise with sys_delete_module 
+*/
+   if (dev->owner == 0) {
+   if (dev->open)
+   ret = dev->open(dev);
+   } else {
+   if (try_inc_mod_count(dev->owner)) {
+   if (dev->open) {
+   if ((ret = dev->open(dev)) != 0)
+   __MOD_DEC_USE_COUNT(dev->owner);
+   }
+   } else
+   ret = -ENODEV;
+   }
+   UP_MOD_UNLOAD_SEM();
 
/*
 *  If it went open OK then:
@@ -783,6 +795,13 @@
 *  Tell people we are down
 */
notifier_call_chain(&netdev_chain, NETDEV_DOWN, dev);
+
+   /*
+* Drop the module refcount
+*/
+   if (dev->owner) {
+   __MOD_DEC_USE_COUNT(dev->owner);
+   }
 
return(0);
 }
--- linux-2.4.0-test10/kernel/module.c  Sat Nov  4 16:22:49 2000
+++ linux-akpm/kernel/module.c  Sun Nov  5 12:36:03 2000
@@ -44,6 +44,7 @@
 static struct module *find_module(const char *name);
 static void free_module(struct module *, int tag_freed);
 
+DECLARE_MUTEX(mod_unload_sem);
 
 /*
  * Called at boot time
@@ -369,6 +370,7 @@
return -EPERM;
 
lock_kernel();
+   down(&mod_unload_sem);
if (name_user) {
if ((error = get_mod_name(name_user, &name)) < 0)
goto out;
@@ -431,6 +433,7 @@
mod->flags &= ~MOD_JUST_FREED;
error = 0;
 out:
+   up(&mod_unload_sem);
unlock_kernel();
return error;
 }
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Locking Between User Context and Soft IRQs in 2.4.0

2000-11-04 Thread Andrew Morton

Andi Kleen wrote:
> 
> On Sat, Nov 04, 2000 at 12:07:34PM -0500, Jeff Garzik wrote:
> > Andi Kleen wrote:
> > > All the MOD_INC/DEC_USE_COUNT are done inside the modules themselves. There
> > > is nothing that would a driver prevent from being unloaded on a different
> > > CPU while it is already executing in ->open but has not yet executed the add
> > > yet or after it has executed the _DEC but it is still running in module code
> > > Normally the windows are pretty small, but very long running interrupt
> > > on one CPU hitting exactly in the wrong moment can change that.
> >
> > Module unload calls unregister_netdev, which grabs rtnl_lock.
> > dev->open runs under rtnl_lock.
> >
> > Given this, how can the driver be unloaded if dev->open is running?
> 
> It does not help, because when the semaphore synchronizes it is already
> too late -- free_module already did the zero module count check and
> nothing is going to stop it from unloading.

aaarrrggh!!!

  CPU0  CPU1

rtnl_lock()
 dev_ifsioc()
  dev_change_flags()
   dev_open();
dev->open();
vortex_open()
sys_delete_module()
if (!__MOD_IN_USE)
 free_module()
  mod->cleanup()
  vortex_cleanup()
   pci_unregister_driver()
[ time passes ] drv->remove();
vortex_remove_one()
 unregister_netdev()
  unregister_netdevice()
  rtnl_lock()/* blocks */
...
MOD_INC_USE_COUNT;
...
rtnl_unlock()
  ...
 module_unmap();/* Not good */


We can't even fix this with a lock_kernel wrapped around
the dev->owner stuff in dev_open(), because the netdevice's
open() can sleep.

prumpf's patch

Perhaps the best thing to do here is to create a system-wide
semaphore for module unloading. So we do a down()/up()
in sys_delete_module() and do this in dev_open:

/*
 *  Call device private open method
 */

down(&mod_unload_sem);  /* sync with sys_delete_module() */
if (dev->owner == 0) {
if (dev->open)
ret = dev->open(dev);
} else {
if (try_inc_mod_count(dev->owner)) {
if (dev->open) {
if ((ret = dev->open(dev)) != 0)
__MOD_DEC_USE_COUNT(dev->owner);
}
} else
ret = -ENODEV;
}
up(&mod_unload_sem);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Locking Between User Context and Soft IRQs in 2.4.0

2000-11-04 Thread Andi Kleen

On Sat, Nov 04, 2000 at 12:07:34PM -0500, Jeff Garzik wrote:
> Andi Kleen wrote:
> > All the MOD_INC/DEC_USE_COUNT are done inside the modules themselves. There
> > is nothing that would a driver prevent from being unloaded on a different
> > CPU while it is already executing in ->open but has not yet executed the add
> > yet or after it has executed the _DEC but it is still running in module code
> > Normally the windows are pretty small, but very long running interrupt
> > on one CPU hitting exactly in the wrong moment can change that.
> 
> Module unload calls unregister_netdev, which grabs rtnl_lock.
> dev->open runs under rtnl_lock.
> 
> Given this, how can the driver be unloaded if dev->open is running?

It does not help, because when the semaphore synchronizes it is already
too late -- free_module already did the zero module count check and 
nothing is going to stop it from unloading.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Locking Between User Context and Soft IRQs in 2.4.0

2000-11-04 Thread Jeff Garzik

Andi Kleen wrote:
> All the MOD_INC/DEC_USE_COUNT are done inside the modules themselves. There
> is nothing that would a driver prevent from being unloaded on a different
> CPU while it is already executing in ->open but has not yet executed the add
> yet or after it has executed the _DEC but it is still running in module code
> Normally the windows are pretty small, but very long running interrupt
> on one CPU hitting exactly in the wrong moment can change that.

Module unload calls unregister_netdev, which grabs rtnl_lock.
dev->open runs under rtnl_lock.

Given this, how can the driver be unloaded if dev->open is running?

-- 
Jeff Garzik | Dinner is ready when
Building 1024   | the smoke alarm goes off.
MandrakeSoft|   -/usr/games/fortune
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Locking Between User Context and Soft IRQs in 2.4.0

2000-11-04 Thread Andi Kleen

On Sat, Nov 04, 2000 at 10:36:36AM -0500, Jeff Garzik wrote:
> Andi Kleen wrote:
> > 
> > On Sat, Nov 04, 2000 at 04:45:33AM -0500, Jeff Garzik wrote:
> > >
> > > > *   What about dev->open and dev->stop ?
> > >
> > > Sleep all you want, we'll leave the light on for ya.
> > 
> > ... but make sure you have no module unload races (or at least not too
> > huge holes, some are probably unavoidable with the current network
> > driver interface, e.g. without moving module count management a bit up).
> > This means you should do MOD_INC_USE_COUNT very early at least to
> > minimize the windows (and DEC_USE_COUNT very late)
> 
> Can you provide a trace of a race or deadlock?  I do not see where there
> are races in the current 2.4.x code.

All the MOD_INC/DEC_USE_COUNT are done inside the modules themselves. There
is nothing that would a driver prevent from being unloaded on a different
CPU while it is already executing in ->open but has not yet executed the add 
yet or after it has executed the _DEC but it is still running in module code
Normally the windows are pretty small, but very long running interrupt
on one CPU hitting exactly in the wrong moment can change that.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Locking Between User Context and Soft IRQs in 2.4.0

2000-11-04 Thread Jeff Garzik

Andi Kleen wrote:
> 
> On Sat, Nov 04, 2000 at 04:45:33AM -0500, Jeff Garzik wrote:
> >
> > > *   What about dev->open and dev->stop ?
> >
> > Sleep all you want, we'll leave the light on for ya.
> 
> ... but make sure you have no module unload races (or at least not too
> huge holes, some are probably unavoidable with the current network
> driver interface, e.g. without moving module count management a bit up).
> This means you should do MOD_INC_USE_COUNT very early at least to
> minimize the windows (and DEC_USE_COUNT very late)

Can you provide a trace of a race or deadlock?  I do not see where there
are races in the current 2.4.x code.


> > dev->do_ioctl:
> >   Locking: Inside rtnl_lock() semaphore.
> >   Sleeping: OK
> 
> Just make sure you lock against your interrupt and xmit threads.

But of course :)  My doc only covered locks external to the driver.


> >   Locking: Inside dev->xmit_lock spinlock.
> >   Sleeping: NO[1]
> >
> >
> > NOTE [1]: On principle, you should not sleep when a spinlock is held.
> > However, since this spinlock is per-net-device, we only block ourselves
> > if we sleep, so the effect is mitigated.
> 
> Sounds like dangerous advice.

Probably... I changed the doc "just say no" :)

Jeff


-- 
Jeff Garzik | Dinner is ready when
Building 1024   | the smoke alarm goes off.
MandrakeSoft|   -/usr/games/fortune
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Locking Between User Context and Soft IRQs in 2.4.0

2000-11-04 Thread Andi Kleen

On Sat, Nov 04, 2000 at 04:45:33AM -0500, Jeff Garzik wrote:
> 
> > *   What about dev->open and dev->stop ?
> 
> Sleep all you want, we'll leave the light on for ya.


... but make sure you have no module unload races (or at least not too
huge holes, some are probably unavoidable with the current network
driver interface, e.g. without moving module count management a bit up). 
This means you should do MOD_INC_USE_COUNT very early at least to
minimize the windows (and DEC_USE_COUNT very late) 

> dev->do_ioctl:
>   Locking: Inside rtnl_lock() semaphore.
>   Sleeping: OK

Just make sure you lock against your interrupt and xmit threads.

>   Locking: Inside dev->xmit_lock spinlock.
>   Sleeping: NO[1]
> 
> 
> NOTE [1]: On principle, you should not sleep when a spinlock is held.
> However, since this spinlock is per-net-device, we only block ourselves
> if we sleep, so the effect is mitigated.

Sounds like dangerous advice.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Locking Between User Context and Soft IRQs in 2.4.0

2000-11-04 Thread Jeff Garzik

"Hen, Shmulik" wrote:
> We are trying to port a network driver from 2.2.x to 2.4.x and have some
> question regarding locks.
> According to the kernel locking HOWTO, we have to take extra care when
> locking between user context threads and BH/tasklet/softIRQ,
> so we learned (the hard way ;-) that when running the ioctl system call from
> an application we should use spin_lock/unlock_bh() and not
> spin_lock/unlock() inside dev->do_ioctl().

That is not necessarily true.  If you have timers or tasklets going,
sure.  I prefer kernel threads for a lot of tasks nowadays, because you
only have two cases for locking -- user and interrupt -- and you can
sleep all you want to in a kernel thread.


> *   What about the other entry points implemented in net_device ?

I wrote the attached doc, after tracing through the code.  It has not
been reviewed yet so it is not canonical, but hopefully it is
informative...


> *   We've got dev->get_stats, dev->set_mac_address,
> dev->set_mutlicast_list and others that are all called from running
> 'ifconfig' which is an application. Are they considered user context too ?

You are inside a spinlock in get_stats, so you cannot sleep.  But you
can sleep in set_multicast_list.  Not sure about set_mac_address.


> *   What about dev->open and dev->stop ?

Sleep all you want, we'll leave the light on for ya.


> *   We figured that dev->hard_start_xmit() and timer callbacks are not
> considered user context, but how can I find out if they are being run as
> SoftIRQ or as tasklets or as Bottom Halves ? (their different definitions
> require different types of protections)

I'm not sure about the context from which hard_start_xmit is called... 
Its inside a spinlock, so you shouldn't be sleeping.  timers are unique
unto themselves... but you lock against them using spin_lock_bh outside
the timer, and spin_lock inside the timer.

> wrap entire operations from top to bottom. For example, our
> dev->hard_start_xmit() will have a spin_lock() at the beginning and a
> spin_unlock() at the end of the function.

Why?  dev->xmit_lock is obtained before dev->hard_start_xmit is called,
and released after it returns.


> *   What about other calls to the kernel ? can the running thread be
> switched out of context when calling kernel entries and not be switched back
> in when they finish ? should I beware of deadlocks in such case ?

You should always beware of deadlocks!

Jeff


-- 
Jeff Garzik | Dinner is ready when
Building 1024   | the smoke alarm goes off.
MandrakeSoft|   -/usr/games/fortune



struct net_device synchronization rules
===
dev->open:
Locking: Inside rtnl_lock() semaphore.
Sleeping: OK

dev->stop:
Locking: Inside rtnl_lock() semaphore.
Sleeping: OK

dev->do_ioctl:
Locking: Inside rtnl_lock() semaphore.
Sleeping: OK

dev->get_stats:
Locking: Inside dev_base_lock spinlock.
Sleeping: NO

dev->hard_start_xmit:
Locking: Inside dev->xmit_lock spinlock.
Sleeping: NO[1]

dev->tx_timeout:
Locking: Inside dev->xmit_lock spinlock.
Sleeping: NO[1]

dev->set_multicast_list:
Locking: Inside dev->xmit_lock spinlock.
Sleeping: NO[1]


NOTE [1]: On principle, you should not sleep when a spinlock is held.
However, since this spinlock is per-net-device, we only block ourselves
if we sleep, so the effect is mitigated.




Locking Between User Context and Soft IRQs in 2.4.0

2000-10-30 Thread Hen, Shmulik

Hello,

We are trying to port a network driver from 2.2.x to 2.4.x and have some
question regarding locks.
According to the kernel locking HOWTO, we have to take extra care when
locking between user context threads and BH/tasklet/softIRQ,
so we learned (the hard way ;-) that when running the ioctl system call from
an application we should use spin_lock/unlock_bh() and not
spin_lock/unlock() inside dev->do_ioctl().

*   What about the other entry points implemented in net_device ? 
*   We've got dev->get_stats, dev->set_mac_address,
dev->set_mutlicast_list and others that are all called from running
'ifconfig' which is an application. Are they considered user context too ?
*   What about dev->open and dev->stop ?
*   We figured that dev->hard_start_xmit() and timer callbacks are not
considered user context, but how can I find out if they are being run as
SoftIRQ or as tasklets or as Bottom Halves ? (their different definitions
require different types of protections)

Our driver is actually an intermediate driver bound on top of a regular net
driver. It behaves both as a network adapter driver and a protocol at the
same time. I can safely assume that it will have to handle both transmits
and receives simultaneously (no hardware interrupts are involved). We've
decided that for the first stage we are going to implement "wide" locks that
wrap entire operations from top to bottom. For example, our
dev->hard_start_xmit() will have a spin_lock() at the beginning and a
spin_unlock() at the end of the function.
*   Will it be safe to keep the lock until after the call to the base
driver's hard_start_xmit, or do I have to release the lock just before that
?
*   Or, in our receive function, will I have to release the lock before
or after the call to netif_rx() ?
*   What about other calls to the kernel ? can the running thread be
switched out of context when calling kernel entries and not be switched back
in when they finish ? should I beware of deadlocks in such case ?


Thanks in advance,
Shmulik Hen,
Software Engineer
Linux Advanced Networking Services
Network Communications Group, Israel (NCGj)
Intel Corporation Ltd.




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/