fd code multithreaded race?

2010-07-31 Thread Antti Kantee
Hi,

I'm looking at a KASSERT which is triggering quite rarely for me (in
terms of iterations):

panic: kernel diagnostic assertion dt-dt_ff[i]-ff_refcnt == 0 failed: file 
/usr/allsrc/src/sys/rump/librump/rumpkern/../../../kern/kern_descrip.c, line 
856

Upon closer examination, it seems that this can trigger while another
thread is in fd_getfile() between upping the refcount, testing for
ff_file, and fd_putfile().  Removing the KASSERT seems to restore correct
operation, but I didn't read the code far enough to see where the race
is actually handled and what stops the code from using the wrong file.

How-to-repeat:
Run tests/fs/puffs/t_fuzz mountfuzz7 in a loop.  A multiprocessor kernel
might produce a more reliable result, so set RUMP_NCPU unless you have
a multiprocessor host.  Depending on timings and how the get/put thread
runs, you might even see the refcount as 0 in the core.

Does anyone see something wrong with the analysis?  If not, I'll create
a dedidated test and file a PR.


Re: Modules loading modules?

2010-07-31 Thread Paul Goyette

On Wed, 28 Jul 2010, Andrew Doran wrote:


it seems to me the root problem is that module_mutex is held while
calling into the module startup routines.

thus, the right solution is to remove this requirement.


Yes, that's what is needed.


I'm far from convinced that's a good idea.  First, it will probably
make the module code a nightmare -- what happens when you have multiple
interleaved loads, some of which fail at some point in their dependency
stack, and let's just throw in a manual modunload to mix up things
further.  Second, and pretty much related to number one, it goes against
one of the most fundamental principles of robust code: atomic actions.


This: atomicity.  The atomic behaviour is relied upon to give all or
nothing semantics upon load and unload, like transactions in a database.

Admittely some modules (not of my making) are sloppy about this and so
break that bit of the interface contract.


If modload-from-modcmd is found necessary, sounds more like a case for
the infamous recursive lock.


Recursive lock is the way to go.  I think the same lock should also cover
all device configuration activites (i.e. autoconf) and any other
heavy lifting where we have chunks of the system coming and going.


Well, folks, here is a first pass recursive locks!  The attached diffs 
are against -current as of a few minutes ago.


Some caveats:

1. This has only been compile-tested for x86 (amd64  i386).

2. Other architectures have not even been compiled yet.  It should
   work but you never know.

3. HPPA is an exceptional exception here, since it is the only in-tree
   architecture I found that cannot use SIMPLE_LOCKS.

4. There is only one known use case for this so far:  modules loading
   other modules from within their xxx_modcmd() routine.  The specific
   use case we have involves loading the acpicpu driver/module which
   eventually results in an attempt to load acpiverbose.

It would be really nice if the community could

A. Compile-test on additional architectures, especially HPPA
B. Test to see that existing mutex operations still work correctly
C. Exercise the known use case if possible
D. Identify additional use cases


There is one place in the code (marked with great big XXX) where I am 
simply unsure if I need to use a CAS-type operation for updating the 
reference count on the recursive mutex.  I _think_ it is save to simply 
auto-increment the field, but I am totally willing for any of you 
experts to tell my why this might not be safe.   :)


I will be updating one of my home machines (amd64) to use this code in 
the next few days.  I don't plan on making any commits until I've had 
some positive testing feedback, and hopefully some expert commentary on 
the XXX.  (Some additional concensus that this is the right thing to 
do would also be appreciated!)




-
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |  | pgoyette at netbsd.org  |
-Index: sys/arch/alpha/include/mutex.h
===
RCS file: /cvsroot/src/sys/arch/alpha/include/mutex.h,v
retrieving revision 1.4
diff -u -p -r1.4 mutex.h
--- sys/arch/alpha/include/mutex.h  28 Apr 2008 20:23:11 -  1.4
+++ sys/arch/alpha/include/mutex.h  31 Jul 2010 22:34:30 -
@@ -43,7 +43,11 @@ struct kmutex {
 
 struct kmutex {
union {
-   volatile uintptr_t  mtxa_owner;
+   struct {
+   volatile uintptr_t  mtxa_owner;
+   volatile uint8_tmtxa_recurse;
+   volatile uint8_tmtxa_unused[3];
+   } a;
struct {
volatile uint8_tmtxs_flags;
ipl_cookie_tmtxs_ipl;
@@ -53,7 +57,8 @@ struct kmutex {
__cpu_simple_lock_t mtx_lock;
 };
 
-#definemtx_owner   u.mtxa_owner
+#definemtx_owner   u.a.mtxa_owner
+#definemtx_recurse u.a.mtxa_recurse
 #definemtx_flags   u.s.mtxs_flags
 #definemtx_ipl u.s.mtxs_ipl
 
Index: sys/arch/arm/include/mutex.h
===
RCS file: /cvsroot/src/sys/arch/arm/include/mutex.h,v
retrieving revision 1.10
diff -u -p -r1.10 mutex.h
--- sys/arch/arm/include/mutex.h28 Apr 2008 20:23:14 -  1.10
+++ sys/arch/arm/include/mutex.h31 Jul 2010 22:34:31 -
@@ -57,7 +57,11 @@ struct kmutex {
 struct kmutex {
union {
/* Adaptive mutex */
-