This is bug #53457 in RT. I'm sending it to openafs-devel in case anyone who does not read the bug reports would be interested in commenting.


The recent CVS delta 'linux-and-locks-cleanup-20070202' fixes the bug introduced by 'linux-enroll-locks-20060403' (use of posix locks can cause a kernel panic), but it introduces a new bug on linux kernels older than 2.6.17.

This is because the kernel API for flock_lock_file*() changed between 2.6.16 and 2.6.17. Here is the relevant change in the linux kernel source tree:

        
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=993dfa8776308dcfd311cf77a3bbed4aa11e9868


Prior to this change (i.e. prior to linux-2.6.17), flock_lock_file*() takes as an argument a pointer to 'struct file_lock' which has been allocated by locks_alloc_lock(). It then threads this pointer into the inode's i_flock list. The caller of flock_lock_file*() therefore may not free the 'file_lock' structure afterward.

In linux-2.6.17+, flock_lock_file*() takes as an argument a pointer to 'struct file_lock', and copies the relevant data out. It then allocates a new 'struct file_lock' and puts that into the inode i_flock list. It does not retain a reference to the 'struct file_lock' pointer passed as an argument.



In the OpenAFS CVS delta 'linux-and-locks-cleanup-20070202' the following addition is made to src/afs/LINUX/osi_vnodeops.c:


        +#ifdef STRUCT_FILE_OPERATIONS_HAS_FLOCK
        +static int
        +afs_linux_flock(struct file *fp, int cmd, struct file_lock *flp) {
        ...
        ...

        +    AFS_GLOCK();
        +    code = afs_lockctl(vcp, &flock, cmd, credp);
        +    AFS_GUNLOCK();
        +
        +    if ((code == 0 || flp->fl_type == F_UNLCK) &&
        +        (cmd == F_SETLK || cmd == F_SETLKW)) {
        +       struct file_lock flp2;
        +       flp2 = *flp;
        +       flp2.fl_flags &=~ FL_SLEEP;
        +       code = flock_lock_file_wait(fp, &flp2);


Note that the argument to flock_lock_file_wait() is the address of a 'struct file_lock' allocated on the stack. On linux kernels prior to 2.6.17, this results in random stack memory getting into the inode's i_flock linked list.



The kernel will eventually crash when a future lock or unlock operation tries to traverse the i_flock list and finds random memory instead of a valid 'struct file_lock'.




This can probably be reproduced on any linux <2.6.17 kernel by using any program that calls flock() on a file in AFS. I was able to crash a machine running the RHEL4 kernel like this:

        cd /afs/somewhere
        touch ./file

        program_that_calls_flock()_and_waits ./file
        program_that_calls_lockf()_and_waits ./file




There does not seem to be a good way to find out (e.g., autoconf test) if a particular linux kernel has the 'old' or 'new' semantics of flock_lock_file*(). The argument types of the functions have not changed.



One possible solution is to always allocate a new 'struct file_lock' and pass it to the kernel. Prior to linux-2.6.17, the kernel uses the 'list_empty(&file_lock->fl_link)' test to see if the lock was actually used; this should also be safe on newer kernels, but it's relying an an implementation detail.


--- openafs/src/afs/LINUX/osi_vnodeops.c~ 2007-02-02 22:23:22.000000000 -0500 +++ openafs/src/afs/LINUX/osi_vnodeops.c 2007-02-08 11:58:33.000000000 -0500
@@ -528,6 +528,15 @@
     struct vcache *vcp = VTOAFS(FILE_INODE(fp));
     cred_t *credp = crref();
     struct AFS_FLOCK flock;
+    struct file_lock *flp2;
+
+    /* Allocate a new lock structure to give to the kernel */
+    flp2 = locks_alloc_lock();
+    if (!flp2) {
+       code = ENOMEM;
+       goto out;
+    }
+
     /* Convert to a lock format afs_lockctl understands. */
     memset((char *)&flock, 0, sizeof(flock));
     flock.l_type = flp->fl_type;
@@ -552,10 +561,9 @@

     if ((code == 0 || flp->fl_type == F_UNLCK) &&
         (cmd == F_SETLK || cmd == F_SETLKW)) {
-       struct file_lock flp2;
-       flp2 = *flp;
-       flp2.fl_flags &=~ FL_SLEEP;
-       code = flock_lock_file_wait(fp, &flp2);
+       *flp2 = *flp;
+       flp2->fl_flags &=~ FL_SLEEP;
+       code = flock_lock_file_wait(fp, flp2);
        if (code && flp->fl_type != F_UNLCK) {
            struct AFS_FLOCK flock2;
            flock2 = flock;
@@ -569,6 +577,12 @@
     flp->fl_type = flock.l_type;
     flp->fl_pid = flock.l_pid;

+    /* Free the newly allocated lock, if the kernel didn't use it */
+    if (list_empty(&flp2->fl_link)) {
+       locks_free_lock(flp2);
+    }
+
+out:
     crfree(credp);
     return -code;
 }



Alternatively, we can just pass the 'struct file_lock' handed to us by the kernel right back to flock_lock_file*(). It should be safe to clear the FL_SLEEP flag here; as far as I can tell, nothing in the kernel actually uses the FL_SLEEP flag in a file_lock structure once the structure is linked into the inode->i_flock list. FL_SLEEP is only used when a new lock is being created.



--- openafs-fix/src/afs/LINUX/osi_vnodeops.c~ 2007-02-02 22:23:22.000000000 -0500 +++ openafs-fix/src/afs/LINUX/osi_vnodeops.c 2007-02-08 11:58:58.000000000 -0500
@@ -552,10 +552,8 @@

     if ((code == 0 || flp->fl_type == F_UNLCK) &&
         (cmd == F_SETLK || cmd == F_SETLKW)) {
-       struct file_lock flp2;
-       flp2 = *flp;
-       flp2.fl_flags &=~ FL_SLEEP;
-       code = flock_lock_file_wait(fp, &flp2);
+       flp->fl_flags &=~ FL_SLEEP;
+       code = flock_lock_file_wait(fp, flp);
        if (code && flp->fl_type != F_UNLCK) {
            struct AFS_FLOCK flock2;
            flock2 = flock;



The third alternative solution would be to revert the addition of afs_linux_flock() for the time being.




Thanks,

Chris Wing
[EMAIL PROTECTED]
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel

Reply via email to