These oops messages seem incomplete, but I'm assuming they report a page
fault.  The traceback shows the fault at offset 0x10 in
nfs_complete_unlink.  The top of this function is:

void
nfs_complete_unlink(struct dentry *dentry)
{
        struct nfs_unlinkdata   *data;

        for(data = nfs_deletes; data != NULL; data = data->next) {
                if (dentry == data->dentry)
                        break;
        }

Offset 0x10 in the compiled code corresponds to the line "if (dentry ==
data->dentry)", with data held in register rbx.  The register dump shows
rbx as 0x0000000500000006, which is not a valid kernel pointer (it looks
like two small 32-bit integers), so the nfs_deletes list is corrupt.
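
As a rough illustration of why that rbx value cannot be a list pointer,
here is a small userspace sketch.  It assumes x86-64 with 48-bit virtual
addresses, where kernel addresses sit in the upper canonical half; the
constant and the helper names are made up for this example and are not
kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86-64 with 48-bit virtual addresses, so kernel
 * addresses start at the bottom of the upper canonical half. */
#define KERNEL_HALF_START 0xffff800000000000ULL

/* Hypothetical helper: could this value be a kernel pointer at all? */
bool looks_like_kernel_pointer(uint64_t value)
{
        return value >= KERNEL_HALF_START;
}

/* Split a 64-bit register value into its two 32-bit halves. */
uint32_t high_word(uint64_t value)
{
        return (uint32_t)(value >> 32);
}

uint32_t low_word(uint64_t value)
{
        return (uint32_t)(value & 0xffffffffULL);
}

/* Usage: apply the checks to the rbx value from the register dump. */
void check_oops_rbx(void)
{
        uint64_t rbx = 0x0000000500000006ULL;

        assert(!looks_like_kernel_pointer(rbx));
        assert(high_word(rbx) == 5 && low_word(rbx) == 6);
}
```

The two halves decode to 5 and 6, which is what you would expect if
something overwrote the next pointer with ordinary integer data.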

The first comment in fs/nfs/unlink.c says:

 * NOTE: we rely on holding the BKL for list manipulation protection.

So I looked at which functions manipulate the nfs_deletes list and
found:

- nfs_async_unlink
  - called by nfs_sillyrename
    - called by nfs_unlink, holding BKL
    - called by nfs_rename, holding BKL
- nfs_complete_unlink
  - called by nfs_dentry_iput, holding BKL
- nfs_detach_unlinkdata
  - called by nfs_put_unlinkdata
    - called by nfs_complete_unlink
      - called by nfs_dentry_iput, holding BKL
    - called by nfs_async_unlink_release
      - called through rpc_call_ops::rpc_release
        - NOT called with BKL!
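
The hazard in that last path can be sketched in userspace.  This is only
a minimal model, not the kernel code and not the upstream fix: a pthread
mutex stands in for whatever lock protects the list, and the struct and
function names are hypothetical.  The point it shows is that the detach
helper takes the lock itself, so a caller such as a release callback
cannot reach the list unprotected.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct nfs_unlinkdata. */
struct unlinkdata {
        struct unlinkdata *next;
        const void *dentry;     /* key used for lookup */
};

static struct unlinkdata *deletes;      /* list head, like nfs_deletes */
static pthread_mutex_t deletes_lock = PTHREAD_MUTEX_INITIALIZER;

/* Push an entry onto the front of the list, under the lock. */
void add_unlinkdata(struct unlinkdata *data)
{
        assert(data != NULL);
        pthread_mutex_lock(&deletes_lock);
        data->next = deletes;
        deletes = data;
        pthread_mutex_unlock(&deletes_lock);
}

/* Find the entry for a dentry, or return NULL.  Real code would also
 * need a reference count to keep the entry alive after unlocking. */
struct unlinkdata *find_unlinkdata(const void *dentry)
{
        struct unlinkdata *data;

        pthread_mutex_lock(&deletes_lock);
        for (data = deletes; data != NULL; data = data->next) {
                if (data->dentry == dentry)
                        break;
        }
        pthread_mutex_unlock(&deletes_lock);
        return data;
}

/* Unlink an entry.  Returns 1 if it was on the list, 0 otherwise.
 * The lock is taken here, so no caller can skip it. */
int detach_unlinkdata(struct unlinkdata *data)
{
        struct unlinkdata **q;
        int found = 0;

        pthread_mutex_lock(&deletes_lock);
        for (q = &deletes; *q != NULL; q = &(*q)->next) {
                if (*q == data) {
                        *q = data->next;
                        data->next = NULL;
                        found = 1;
                        break;
                }
        }
        pthread_mutex_unlock(&deletes_lock);
        return found;
}
```

With the lock taken by the caller instead, as the BKL comment in
unlink.c implies, any path that forgets it can race with a concurrent
walk or unlink and leave a stale next pointer behind.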

The locking was revised, and probably fixed, by commit
e4eff1a622edd6ab7b73acd5d8763aa2fa3fee49, which went into Linux 2.6.23.
It looks like this will apply to 2.6.18 with a tiny bit of fudging, but
it changes the ABI.

Ben.

-- 
Ben Hutchings
The generation of random numbers is too important to be left to chance.
                                                            - Robert Coveyou
