On Sun, Mar 31, 2019 at 03:13:47PM -0400, Jan Harkes wrote:
> On Sun, Mar 31, 2019 at 02:14:13PM -0400, Waiman Long wrote:
> > One possibility is that there is a previous reference to the memory
> > currently occupied by the spinlock. If the memory location is previously
> > part of a rwsem structure and someone is still using it, you may get
> > memory corruption.
>
> Ah, I hadn't even thought of that possibility. Good, it will open up

First of all, I have to thank you for your original patch, because
otherwise I probably would never have discovered that something was
seriously wrong. Your patch made the problem visible.

I ended up changing 'owner' to '_RET_IP_' and dumping the value of the
clobbered Coda inode spinlock and the surrounding memory, and found that
the 'culprit' is in ext4_filemap_fault. Despite it being in ext4, it is
still a Coda-specific problem.

Effectively, Coda overlays other filesystems' inodes for mmap, but
vma->vm_file still points at Coda's file. So when we use file_inode() in
ext4_filemap_fault we end up with the Coda inode instead of the ext4
inode, and when trying to grab ext4's i_mmap_sem we really just scribble
over the memory region that happens to contain the Coda inode spinlock.

A fix is to use vm_file->f_mapping->host instead of
file_inode(vm_file).

Of course everyone looks at ext4 as a canonical example, so this problem
has spread pretty much everywhere and I'm wondering how best to resolve
this.

- change file_inode() to follow file->f_mapping->host.
  This would fix most places, but maybe f_mapping is not always
  guaranteed to point at a usable place?

- change Coda's mmap to replace vma->vm_file with the host file.
  We'd probably no longer get notified when the last reference to the
  host file goes away, so we'd call coda_release and notify userspace on
  close() even when there are still active mmap regions.

- fix every in-tree file system to use vma->vm_file->f_mapping->host.

Jan

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 69d65d49837b..122d691d3eda 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -284,7 +284,7 @@ static vm_fault_t ext4_dax_huge_fault(struct vm_fault *vmf,
 	vm_fault_t result;
 	int retries = 0;
 	handle_t *handle = NULL;
-	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct inode *inode = vmf->vma->vm_file->f_mapping->host;
 	struct super_block *sb = inode->i_sb;
 
 	/*
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b54b261ded36..62a0025ce7f8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -6211,7 +6211,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 	int err;
 	vm_fault_t ret;
 	struct file *file = vma->vm_file;
-	struct inode *inode = file_inode(file);
+	struct inode *inode = file->f_mapping->host;
 	struct address_space *mapping = inode->i_mapping;
 	handle_t *handle;
 	get_block_t *get_block;
@@ -6302,7 +6302,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 
 vm_fault_t ext4_filemap_fault(struct vm_fault *vmf)
 {
-	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct inode *inode = vmf->vma->vm_file->f_mapping->host;
 	vm_fault_t ret;
 
 	down_read(&EXT4_I(inode)->i_mmap_sem);