[patch 02/41] Revert 81b0c8713385ce1b1b9058e916edcf9561ad76d6
From: Andrew Morton [EMAIL PROTECTED]

This was a bugfix against 6527c2bdf1f833cc18e8f42bd97973d583e4aa83, which we
also revert.

Cc: Linux Memory Management [EMAIL PROTECTED]
Cc: Linux Filesystems linux-fsdevel@vger.kernel.org
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 mm/filemap.c |    9 +--------
 mm/filemap.h |    4 ++--
 2 files changed, 3 insertions(+), 10 deletions(-)

Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1957,12 +1957,6 @@ generic_file_buffered_write(struct kiocb
 			break;
 		}
 
-		if (unlikely(bytes == 0)) {
-			status = 0;
-			copied = 0;
-			goto zero_length_segment;
-		}
-
 		status = a_ops->prepare_write(file, page, offset, offset+bytes);
 		if (unlikely(status)) {
 			loff_t isize = i_size_read(inode);
@@ -1992,8 +1986,7 @@ generic_file_buffered_write(struct kiocb
 			page_cache_release(page);
 			continue;
 		}
-zero_length_segment:
-		if (likely(copied >= 0)) {
+		if (likely(copied > 0)) {
 			if (!status)
 				status = copied;
 
Index: linux-2.6/mm/filemap.h
===================================================================
--- linux-2.6.orig/mm/filemap.h
+++ linux-2.6/mm/filemap.h
@@ -87,7 +87,7 @@ filemap_set_next_iovec(const struct iove
 	const struct iovec *iov = *iovp;
 	size_t base = *basep;
 
-	do {
+	while (bytes) {
 		int copy = min(bytes, iov->iov_len - base);
 
 		bytes -= copy;
@@ -96,7 +96,7 @@ filemap_set_next_iovec(const struct iove
 			iov++;
 			base = 0;
 		}
-	} while (bytes);
+	}
 	*iovp = iov;
 	*basep = base;
 }

-- 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html

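For readers following the filemap.h hunk: the revert leaves filemap_set_next_iovec() as a plain while (bytes) loop. Below is a minimal userspace sketch of that cursor-advancing logic, not the kernel code itself. The name advance_iovec() is hypothetical and struct iovec is taken from <sys/uio.h>.

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical userspace analogue of filemap_set_next_iovec():
 * advance the (segment, offset-in-segment) cursor by `bytes`.
 */
static void advance_iovec(const struct iovec **iovp, size_t *basep,
			  size_t bytes)
{
	const struct iovec *iov = *iovp;
	size_t base = *basep;

	while (bytes) {
		size_t left = iov->iov_len - base;
		size_t copy = bytes < left ? bytes : left;

		bytes -= copy;
		base += copy;
		/* Segment fully consumed (or empty): step to the next one. */
		if (iov->iov_len == base) {
			iov++;
			base = 0;
		}
	}
	*iovp = iov;
	*basep = base;
}
```

With segment lengths 4, 0 and 8, advancing 6 bytes from the start should leave the cursor 2 bytes into the third segment. Note that with while (bytes) a call with bytes == 0 does nothing at all, whereas the do/while form being reverted would still step over an empty segment.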
[patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2
Here is an update against 2.6.21-mm2. Unfortunately UML broke for me, so test
coverage isn't as good as the last time I posted the series.

Also, several filesystems had significant clashes. Considering the amount of
time it took to get them working, I won't fix them again. They aren't _broken_
as such, they'll just run slowly (but without the deadlock).

The OCFS2 patch seemed to have some clashes too, so I've left that out. I'm
sure Mark will take a look at that quickly if this patchset were to get merged.

Thanks to Neil for some documentation suggestions and catching a bug, and to
Vladimir for the reiserfs implementation (not 100% done yet, but it is a good
start).

[patch 05/41] mm: debug write deadlocks
Allow CONFIG_DEBUG_VM to switch off the prefaulting logic, to simulate the
difficult race where the page may be unmapped before calling copy_from_user.
Makes the race much easier to hit.

This is useful for demonstration and testing purposes, but is removed in a
subsequent patch.

Cc: Linux Memory Management [EMAIL PROTECTED]
Cc: Linux Filesystems linux-fsdevel@vger.kernel.org
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 mm/filemap.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1940,6 +1940,7 @@ generic_file_buffered_write(struct kiocb
 		if (maxlen > bytes)
 			maxlen = bytes;
 
+#ifndef CONFIG_DEBUG_VM
 		/*
 		 * Bring in the user page that we will copy from _first_.
 		 * Otherwise there's a nasty deadlock on copying from the
@@ -1947,6 +1948,7 @@ generic_file_buffered_write(struct kiocb
 		 * up-to-date.
 		 */
 		fault_in_pages_readable(buf, maxlen);
+#endif
 
 		page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
 		if (!page) {

[patch 03/41] Revert 6527c2bdf1f833cc18e8f42bd97973d583e4aa83
From: Andrew Morton [EMAIL PROTECTED]

This patch fixed the following bug:

  When prefaulting in the pages in generic_file_buffered_write(), we only
  faulted in the pages for the first segment of the iovec.  If the second or
  successive segment described a mmapping of the page into which we're
  write()ing, and that page is not up-to-date, the fault handler tries to lock
  the already-locked page (to bring it up to date) and deadlocks.

  An exploit for this bug is in writev-deadlock-demo.c, in
  http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.

  (These demos assume blocksize < PAGE_CACHE_SIZE).

The problem with this fix is that it takes the kernel back to doing a single
prepare_write()/commit_write() per iovec segment.  So in the worst case we'll
run prepare_write+commit_write 1024 times where we previously would have run
it once. The other problem with the fix is that it didn't fix all the locking
problems.

[insert numbers obtained via ext3-tools's writev-speed.c here]

And apparently this change killed NFS overwrite performance, because, I
suppose, it talks to the server for each prepare_write+commit_write.

So just back that patch out - we'll be fixing the deadlock by other means.

Cc: Linux Memory Management [EMAIL PROTECTED]
Cc: Linux Filesystems linux-fsdevel@vger.kernel.org
Signed-off-by: Andrew Morton [EMAIL PROTECTED]

Nick says: also it only ever actually papered over the bug, because after
faulting in the pages, they might be unmapped or reclaimed.

Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 mm/filemap.c |   18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1927,21 +1927,14 @@ generic_file_buffered_write(struct kiocb
 	do {
 		unsigned long index;
 		unsigned long offset;
+		unsigned long maxlen;
 		size_t copied;
 
 		offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
 		index = pos >> PAGE_CACHE_SHIFT;
 		bytes = PAGE_CACHE_SIZE - offset;
-
-		/* Limit the size of the copy to the caller's write size */
-		bytes = min(bytes, count);
-
-		/*
-		 * Limit the size of the copy to that of the current segment,
-		 * because fault_in_pages_readable() doesn't know how to walk
-		 * segments.
-		 */
-		bytes = min(bytes, cur_iov->iov_len - iov_base);
+		if (bytes > count)
+			bytes = count;
 
 		/*
 		 * Bring in the user page that we will copy from _first_.
@@ -1949,7 +1942,10 @@ generic_file_buffered_write(struct kiocb
 		 * same page as we're writing to, without it being marked
 		 * up-to-date.
 		 */
-		fault_in_pages_readable(buf, bytes);
+		maxlen = cur_iov->iov_len - iov_base;
+		if (maxlen > bytes)
+			maxlen = bytes;
+		fault_in_pages_readable(buf, maxlen);
 
 		page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
 		if (!page) {

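To make the "1024 times instead of once" point concrete, here is a rough userspace model of the chunking loop. It is only a sketch under stated assumptions: count_write_chunks() and SKETCH_PAGE_SIZE are hypothetical names, the page size is assumed to be 4096, and each loop iteration stands for one prepare_write()/commit_write() pair.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, illustration only */

/*
 * Count the chunks a buffered-write loop would process. seg_lens[] holds
 * the iovec segment lengths. With clamp_to_segment set, each chunk is also
 * limited to the current segment, as in the patch being reverted.
 */
static size_t count_write_chunks(size_t pos, const size_t *seg_lens,
				 size_t nr_segs, int clamp_to_segment)
{
	size_t count = 0, seg = 0, seg_off = 0, chunks = 0;
	size_t i;

	for (i = 0; i < nr_segs; i++)
		count += seg_lens[i];

	while (count) {
		size_t offset = pos & (SKETCH_PAGE_SIZE - 1);
		size_t bytes = SKETCH_PAGE_SIZE - offset;

		if (bytes > count)
			bytes = count;
		if (clamp_to_segment) {
			/* Step past exhausted or empty segments. */
			while (seg < nr_segs && seg_lens[seg] == seg_off) {
				seg++;
				seg_off = 0;
			}
			if (bytes > seg_lens[seg] - seg_off)
				bytes = seg_lens[seg] - seg_off;
			seg_off += bytes;
		}
		pos += bytes;
		count -= bytes;
		chunks++;
	}
	return chunks;
}
```

For 1024 four-byte segments written at offset 0, the model gives 1024 chunks with per-segment clamping and a single chunk without it, which is the worst case described in the changelog.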
[patch 04/41] mm: clean up buffered write code
From: Andrew Morton [EMAIL PROTECTED]

Rename some variables and fix some types.

Cc: Linux Memory Management [EMAIL PROTECTED]
Cc: Linux Filesystems linux-fsdevel@vger.kernel.org
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 mm/filemap.c |   35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1900,16 +1900,15 @@ generic_file_buffered_write(struct kiocb
 		size_t count, ssize_t written)
 {
 	struct file *file = iocb->ki_filp;
-	struct address_space * mapping = file->f_mapping;
+	struct address_space *mapping = file->f_mapping;
 	const struct address_space_operations *a_ops = mapping->a_ops;
 	struct inode 	*inode = mapping->host;
 	long		status = 0;
 	struct page	*page;
 	struct page	*cached_page = NULL;
-	size_t		bytes;
 	struct pagevec	lru_pvec;
 	const struct iovec *cur_iov = iov; /* current iovec */
-	size_t		iov_base = 0;	   /* offset in the current iovec */
+	size_t		iov_offset = 0;	   /* offset in the current iovec */
 	char __user	*buf;
 
 	pagevec_init(&lru_pvec, 0);
@@ -1920,31 +1919,33 @@ generic_file_buffered_write(struct kiocb
 	if (likely(nr_segs == 1))
 		buf = iov->iov_base + written;
 	else {
-		filemap_set_next_iovec(&cur_iov, &iov_base, written);
-		buf = cur_iov->iov_base + iov_base;
+		filemap_set_next_iovec(&cur_iov, &iov_offset, written);
+		buf = cur_iov->iov_base + iov_offset;
 	}
 
 	do {
-		unsigned long index;
-		unsigned long offset;
-		unsigned long maxlen;
-		size_t copied;
+		pgoff_t index;		/* Pagecache index for current page */
+		unsigned long offset;	/* Offset into pagecache page */
+		unsigned long maxlen;	/* Bytes remaining in current iovec */
+		size_t bytes;		/* Bytes to write to page */
+		size_t copied;		/* Bytes copied from user */
 
-		offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
+		offset = (pos & (PAGE_CACHE_SIZE - 1));
 		index = pos >> PAGE_CACHE_SHIFT;
 		bytes = PAGE_CACHE_SIZE - offset;
 		if (bytes > count)
 			bytes = count;
 
+		maxlen = cur_iov->iov_len - iov_offset;
+		if (maxlen > bytes)
+			maxlen = bytes;
+
 		/*
 		 * Bring in the user page that we will copy from _first_.
 		 * Otherwise there's a nasty deadlock on copying from the
 		 * same page as we're writing to, without it being marked
 		 * up-to-date.
 		 */
-		maxlen = cur_iov->iov_len - iov_base;
-		if (maxlen > bytes)
-			maxlen = bytes;
 		fault_in_pages_readable(buf, maxlen);
 
 		page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
@@ -1975,7 +1976,7 @@ generic_file_buffered_write(struct kiocb
 							buf, bytes);
 		else
 			copied = filemap_copy_from_user_iovec(page, offset,
-						cur_iov, iov_base, bytes);
+						cur_iov, iov_offset, bytes);
 		flush_dcache_page(page);
 		status = a_ops->commit_write(file, page, offset, offset+bytes);
 		if (status == AOP_TRUNCATED_PAGE) {
@@ -1993,12 +1994,12 @@ generic_file_buffered_write(struct kiocb
 				buf += status;
 				if (unlikely(nr_segs > 1)) {
 					filemap_set_next_iovec(&cur_iov,
-							&iov_base, status);
+							&iov_offset, status);
 					if (count)
 						buf = cur_iov->iov_base +
-							iov_base;
+							iov_offset;
 				} else {
-					iov_base += status;
+					iov_offset += status;
 				}
 			}
 		}

[patch 11/41] fs: fix data-loss on error
New buffers against uptodate pages are simply marked uptodate, while the
buffer_new bit remains set. This causes error-case code to zero out parts of
those buffers because it thinks they contain stale data: wrong, they are
actually uptodate so this is a data loss situation.

Fix this by actually clearing buffer_new and marking the buffer dirty. It
makes sense to always clear buffer_new before setting a buffer uptodate.

Cc: Linux Memory Management [EMAIL PROTECTED]
Cc: Linux Filesystems linux-fsdevel@vger.kernel.org
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 fs/buffer.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1793,7 +1793,9 @@ static int __block_prepare_write(struct 
 				unmap_underlying_metadata(bh->b_bdev,
 							bh->b_blocknr);
 				if (PageUptodate(page)) {
+					clear_buffer_new(bh);
 					set_buffer_uptodate(bh);
+					mark_buffer_dirty(bh);
 					continue;
 				}
 				if (block_end > to || block_start < from) {

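The failure mode can be mimicked with a toy model. This is a sketch only: struct toy_buffer and both helpers are hypothetical stand-ins for buffer_head state and do not use any kernel API. The with_fix flag selects the behaviour this patch adds.

```c
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for a buffer_head: just the three flags discussed above. */
struct toy_buffer {
	bool is_new;
	bool uptodate;
	bool dirty;
	char data[8];
};

/* Models the PageUptodate() branch of __block_prepare_write(). */
static void toy_prepare_uptodate(struct toy_buffer *bh, bool with_fix)
{
	if (with_fix)
		bh->is_new = false;	/* clear_buffer_new() */
	bh->uptodate = true;		/* set_buffer_uptodate() */
	if (with_fix)
		bh->dirty = true;	/* mark_buffer_dirty() */
}

/* Models error handling that zeroes any buffer still flagged new. */
static void toy_error_path(struct toy_buffer *bh)
{
	if (bh->is_new) {
		memset(bh->data, 0, sizeof(bh->data));
		bh->is_new = false;
	}
}
```

Running toy_prepare_uptodate() without the fix and then toy_error_path() wipes the data even though it was valid. With the fix, the new flag is already clear and the contents survive.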
[patch 07/41] mm: buffered write cleanup
Quite a bit of code is used in maintaining these cached pages that are
probably pretty unlikely to get used. It would require a narrow race where
the page is inserted concurrently while this process is allocating a page
in order to create the spare page. Then a multi-page write into an uncached
part of the file, to make use of it.

Next, the buffered write path (and others) uses its own LRU pagevec when it
should be just using the per-CPU LRU pagevec (which will cut down on both data
and code size cacheline footprint). Also, these private LRU pagevecs are
emptied after just a very short time, in contrast with the per-CPU pagevecs
that are persistent. Net result: 7.3 times fewer lru_lock acquisitions required
to add the pages to pagecache for a bulk write (in 4K chunks).

[this gets rid of some cond_resched() calls in readahead.c and mpage.c due
to clashes in -mm. What put them there, and why? ]

Cc: Linux Memory Management [EMAIL PROTECTED]
Cc: Linux Filesystems linux-fsdevel@vger.kernel.org
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 fs/mpage.c     |   12
 mm/filemap.c   |  144 ++---
 mm/readahead.c |   28 +++
 3 files changed, 66 insertions(+), 118 deletions(-)

Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -666,26 +666,22 @@ EXPORT_SYMBOL(find_lock_page);
 struct page *find_or_create_page(struct address_space *mapping,
 		unsigned long index, gfp_t gfp_mask)
 {
-	struct page *page, *cached_page = NULL;
+	struct page *page;
 	int err;
 repeat:
 	page = find_lock_page(mapping, index);
 	if (!page) {
-		if (!cached_page) {
-			cached_page = alloc_page(gfp_mask);
-			if (!cached_page)
-				return NULL;
-		}
-		err = add_to_page_cache_lru(cached_page, mapping,
-					index, gfp_mask);
-		if (!err) {
-			page = cached_page;
-			cached_page = NULL;
-		} else if (err == -EEXIST)
-			goto repeat;
+		page = alloc_page(gfp_mask);
+		if (!page)
+			return NULL;
+		err = add_to_page_cache_lru(page, mapping, index, gfp_mask);
+		if (unlikely(err)) {
+			page_cache_release(page);
+			page = NULL;
+			if (err == -EEXIST)
+				goto repeat;
+		}
 	}
-	if (cached_page)
-		page_cache_release(cached_page);
 	return page;
 }
 EXPORT_SYMBOL(find_or_create_page);
@@ -882,11 +878,9 @@ void do_generic_mapping_read(struct addr
 	unsigned long prev_index;
 	unsigned int prev_offset;
 	loff_t isize;
-	struct page *cached_page;
 	int error;
 	struct file_ra_state ra = *_ra;
 
-	cached_page = NULL;
 	index = *ppos >> PAGE_CACHE_SHIFT;
 	next_index = index;
 	prev_index = ra.prev_index;
@@ -1053,23 +1047,20 @@ no_cached_page:
 		 * Ok, it wasn't cached, so we need to create a new
 		 * page..
 		 */
-		if (!cached_page) {
-			cached_page = page_cache_alloc_cold(mapping);
-			if (!cached_page) {
-				desc->error = -ENOMEM;
-				goto out;
-			}
+		page = page_cache_alloc_cold(mapping);
+		if (!page) {
+			desc->error = -ENOMEM;
+			goto out;
 		}
-		error = add_to_page_cache_lru(cached_page, mapping,
+		error = add_to_page_cache_lru(page, mapping,
 						index, GFP_KERNEL);
 		if (error) {
+			page_cache_release(page);
 			if (error == -EEXIST)
 				goto find_page;
 			desc->error = error;
 			goto out;
 		}
-		page = cached_page;
-		cached_page = NULL;
 		goto readpage;
 	}
 
@@ -1077,8 +1068,6 @@ out:
 	*_ra = ra;
 
 	*ppos = ((loff_t) index << PAGE_CACHE_SHIFT) + offset;
-	if (cached_page)
-		page_cache_release(cached_page);
 	if (filp)
 		file_accessed(filp);
 }
@@ -1561,35 +1550,28 @@ static struct page *__read_cache_page(st
 				int (*filler)(void *,struct page*),
 				void *data)
 {
-	struct page *page, *cached_page = NULL;
+	struct page *page;
 	int err;
 repeat:
 	page = find_get_page(mapping, index);
 	if (!page) {
-
[patch 09/41] mm: fix pagecache write deadlocks
Modify the core write() code so that it won't take a pagefault while holding a
lock on the pagecache page. There are a number of different deadlocks possible
if we try to do such a thing:

 1.  generic_buffered_write
 2.   lock_page
 3.    prepare_write
 4.     unlock_page+vmtruncate
 5.     copy_from_user
 6.      mmap_sem(r)
 7.       handle_mm_fault
 8.        lock_page (filemap_nopage)
 9.    commit_write
 10.  unlock_page

 a. sys_munmap / sys_mlock / others
 b.  mmap_sem(w)
 c.   make_pages_present
 d.    get_user_pages
 e.     handle_mm_fault
 f.      lock_page (filemap_nopage)

 2,8     - recursive deadlock if page is same
 2,8;2,8 - ABBA deadlock if page is different
 2,6;b,f - ABBA deadlock if page is same

The solution is as follows:

1.  If we find the destination page is uptodate, continue as normal, but use
    atomic usercopies which do not take pagefaults and do not zero the uncopied
    tail of the destination. The destination is already uptodate, so we can
    commit_write the full length even if there was a partial copy: it does not
    matter that the tail was not modified, because if it is dirtied and written
    back to disk it will not cause any problems (uptodate *means* that the
    destination page is as new or newer than the copy on disk).

1a. The above requires that fault_in_pages_readable correctly returns access
    information, because atomic usercopies cannot distinguish between
    non-present pages in a readable mapping, from lack of a readable mapping.

2.  If we find the destination page is non uptodate, unlock it (this could be
    made slightly more optimal), then allocate a temporary page to copy the
    source data into. Relock the destination page and continue with the copy.
    However, instead of a usercopy (which might take a fault), copy the data
    from the pinned temporary page via the kernel address space.

(also, rename maxlen to seglen, because it was confusing)

This increases the CPU/memory copy cost by almost 50% on the affected
workloads. That will be solved by introducing a new set of pagecache write
aops in a subsequent patch.

Cc: Linux Memory Management [EMAIL PROTECTED] Cc: Linux Filesystems linux-fsdevel@vger.kernel.org Signed-off-by: Nick Piggin [EMAIL PROTECTED] include/linux/pagemap.h | 11 +++- mm/filemap.c| 114 2 files changed, 104 insertions(+), 21 deletions(-) Index: linux-2.6/mm/filemap.c === --- linux-2.6.orig/mm/filemap.c +++ linux-2.6/mm/filemap.c @@ -1889,11 +1889,12 @@ generic_file_buffered_write(struct kiocb filemap_set_next_iovec(cur_iov, nr_segs, iov_offset, written); do { + struct page *src_page; struct page *page; pgoff_t index; /* Pagecache index for current page */ unsigned long offset; /* Offset into pagecache page */ - unsigned long maxlen; /* Bytes remaining in current iovec */ - size_t bytes; /* Bytes to write to page */ + unsigned long seglen; /* Bytes remaining in current iovec */ + unsigned long bytes;/* Bytes to write to page */ size_t copied; /* Bytes copied from user */ buf = cur_iov-iov_base + iov_offset; @@ -1903,20 +1904,30 @@ generic_file_buffered_write(struct kiocb if (bytes count) bytes = count; - maxlen = cur_iov-iov_len - iov_offset; - if (maxlen bytes) - maxlen = bytes; + /* +* a non-NULL src_page indicates that we're doing the +* copy via get_user_pages and kmap. +*/ + src_page = NULL; + + seglen = cur_iov-iov_len - iov_offset; + if (seglen bytes) + seglen = bytes; -#ifndef CONFIG_DEBUG_VM /* * Bring in the user page that we will copy from _first_. * Otherwise there's a nasty deadlock on copying from the * same page as we're writing to, without it being marked * up-to-date. +* +* Not only is this an optimisation, but it is also required +* to check that the address is actually valid, when atomic +* usercopies are used, below. */ - fault_in_pages_readable(buf, maxlen); -#endif - + if (unlikely(fault_in_pages_readable(buf, seglen))) { + status = -EFAULT; + break; + } page = __grab_cache_page(mapping, index); if (!page) { @@ -1924,32 +1935,104 @@ generic_file_buffered_write(struct kiocb break; } + /* +* non-uptodate pages
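As a loose userspace analogy for step 2 of the changelog (the non-uptodate case), the idea of staging the source data before touching the destination can be sketched as below. This is not the patch's code: toy_copy_via_bounce() is a hypothetical helper, malloc() stands in for the temporary page, and the two memcpy() calls stand in for the usercopy and the kernel-side copy respectively.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Toy model of the bounce-buffer path: first copy the source into a
 * private staging buffer, then copy from the staging buffer into the
 * destination. Returns 0 on success, -1 if the staging buffer cannot
 * be allocated.
 */
static int toy_copy_via_bounce(char *dst, const char *src, size_t len)
{
	char *tmp = malloc(len ? len : 1);

	if (!tmp)
		return -1;
	/* Stage 1: read the source while the destination is untouched. */
	memcpy(tmp, src, len);
	/* Stage 2: fill the destination from the private copy only. */
	memcpy(dst, tmp, len);
	free(tmp);
	return 0;
}
```

Because stage 2 never reads the source again, it works even when source and destination overlap, which mirrors the case where the user buffer is a mapping of the page being written. The data is copied twice, which is in line with the extra copy cost the changelog mentions.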
[patch 20/41] xfs convert to new aops.
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems linux-fsdevel@vger.kernel.org
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 fs/xfs/linux-2.6/xfs_aops.c |   19 ++++++++++++-------
 fs/xfs/linux-2.6/xfs_lrw.c  |   35 ++++++++++++-----------------------
 2 files changed, 24 insertions(+), 30 deletions(-)

Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
@@ -1479,13 +1479,18 @@ xfs_vm_direct_IO(
 }
 
 STATIC int
-xfs_vm_prepare_write(
+xfs_vm_write_begin(
 	struct file		*file,
-	struct page		*page,
-	unsigned int		from,
-	unsigned int		to)
+	struct address_space	*mapping,
+	loff_t			pos,
+	unsigned		len,
+	unsigned		flags,
+	struct page		**pagep,
+	void			**fsdata)
 {
-	return block_prepare_write(page, from, to, xfs_get_blocks);
+	*pagep = NULL;
+	return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+							xfs_get_blocks);
 }
 
 STATIC sector_t
@@ -1539,8 +1544,8 @@ const struct address_space_operations xf
 	.sync_page		= block_sync_page,
 	.releasepage		= xfs_vm_releasepage,
 	.invalidatepage		= xfs_vm_invalidatepage,
-	.prepare_write		= xfs_vm_prepare_write,
-	.commit_write		= generic_commit_write,
+	.write_begin		= xfs_vm_write_begin,
+	.write_end		= generic_write_end,
 	.bmap			= xfs_vm_bmap,
 	.direct_IO		= xfs_vm_direct_IO,
 	.migratepage		= buffer_migrate_page,
Index: linux-2.6/fs/xfs/linux-2.6/xfs_lrw.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_lrw.c
+++ linux-2.6/fs/xfs/linux-2.6/xfs_lrw.c
@@ -134,45 +134,34 @@ xfs_iozero(
 	loff_t			pos,	/* offset in file		*/
 	size_t			count)	/* size of data to zero		*/
 {
-	unsigned		bytes;
 	struct page		*page;
 	struct address_space	*mapping;
 	int			status;
 
 	mapping = ip->i_mapping;
 	do {
-		unsigned long index, offset;
+		unsigned offset, bytes;
+		void *fsdata;
 
 		offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
-		index = pos >> PAGE_CACHE_SHIFT;
 		bytes = PAGE_CACHE_SIZE - offset;
 		if (bytes > count)
 			bytes = count;
 
-		status = -ENOMEM;
-		page = grab_cache_page(mapping, index);
-		if (!page)
-			break;
-
-		status = mapping->a_ops->prepare_write(NULL, page, offset,
-							offset + bytes);
+		status = pagecache_write_begin(NULL, mapping, pos, bytes,
+					AOP_FLAG_UNINTERRUPTIBLE,
+					&page, &fsdata);
 		if (status)
-			goto unlock;
+			break;
 
 		zero_user_page(page, offset, bytes, KM_USER0);
 
-		status = mapping->a_ops->commit_write(NULL, page, offset,
-							offset + bytes);
-		if (!status) {
-			pos += bytes;
-			count -= bytes;
-		}
-
-unlock:
-		unlock_page(page);
-		page_cache_release(page);
-		if (status)
-			break;
+		status = pagecache_write_end(NULL, mapping, pos, bytes, bytes,
+					page, fsdata);
+		WARN_ON(status <= 0); /* can't return less than zero! */
+		pos += bytes;
+		count -= bytes;
+		status = 0;
 	} while (count);
 
 	return (-status);

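The interface change in this conversion is that callers now pass a file position and length rather than a page plus from/to offsets. A small sketch of the arithmetic relating the two forms follows. The names (struct page_range, range_from_pos(), pos_from_range()) are hypothetical, and a 4096-byte page (shift of 12) is assumed in place of PAGE_CACHE_SHIFT.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed; stands in for PAGE_CACHE_SHIFT */
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Old-style description of a write within a single page. */
struct page_range {
	uint64_t index;	/* page index in the file */
	unsigned from;	/* first byte within the page */
	unsigned to;	/* one past the last byte within the page */
};

/* Derive (index, from, to) from a new-style (pos, len) pair. */
static struct page_range range_from_pos(int64_t pos, unsigned len)
{
	struct page_range r;

	r.index = (uint64_t)pos >> SKETCH_PAGE_SHIFT;
	r.from = (unsigned)(pos & (SKETCH_PAGE_SIZE - 1));
	r.to = r.from + len;
	return r;
}

/* Recover the file position from the old-style description. */
static int64_t pos_from_range(struct page_range r)
{
	return ((int64_t)r.index << SKETCH_PAGE_SHIFT) + r.from;
}
```

For example, pos = 3 * 4096 + 100 with len = 50 maps to page index 3 with from = 100 and to = 150, and converting back yields the original position. The sketch assumes the range does not cross a page boundary, which matches how xfs_iozero() clamps bytes to the end of the page.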
[patch 14/41] implement simple fs aops
Implement new aops for some of the simpler filesystems. Cc: Linux Filesystems linux-fsdevel@vger.kernel.org Signed-off-by: Nick Piggin [EMAIL PROTECTED] fs/configfs/inode.c |4 ++-- fs/hugetlbfs/inode.c | 16 ++-- fs/ramfs/file-mmu.c |4 ++-- fs/ramfs/file-nommu.c |4 ++-- fs/sysfs/inode.c |4 ++-- mm/shmem.c| 35 --- 6 files changed, 46 insertions(+), 21 deletions(-) Index: linux-2.6/mm/shmem.c === --- linux-2.6.orig/mm/shmem.c +++ linux-2.6/mm/shmem.c @@ -1109,7 +1109,7 @@ static int shmem_getpage(struct inode *i * Normally, filepage is NULL on entry, and either found * uptodate immediately, or allocated and zeroed, or read * in under swappage, which is then assigned to filepage. -* But shmem_prepare_write passes in a locked filepage, +* But shmem_write_begin passes in a locked filepage, * which may be found not uptodate by other callers too, * and may need to be copied from the swappage read in. */ @@ -1454,14 +1454,35 @@ static const struct inode_operations shm static const struct inode_operations shmem_symlink_inline_operations; /* - * Normally tmpfs makes no use of shmem_prepare_write, but it + * Normally tmpfs makes no use of shmem_write_begin, but it * lets a tmpfs file be used read-write below the loop driver. 
*/ static int -shmem_prepare_write(struct file *file, struct page *page, unsigned offset, unsigned to) +shmem_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned flags, + struct page **pagep, void **fsdata) +{ + struct inode *inode = mapping-host; + pgoff_t index = pos PAGE_CACHE_SHIFT; + *pagep = NULL; + return shmem_getpage(inode, index, pagep, SGP_WRITE, NULL); +} + +static int +shmem_write_end(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned copied, + struct page *page, void *fsdata) { - struct inode *inode = page-mapping-host; - return shmem_getpage(inode, page-index, page, SGP_WRITE, NULL); + struct inode *inode = mapping-host; + + set_page_dirty(page); + mark_page_accessed(page); + page_cache_release(page); + + if (pos+copied inode-i_size) + i_size_write(inode, pos+copied); + + return copied; } static ssize_t @@ -2357,8 +2378,8 @@ static const struct address_space_operat .writepage = shmem_writepage, .set_page_dirty = __set_page_dirty_no_writeback, #ifdef CONFIG_TMPFS - .prepare_write = shmem_prepare_write, - .commit_write = simple_commit_write, + .write_begin= shmem_write_begin, + .write_end = shmem_write_end, #endif .migratepage= migrate_page, }; Index: linux-2.6/fs/configfs/inode.c === --- linux-2.6.orig/fs/configfs/inode.c +++ linux-2.6/fs/configfs/inode.c @@ -40,8 +40,8 @@ extern struct super_block * configfs_sb; static const struct address_space_operations configfs_aops = { .readpage = simple_readpage, - .prepare_write = simple_prepare_write, - .commit_write = simple_commit_write + .write_begin= simple_write_begin, + .write_end = simple_write_end, }; static struct backing_dev_info configfs_backing_dev_info = { Index: linux-2.6/fs/sysfs/inode.c === --- linux-2.6.orig/fs/sysfs/inode.c +++ linux-2.6/fs/sysfs/inode.c @@ -20,8 +20,8 @@ extern struct super_block * sysfs_sb; static const struct address_space_operations sysfs_aops = { .readpage = simple_readpage, - 
.prepare_write = simple_prepare_write, - .commit_write = simple_commit_write + .write_begin= simple_write_begin, + .write_end = simple_write_end, }; static struct backing_dev_info sysfs_backing_dev_info = { Index: linux-2.6/fs/ramfs/file-mmu.c === --- linux-2.6.orig/fs/ramfs/file-mmu.c +++ linux-2.6/fs/ramfs/file-mmu.c @@ -29,8 +29,8 @@ const struct address_space_operations ramfs_aops = { .readpage = simple_readpage, - .prepare_write = simple_prepare_write, - .commit_write = simple_commit_write, + .write_begin= simple_write_begin, + .write_end = simple_write_end, .set_page_dirty = __set_page_dirty_no_writeback, }; Index: linux-2.6/fs/ramfs/file-nommu.c === --- linux-2.6.orig/fs/ramfs/file-nommu.c +++ linux-2.6/fs/ramfs/file-nommu.c @@ -29,8 +29,8 @@ static int ramfs_nommu_setattr(struct de const struct address_space_operations
[patch 26/41] hpfs convert to new aops.
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems linux-fsdevel@vger.kernel.org
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 fs/hpfs/file.c |   20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

Index: linux-2.6/fs/hpfs/file.c
===================================================================
--- linux-2.6.orig/fs/hpfs/file.c
+++ linux-2.6/fs/hpfs/file.c
@@ -86,25 +86,33 @@ static int hpfs_writepage(struct page *p
 {
 	return block_write_full_page(page,hpfs_get_block, wbc);
 }
+
 static int hpfs_readpage(struct file *file, struct page *page)
 {
 	return block_read_full_page(page,hpfs_get_block);
 }
-static int hpfs_prepare_write(struct file *file, struct page *page, unsigned from, unsigned to)
-{
-	return cont_prepare_write(page,from,to,hpfs_get_block,
-		&hpfs_i(page->mapping->host)->mmu_private);
+
+static int hpfs_write_begin(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata)
+{
+	*pagep = NULL;
+	return cont_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+				hpfs_get_block,
+				&hpfs_i(mapping->host)->mmu_private);
 }
+
 static sector_t _hpfs_bmap(struct address_space *mapping, sector_t block)
 {
 	return generic_block_bmap(mapping,block,hpfs_get_block);
 }
+
 const struct address_space_operations hpfs_aops = {
 	.readpage = hpfs_readpage,
 	.writepage = hpfs_writepage,
 	.sync_page = block_sync_page,
-	.prepare_write = hpfs_prepare_write,
-	.commit_write = generic_commit_write,
+	.write_begin = hpfs_write_begin,
+	.write_end = generic_write_end,
 	.bmap = _hpfs_bmap
 };
 

[patch 17/41] ext2 convert to new aops.
Cc: [EMAIL PROTECTED] Cc: Linux Filesystems linux-fsdevel@vger.kernel.org Signed-off-by: Nick Piggin [EMAIL PROTECTED] fs/ext2/dir.c | 47 +-- fs/ext2/ext2.h |3 +++ fs/ext2/inode.c | 24 +--- 3 files changed, 45 insertions(+), 29 deletions(-) Index: linux-2.6/fs/ext2/inode.c === --- linux-2.6.orig/fs/ext2/inode.c +++ linux-2.6/fs/ext2/inode.c @@ -726,18 +726,21 @@ ext2_readpages(struct file *file, struct return mpage_readpages(mapping, pages, nr_pages, ext2_get_block); } -static int -ext2_prepare_write(struct file *file, struct page *page, - unsigned from, unsigned to) +int __ext2_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned flags, + struct page **pagep, void **fsdata) { - return block_prepare_write(page,from,to,ext2_get_block); + return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata, + ext2_get_block); } static int -ext2_nobh_prepare_write(struct file *file, struct page *page, - unsigned from, unsigned to) +ext2_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned flags, + struct page **pagep, void **fsdata) { - return nobh_prepare_write(page,from,to,ext2_get_block); + *pagep = NULL; + return __ext2_write_begin(file, mapping, pos, len, flags, pagep,fsdata); } static int ext2_nobh_writepage(struct page *page, @@ -773,8 +776,8 @@ const struct address_space_operations ex .readpages = ext2_readpages, .writepage = ext2_writepage, .sync_page = block_sync_page, - .prepare_write = ext2_prepare_write, - .commit_write = generic_commit_write, + .write_begin= ext2_write_begin, + .write_end = generic_write_end, .bmap = ext2_bmap, .direct_IO = ext2_direct_IO, .writepages = ext2_writepages, @@ -791,8 +794,7 @@ const struct address_space_operations ex .readpages = ext2_readpages, .writepage = ext2_nobh_writepage, .sync_page = block_sync_page, - .prepare_write = ext2_nobh_prepare_write, - .commit_write = nobh_commit_write, + /* XXX: todo */ .bmap = ext2_bmap, 
.direct_IO = ext2_direct_IO, .writepages = ext2_writepages, Index: linux-2.6/fs/ext2/dir.c === --- linux-2.6.orig/fs/ext2/dir.c +++ linux-2.6/fs/ext2/dir.c @@ -22,6 +22,7 @@ */ #include ext2.h +#include linux/buffer_head.h #include linux/pagemap.h typedef struct ext2_dir_entry_2 ext2_dirent; @@ -61,12 +62,14 @@ ext2_last_byte(struct inode *inode, unsi return last_byte; } -static int ext2_commit_chunk(struct page *page, unsigned from, unsigned to) +static int ext2_commit_chunk(struct page *page, loff_t pos, unsigned len) { - struct inode *dir = page-mapping-host; + struct address_space *mapping = page-mapping; + struct inode *dir = mapping-host; int err = 0; + dir-i_version++; - page-mapping-a_ops-commit_write(NULL, page, from, to); + block_write_end(NULL, mapping, pos, len, len, page, NULL); if (IS_DIRSYNC(dir)) err = write_one_page(page, 1); else @@ -412,16 +415,18 @@ ino_t ext2_inode_by_name(struct inode * void ext2_set_link(struct inode *dir, struct ext2_dir_entry_2 *de, struct page *page, struct inode *inode) { - unsigned from = (char *) de - (char *) page_address(page); - unsigned to = from + le16_to_cpu(de-rec_len); + loff_t pos = (page-index PAGE_CACHE_SHIFT) + + (char *) de - (char *) page_address(page); + unsigned len = le16_to_cpu(de-rec_len); int err; lock_page(page); - err = page-mapping-a_ops-prepare_write(NULL, page, from, to); + err = __ext2_write_begin(NULL, page-mapping, pos, len, + AOP_FLAG_UNINTERRUPTIBLE, page, NULL); BUG_ON(err); de-inode = cpu_to_le32(inode-i_ino); - ext2_set_de_type (de, inode); - err = ext2_commit_chunk(page, from, to); + ext2_set_de_type(de, inode); + err = ext2_commit_chunk(page, pos, len); ext2_put_page(page); dir-i_mtime = dir-i_ctime = CURRENT_TIME_SEC; EXT2_I(dir)-i_flags = ~EXT2_BTREE_FL; @@ -444,7 +449,7 @@ int ext2_add_link (struct dentry *dentry unsigned long npages = dir_pages(dir); unsigned long n; char *kaddr; -
[patch 19/41] ext4 convert to new aops.
Cc: [EMAIL PROTECTED] Cc: Linux Filesystems linux-fsdevel@vger.kernel.org Convert ext4 to use write_begin()/write_end() methods. Signed-off-by: Badari Pulavarty [EMAIL PROTECTED] fs/ext4/inode.c | 147 +++- 1 file changed, 93 insertions(+), 54 deletions(-) Index: linux-2.6/fs/ext4/inode.c === --- linux-2.6.orig/fs/ext4/inode.c +++ linux-2.6/fs/ext4/inode.c @@ -1146,34 +1146,50 @@ static int do_journal_get_write_access(h return ext4_journal_get_write_access(handle, bh); } -static int ext4_prepare_write(struct file *file, struct page *page, - unsigned from, unsigned to) +static int ext4_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned flags, + struct page **pagep, void **fsdata) { - struct inode *inode = page-mapping-host; + struct inode *inode = mapping-host; int ret, needed_blocks = ext4_writepage_trans_blocks(inode); handle_t *handle; int retries = 0; + struct page *page; + pgoff_t index; + unsigned from, to; + + index = pos PAGE_CACHE_SHIFT; + from = pos (PAGE_CACHE_SIZE - 1); + to = from + len; retry: - handle = ext4_journal_start(inode, needed_blocks); - if (IS_ERR(handle)) { - ret = PTR_ERR(handle); - goto out; + page = __grab_cache_page(mapping, index); + if (!page) + return -ENOMEM; + *pagep = page; + + handle = ext4_journal_start(inode, needed_blocks); + if (IS_ERR(handle)) { + unlock_page(page); + page_cache_release(page); + ret = PTR_ERR(handle); + goto out; } - if (test_opt(inode-i_sb, NOBH) ext4_should_writeback_data(inode)) - ret = nobh_prepare_write(page, from, to, ext4_get_block); - else - ret = block_prepare_write(page, from, to, ext4_get_block); - if (ret) - goto prepare_write_failed; - if (ext4_should_journal_data(inode)) { + ret = block_write_begin(file, mapping, pos, len, flags, pagep, fsdata, + ext4_get_block); + + if (!ret ext4_should_journal_data(inode)) { ret = walk_page_buffers(handle, page_buffers(page), from, to, NULL, do_journal_get_write_access); } -prepare_write_failed: - if (ret) + + if 
(ret) { ext4_journal_stop(handle); + unlock_page(page); + page_cache_release(page); + } + if (ret == -ENOSPC ext4_should_retry_alloc(inode-i_sb, retries)) goto retry; out: @@ -1185,12 +1201,12 @@ int ext4_journal_dirty_data(handle_t *ha int err = jbd2_journal_dirty_data(handle, bh); if (err) ext4_journal_abort_handle(__FUNCTION__, __FUNCTION__, - bh, handle,err); + bh, handle, err); return err; } -/* For commit_write() in data=journal mode */ -static int commit_write_fn(handle_t *handle, struct buffer_head *bh) +/* For write_end() in data=journal mode */ +static int write_end_fn(handle_t *handle, struct buffer_head *bh) { if (!buffer_mapped(bh) || buffer_freed(bh)) return 0; @@ -1205,78 +1221,100 @@ static int commit_write_fn(handle_t *han * ext4 never places buffers on inode-i_mapping-private_list. metadata * buffers are managed internally. */ -static int ext4_ordered_commit_write(struct file *file, struct page *page, -unsigned from, unsigned to) +static int ext4_ordered_write_end(struct file *file, + struct address_space *mapping, + loff_t pos, unsigned len, unsigned copied, + struct page *page, void *fsdata) { handle_t *handle = ext4_journal_current_handle(); - struct inode *inode = page-mapping-host; + struct inode *inode = file-f_mapping-host; + unsigned from, to; int ret = 0, ret2; + from = pos (PAGE_CACHE_SIZE - 1); + to = from + len; + ret = walk_page_buffers(handle, page_buffers(page), from, to, NULL, ext4_journal_dirty_data); if (ret == 0) { /* -* generic_commit_write() will run mark_inode_dirty() if i_size +* generic_write_end() will run mark_inode_dirty() if i_size * changes. So let's piggyback the i_disksize mark_inode_dirty * into that. */ loff_t new_i_size; - new_i_size = ((loff_t)page-index PAGE_CACHE_SHIFT) + to; +
[patch 18/41] ext3 convert to new aops.
Cc: [EMAIL PROTECTED] Cc: Linux Filesystems linux-fsdevel@vger.kernel.org Signed-off-by: Nick Piggin [EMAIL PROTECTED] Various fixes and improvements Signed-off-by: Badari Pulavarty [EMAIL PROTECTED] fs/ext3/inode.c | 136 1 file changed, 88 insertions(+), 48 deletions(-) Index: linux-2.6/fs/ext3/inode.c === --- linux-2.6.orig/fs/ext3/inode.c +++ linux-2.6/fs/ext3/inode.c @@ -1147,51 +1147,68 @@ static int do_journal_get_write_access(h return ext3_journal_get_write_access(handle, bh); } -static int ext3_prepare_write(struct file *file, struct page *page, - unsigned from, unsigned to) +static int ext3_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned flags, + struct page **pagep, void **fsdata) { - struct inode *inode = page-mapping-host; + struct inode *inode = mapping-host; int ret, needed_blocks = ext3_writepage_trans_blocks(inode); handle_t *handle; int retries = 0; + struct page *page; + pgoff_t index; + unsigned from, to; + + index = pos PAGE_CACHE_SHIFT; + from = pos (PAGE_CACHE_SIZE - 1); + to = from + len; retry: + page = __grab_cache_page(mapping, index); + if (!page) + return -ENOMEM; + *pagep = page; + handle = ext3_journal_start(inode, needed_blocks); if (IS_ERR(handle)) { + unlock_page(page); + page_cache_release(page); ret = PTR_ERR(handle); goto out; } - if (test_opt(inode-i_sb, NOBH) ext3_should_writeback_data(inode)) - ret = nobh_prepare_write(page, from, to, ext3_get_block); - else - ret = block_prepare_write(page, from, to, ext3_get_block); + ret = block_write_begin(file, mapping, pos, len, flags, pagep, fsdata, + ext3_get_block); if (ret) - goto prepare_write_failed; + goto write_begin_failed; if (ext3_should_journal_data(inode)) { ret = walk_page_buffers(handle, page_buffers(page), from, to, NULL, do_journal_get_write_access); } -prepare_write_failed: - if (ret) +write_begin_failed: + if (ret) { ext3_journal_stop(handle); + unlock_page(page); + page_cache_release(page); + } if (ret == -ENOSPC 
ext3_should_retry_alloc(inode-i_sb, retries)) goto retry; out: return ret; } + int ext3_journal_dirty_data(handle_t *handle, struct buffer_head *bh) { int err = journal_dirty_data(handle, bh); if (err) ext3_journal_abort_handle(__FUNCTION__, __FUNCTION__, - bh, handle,err); + bh, handle, err); return err; } -/* For commit_write() in data=journal mode */ -static int commit_write_fn(handle_t *handle, struct buffer_head *bh) +/* For write_end() in data=journal mode */ +static int write_end_fn(handle_t *handle, struct buffer_head *bh) { if (!buffer_mapped(bh) || buffer_freed(bh)) return 0; @@ -1206,78 +1223,100 @@ static int commit_write_fn(handle_t *han * ext3 never places buffers on inode-i_mapping-private_list. metadata * buffers are managed internally. */ -static int ext3_ordered_commit_write(struct file *file, struct page *page, -unsigned from, unsigned to) +static int ext3_ordered_write_end(struct file *file, + struct address_space *mapping, + loff_t pos, unsigned len, unsigned copied, + struct page *page, void *fsdata) { handle_t *handle = ext3_journal_current_handle(); - struct inode *inode = page-mapping-host; + struct inode *inode = file-f_mapping-host; + unsigned from, to; int ret = 0, ret2; + from = pos (PAGE_CACHE_SIZE - 1); + to = from + len; + ret = walk_page_buffers(handle, page_buffers(page), from, to, NULL, ext3_journal_dirty_data); if (ret == 0) { /* -* generic_commit_write() will run mark_inode_dirty() if i_size +* generic_write_end() will run mark_inode_dirty() if i_size * changes. So let's piggyback the i_disksize mark_inode_dirty * into that. */ loff_t new_i_size; - new_i_size = ((loff_t)page-index PAGE_CACHE_SHIFT) + to; + new_i_size = pos + copied; if (new_i_size EXT3_I(inode)-i_disksize)
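The ext3 and ext4 conversions above both start by turning the byte-based (pos, len) arguments of write_begin() back into the page index and the in-page from/to range that the old prepare_write() received directly. A minimal userspace sketch of that arithmetic follows. The names sketch_split, sketch_old_end and the SKETCH_* macros are made up for illustration, 4 KiB pages are assumed, and the "request stays inside one page" check is an assumption of this sketch rather than something stated in the patch.

```c
#include <assert.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

struct sketch_range {
	unsigned long index;	/* page index within the file */
	unsigned from;		/* first byte within the page */
	unsigned to;		/* one past the last byte within the page */
};

/* Split a (pos, len) request into page index and in-page byte range. */
static struct sketch_range sketch_split(long long pos, unsigned len)
{
	struct sketch_range r;

	assert(pos >= 0);
	r.index = (unsigned long)(pos >> SKETCH_PAGE_SHIFT);
	r.from = (unsigned)(pos & (SKETCH_PAGE_SIZE - 1));
	r.to = r.from + len;
	/* Assumed here: the caller hands over per-page chunks. */
	assert(r.to <= SKETCH_PAGE_SIZE);
	return r;
}

/* End offset as the old commit_write() code derived it from the page. */
static long long sketch_old_end(unsigned long index, unsigned to)
{
	return ((long long)index << SKETCH_PAGE_SHIFT) + to;
}
```

When the whole request was copied (copied == len), the old expression and the new "pos + copied" give the same end offset; they differ only on a short copy, where the new form reflects what was actually written.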
[patch 16/41] rd convert to new aops.
Also clean up various little things.

I've got rid of the comment from akpm, because now that make_page_uptodate
is only called from 2 places, it is pretty easy to see that the buffers
are in an uptodate state at the time of the call. Actually, it was OK before
my patch as well, because the memset is equivalent to reading from disk
of course... however it is more explicit where the updates come from now.

Cc: Linux Filesystems linux-fsdevel@vger.kernel.org
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 drivers/block/rd.c |  125
 1 file changed, 73 insertions(+), 52 deletions(-)

Index: linux-2.6/drivers/block/rd.c
===================================================================
--- linux-2.6.orig/drivers/block/rd.c
+++ linux-2.6/drivers/block/rd.c
@@ -104,50 +104,60 @@ static void make_page_uptodate(struct pa
 		struct buffer_head *head = bh;
 
 		do {
-			if (!buffer_uptodate(bh)) {
-				memset(bh->b_data, 0, bh->b_size);
-				/*
-				 * akpm: I'm totally undecided about this. The
-				 * buffer has just been magically brought up to
-				 * date, but nobody should want to be reading
-				 * it anyway, because it hasn't been used for
-				 * anything yet. It is still in a not read
-				 * from disk yet state.
-				 *
-				 * But non-uptodate buffers against an uptodate
-				 * page are against the rules. So do it anyway.
-				 */
+			if (!buffer_uptodate(bh))
 				set_buffer_uptodate(bh);
-			}
 		} while ((bh = bh->b_this_page) != head);
-	} else {
-		memset(page_address(page), 0, PAGE_CACHE_SIZE);
 	}
-	flush_dcache_page(page);
 	SetPageUptodate(page);
 }
 
 static int ramdisk_readpage(struct file *file, struct page *page)
 {
-	if (!PageUptodate(page))
+	if (!PageUptodate(page)) {
+		memclear_highpage_flush(page, 0, PAGE_CACHE_SIZE);
 		make_page_uptodate(page);
+	}
 	unlock_page(page);
 	return 0;
 }
 
-static int ramdisk_prepare_write(struct file *file, struct page *page,
-				unsigned offset, unsigned to)
-{
-	if (!PageUptodate(page))
-		make_page_uptodate(page);
+static int ramdisk_write_begin(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata)
+{
+	struct page *page;
+	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+
+	page = __grab_cache_page(mapping, index);
+	if (!page)
+		return -ENOMEM;
+	*pagep = page;
 	return 0;
 }
 
-static int ramdisk_commit_write(struct file *file, struct page *page,
-				unsigned offset, unsigned to)
-{
+static int ramdisk_write_end(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned copied,
+			struct page *page, void *fsdata)
+{
+	if (!PageUptodate(page)) {
+		if (copied != PAGE_CACHE_SIZE) {
+			void *dst;
+			unsigned from = pos & (PAGE_CACHE_SIZE - 1);
+			unsigned to = from + copied;
+
+			dst = kmap_atomic(page, KM_USER0);
+			memset(dst, 0, from);
+			memset(dst + to, 0, PAGE_CACHE_SIZE - to);
+			flush_dcache_page(page);
+			kunmap_atomic(dst, KM_USER0);
+		}
+		make_page_uptodate(page);
+	}
+
 	set_page_dirty(page);
-	return 0;
+	unlock_page(page);
+	page_cache_release(page);
+	return copied;
 }
 
 /*
@@ -191,8 +201,8 @@ static int ramdisk_set_page_dirty(struct
 
 static const struct address_space_operations ramdisk_aops = {
 	.readpage = ramdisk_readpage,
-	.prepare_write = ramdisk_prepare_write,
-	.commit_write = ramdisk_commit_write,
+	.write_begin= ramdisk_write_begin,
+	.write_end = ramdisk_write_end,
 	.writepage = ramdisk_writepage,
 	.set_page_dirty = ramdisk_set_page_dirty,
 	.writepages = ramdisk_writepages,
@@ -201,13 +211,14 @@ static const struct address_space_operat
 static int rd_blkdev_pagecache_IO(int rw, struct bio_vec *vec, sector_t sector,
 				struct address_space *mapping)
 {
-	pgoff_t index = sector >> (PAGE_CACHE_SHIFT - 9);
+	loff_t pos = sector << 9;
 	unsigned int vec_offset =
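The interesting part of ramdisk_write_end() in the rd patch above is that a page which is not yet uptodate gets zeroed everywhere except the range that was just copied, so the copied bytes are never overwritten. A rough userspace sketch of that step is below. sketch_zero_outside and SKETCH_PAGE_SIZE are hypothetical names, a plain buffer stands in for the kmapped page, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Zero every byte of a page-sized buffer that lies outside
 * [from, from + copied), leaving the freshly copied bytes alone.
 * Returns the number of bytes that were zeroed.
 */
static size_t sketch_zero_outside(unsigned char *page, unsigned long long pos,
				  unsigned copied)
{
	unsigned from = (unsigned)(pos & (SKETCH_PAGE_SIZE - 1));
	unsigned to = from + copied;

	assert(to <= SKETCH_PAGE_SIZE);
	memset(page, 0, from);				/* head of the page */
	memset(page + to, 0, SKETCH_PAGE_SIZE - to);	/* tail of the page */
	return (size_t)from + (SKETCH_PAGE_SIZE - to);
}
```

A full-page copy makes both memset calls zero-length, which matches the patch skipping the block entirely when copied == PAGE_CACHE_SIZE.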
[patch 08/41] mm: write iovec cleanup
Hide some of the open-coded nr_segs tests into the iovec helpers. This is
all to simplify generic_file_buffered_write, because that gets more complex
in the next patch.

Cc: Linux Memory Management [EMAIL PROTECTED]
Cc: Linux Filesystems linux-fsdevel@vger.kernel.org
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 mm/filemap.c     |   36
 mm/filemap.h     |  104
 mm/filemap_xip.c |   17
 3 files changed, 69 insertions(+), 88 deletions(-)

Index: linux-2.6/mm/filemap.h
===================================================================
--- linux-2.6.orig/mm/filemap.h
+++ linux-2.6/mm/filemap.h
@@ -22,82 +22,82 @@ __filemap_copy_from_user_iovec_inatomic(
 
 /*
  * Copy as much as we can into the page and return the number of bytes which
- * were sucessfully copied. If a fault is encountered then clear the page
- * out to (offset+bytes) and return the number of bytes which were copied.
- *
- * NOTE: For this to work reliably we really want copy_from_user_inatomic_nocache
- * to *NOT* zero any tail of the buffer that it failed to copy. If it does,
- * and if the following non-atomic copy succeeds, then there is a small window
- * where the target page contains neither the data before the write, nor the
- * data after the write (it contains zero). A read at this time will see
- * data that is inconsistent with any ordering of the read and the write.
- * (This has been detected in practice).
+ * were sucessfully copied. If a fault is encountered then return the number of
+ * bytes which were copied.
  */
 static inline size_t
-filemap_copy_from_user(struct page *page, unsigned long offset,
-			const char __user *buf, unsigned bytes)
+filemap_copy_from_user_atomic(struct page *page, unsigned long offset,
+			const struct iovec *iov, unsigned long nr_segs,
+			size_t base, size_t bytes)
 {
 	char *kaddr;
-	int left;
+	size_t copied;
 
 	kaddr = kmap_atomic(page, KM_USER0);
-	left = __copy_from_user_inatomic_nocache(kaddr + offset, buf, bytes);
+	if (likely(nr_segs == 1)) {
+		int left;
+		char __user *buf = iov->iov_base + base;
+		left = __copy_from_user_inatomic_nocache(kaddr + offset,
+							buf, bytes);
+		copied = bytes - left;
+	} else {
+		copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset,
+							iov, base, bytes);
+	}
 	kunmap_atomic(kaddr, KM_USER0);
 
-	if (left != 0) {
-		/* Do it the slow way */
-		kaddr = kmap(page);
-		left = __copy_from_user_nocache(kaddr + offset, buf, bytes);
-		kunmap(page);
-	}
-	return bytes - left;
+	return copied;
 }
 
 /*
- * This has the same sideeffects and return value as filemap_copy_from_user().
- * The difference is that on a fault we need to memset the remainder of the
- * page (out to offset+bytes), to emulate filemap_copy_from_user()'s
- * single-segment behaviour.
+ * This has the same sideeffects and return value as
+ * filemap_copy_from_user_atomic().
+ * The difference is that it attempts to resolve faults.
  */
 static inline size_t
-filemap_copy_from_user_iovec(struct page *page, unsigned long offset,
-			const struct iovec *iov, size_t base, size_t bytes)
+filemap_copy_from_user(struct page *page, unsigned long offset,
+			const struct iovec *iov, unsigned long nr_segs,
+			size_t base, size_t bytes)
 {
 	char *kaddr;
 	size_t copied;
 
-	kaddr = kmap_atomic(page, KM_USER0);
-	copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset, iov,
-							base, bytes);
-	kunmap_atomic(kaddr, KM_USER0);
-	if (copied != bytes) {
-		kaddr = kmap(page);
-		copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset, iov,
-								base, bytes);
-		if (bytes - copied)
-			memset(kaddr + offset + copied, 0, bytes - copied);
-		kunmap(page);
+	kaddr = kmap(page);
+	if (likely(nr_segs == 1)) {
+		int left;
+		char __user *buf = iov->iov_base + base;
+		left = __copy_from_user_nocache(kaddr + offset, buf, bytes);
+		copied = bytes - left;
+	} else {
+		copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset,
+							iov, base, bytes);
 	}
+	kunmap(page);
 	return copied;
 }
 
 static inline void
-filemap_set_next_iovec(const struct iovec **iovp, size_t *basep, size_t bytes)
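Both helpers in the cleanup above share one shape: a single-segment fast path, a multi-segment walk otherwise, and a return value that is the number of bytes actually copied. The sketch below models that shape in userspace. It is only an illustration: struct sketch_iovec and sketch_copy_from_iovec are hypothetical names, memcpy stands in for the user-copy primitives, and since memcpy cannot fault, a short copy here happens only when the segments run out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for struct iovec (hypothetical, for this sketch). */
struct sketch_iovec {
	const void *iov_base;
	size_t iov_len;
};

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Copy up to `bytes` bytes into dst, starting `base` bytes into iov[0].
 * Returns how many bytes were copied, which may be fewer than requested.
 */
static size_t sketch_copy_from_iovec(char *dst, const struct sketch_iovec *iov,
				     unsigned long nr_segs, size_t base,
				     size_t bytes)
{
	size_t copied = 0;

	if (nr_segs == 1) {
		/* Fast path: one contiguous source buffer. */
		assert(base <= iov->iov_len);
		copied = min_size(bytes, iov->iov_len - base);
		memcpy(dst, (const char *)iov->iov_base + base, copied);
		return copied;
	}

	while (bytes && nr_segs) {
		size_t chunk;

		assert(base <= iov->iov_len);
		chunk = min_size(bytes, iov->iov_len - base);
		memcpy(dst + copied, (const char *)iov->iov_base + base, chunk);
		copied += chunk;
		bytes -= chunk;
		iov++;		/* later segments are read from their start */
		nr_segs--;
		base = 0;
	}
	return copied;
}
```

Keeping the nr_segs test inside the helper is what lets the caller stop caring whether it was handed one buffer or several, which is the stated point of the patch.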
[patch 10/41] mm: buffered write iterator
Add an iterator data structure to operate over an iovec. Add usercopy operators needed by generic_file_buffered_write, and convert that function over. Cc: Linux Memory Management [EMAIL PROTECTED] Cc: Linux Filesystems linux-fsdevel@vger.kernel.org Signed-off-by: Nick Piggin [EMAIL PROTECTED] include/linux/fs.h | 33 mm/filemap.c | 144 +++-- mm/filemap.h | 103 - 3 files changed, 150 insertions(+), 130 deletions(-) Index: linux-2.6/include/linux/fs.h === --- linux-2.6.orig/include/linux/fs.h +++ linux-2.6/include/linux/fs.h @@ -404,6 +404,39 @@ struct page; struct address_space; struct writeback_control; +struct iov_iter { + const struct iovec *iov; + unsigned long nr_segs; + size_t iov_offset; + size_t count; +}; + +size_t iov_iter_copy_from_user_atomic(struct page *page, + struct iov_iter *i, unsigned long offset, size_t bytes); +size_t iov_iter_copy_from_user(struct page *page, + struct iov_iter *i, unsigned long offset, size_t bytes); +void iov_iter_advance(struct iov_iter *i, size_t bytes); +int iov_iter_fault_in_readable(struct iov_iter *i); +size_t iov_iter_single_seg_count(struct iov_iter *i); + +static inline void iov_iter_init(struct iov_iter *i, + const struct iovec *iov, unsigned long nr_segs, + size_t count, size_t written) +{ + i-iov = iov; + i-nr_segs = nr_segs; + i-iov_offset = 0; + i-count = count + written; + + iov_iter_advance(i, written); +} + +static inline size_t iov_iter_count(struct iov_iter *i) +{ + return i-count; +} + + struct address_space_operations { int (*writepage)(struct page *page, struct writeback_control *wbc); int (*readpage)(struct file *, struct page *); Index: linux-2.6/mm/filemap.c === --- linux-2.6.orig/mm/filemap.c +++ linux-2.6/mm/filemap.c @@ -30,7 +30,7 @@ #include linux/security.h #include linux/syscalls.h #include linux/cpuset.h -#include filemap.h +#include linux/hardirq.h /* for BUG_ON(!in_atomic()) only */ #include internal.h /* @@ -1696,8 +1696,7 @@ int remove_suid(struct dentry *dentry) } 
EXPORT_SYMBOL(remove_suid); -size_t -__filemap_copy_from_user_iovec_inatomic(char *vaddr, +static size_t __iovec_copy_from_user_inatomic(char *vaddr, const struct iovec *iov, size_t base, size_t bytes) { size_t copied = 0, left = 0; @@ -1720,6 +1719,110 @@ __filemap_copy_from_user_iovec_inatomic( } /* + * Copy as much as we can into the page and return the number of bytes which + * were sucessfully copied. If a fault is encountered then return the number of + * bytes which were copied. + */ +size_t iov_iter_copy_from_user_atomic(struct page *page, + struct iov_iter *i, unsigned long offset, size_t bytes) +{ + char *kaddr; + size_t copied; + + BUG_ON(!in_atomic()); + kaddr = kmap_atomic(page, KM_USER0); + if (likely(i-nr_segs == 1)) { + int left; + char __user *buf = i-iov-iov_base + i-iov_offset; + left = __copy_from_user_inatomic_nocache(kaddr + offset, + buf, bytes); + copied = bytes - left; + } else { + copied = __iovec_copy_from_user_inatomic(kaddr + offset, + i-iov, i-iov_offset, bytes); + } + kunmap_atomic(kaddr, KM_USER0); + + return copied; +} + +/* + * This has the same sideeffects and return value as + * iov_iter_copy_from_user_atomic(). + * The difference is that it attempts to resolve faults. + * Page must not be locked. 
+ */ +size_t iov_iter_copy_from_user(struct page *page, + struct iov_iter *i, unsigned long offset, size_t bytes) +{ + char *kaddr; + size_t copied; + + kaddr = kmap(page); + if (likely(i-nr_segs == 1)) { + int left; + char __user *buf = i-iov-iov_base + i-iov_offset; + left = __copy_from_user_nocache(kaddr + offset, buf, bytes); + copied = bytes - left; + } else { + copied = __iovec_copy_from_user_inatomic(kaddr + offset, + i-iov, i-iov_offset, bytes); + } + kunmap(page); + return copied; +} + +static void __iov_iter_advance_iov(struct iov_iter *i, size_t bytes) +{ + if (likely(i-nr_segs == 1)) { + i-iov_offset += bytes; + } else { + const struct iovec *iov = i-iov; + size_t base = i-iov_offset; + + while (bytes) { + int copy = min(bytes, iov-iov_len - base); + + bytes -= copy; +
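The iterator introduced above bundles the segment pointer, the offset into the current segment and the remaining byte count, so that the write loop can advance by however many bytes a copy managed. Since the advance helper is cut off in this excerpt, the following is only a guess at the general idea in userspace C, not the patch's code: struct sketch_seg, struct sketch_iter and both functions are made-up names, and decrementing count inside advance is inferred from iov_iter_init() setting count to count + written before advancing by written.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct iovec: one contiguous source segment. */
struct sketch_seg {
	const char *base;
	size_t len;
};

/* Mirrors the four fields of struct iov_iter shown in the patch. */
struct sketch_iter {
	const struct sketch_seg *seg;
	unsigned long nr_segs;
	size_t seg_offset;	/* bytes already consumed in *seg */
	size_t count;		/* bytes left in the whole request */
};

/* Consume `bytes` bytes, stepping to the next segment when one is used up. */
static void sketch_iter_advance(struct sketch_iter *i, size_t bytes)
{
	assert(bytes <= i->count);
	i->count -= bytes;
	while (bytes) {
		size_t room = i->seg->len - i->seg_offset;
		size_t step = bytes < room ? bytes : room;

		bytes -= step;
		i->seg_offset += step;
		if (i->seg_offset == i->seg->len) {
			i->seg++;
			i->seg_offset = 0;
		}
	}
}

/* Same shape as iov_iter_init(): start at zero, then skip `written`. */
static void sketch_iter_init(struct sketch_iter *i,
			     const struct sketch_seg *seg,
			     unsigned long nr_segs, size_t count,
			     size_t written)
{
	i->seg = seg;
	i->nr_segs = nr_segs;
	i->seg_offset = 0;
	i->count = count + written;
	sketch_iter_advance(i, written);
}
```

The sketch relies on count never exceeding the total length of the segments; with that invariant the loop never reads past the last segment.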
[patch 32/41] smb convert to new aops.
Cc: Linux Filesystems linux-fsdevel@vger.kernel.org
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 fs/smbfs/file.c |   34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)

Index: linux-2.6/fs/smbfs/file.c
===================================================================
--- linux-2.6.orig/fs/smbfs/file.c
+++ linux-2.6/fs/smbfs/file.c
@@ -290,29 +290,45 @@ out:
  * If the writer ends up delaying the write, the writer needs to
  * increment the page use counts until he is done with the page.
  */
-static int smb_prepare_write(struct file *file, struct page *page,
-			unsigned offset, unsigned to)
-{
+static int smb_write_begin(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata)
+{
+	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+	*pagep = __grab_cache_page(mapping, index);
+	if (!*pagep)
+		return -ENOMEM;
 	return 0;
 }
 
-static int smb_commit_write(struct file *file, struct page *page,
-			unsigned offset, unsigned to)
+static int smb_write_end(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned copied,
+			struct page *page, void *fsdata)
 {
 	int status;
+	unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
 
-	status = -EFAULT;
 	lock_kernel();
-	status = smb_updatepage(file, page, offset, to-offset);
+	status = smb_updatepage(file, page, offset, copied);
 	unlock_kernel();
+
+	if (!status) {
+		if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
+			SetPageUptodate(page);
+		status = copied;
+	}
+
+	unlock_page(page);
+	page_cache_release(page);
+
 	return status;
 }
 
 const struct address_space_operations smb_file_aops = {
 	.readpage = smb_readpage,
 	.writepage = smb_writepage,
-	.prepare_write = smb_prepare_write,
-	.commit_write = smb_commit_write
+	.write_begin = smb_write_begin,
+	.write_end = smb_write_end,
 };
 
 /*
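The smb conversion above shows the return convention of write_end() in a compact form: an error from the backend is passed through, success is reported as the number of bytes copied, and the page is only marked uptodate when a whole page was written. A small userspace model of that decision follows. It is a sketch with made-up names (struct sketch_page, sketch_write_end), assumes 4 KiB pages, and leaves out the unlock and release of the page that the real function also performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Only the one page flag this sketch cares about. */
struct sketch_page {
	bool uptodate;
};

/*
 * `status` is what the backend update returned: 0 on success,
 * a negative errno-style value on failure.
 */
static int sketch_write_end(struct sketch_page *page, int status,
			    unsigned copied)
{
	assert(copied <= SKETCH_PAGE_SIZE);
	if (status)
		return status;		/* propagate the error unchanged */

	/* A partial write says nothing about the rest of the page. */
	if (!page->uptodate && copied == SKETCH_PAGE_SIZE)
		page->uptodate = true;

	return (int)copied;		/* success: report bytes written */
}
```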
[patch 33/41] GFS2 convert to new aops.
From: Steven Whitehouse [EMAIL PROTECTED] Cc: Linux Filesystems linux-fsdevel@vger.kernel.org Signed-off-by: Steven Whitehouse [EMAIL PROTECTED] fs/gfs2/ops_address.c | 209 +- 1 file changed, 125 insertions(+), 84 deletions(-) Index: linux-2.6/fs/gfs2/ops_address.c === --- linux-2.6.orig/fs/gfs2/ops_address.c +++ linux-2.6/fs/gfs2/ops_address.c @@ -17,6 +17,7 @@ #include linux/mpage.h #include linux/fs.h #include linux/writeback.h +#include linux/swap.h #include linux/gfs2_ondisk.h #include linux/lm_interface.h @@ -348,45 +349,49 @@ out_unlock: } /** - * gfs2_prepare_write - Prepare to write a page to a file + * gfs2_write_begin - Begin to write to a file * @file: The file to write to - * @page: The page which is to be prepared for writing - * @from: From (byte range within page) - * @to: To (byte range within page) + * @mapping: The mapping in which to write + * @pos: The file offset at which to start writing + * @len: Length of the write + * @flags: Various flags + * @pagep: Pointer to return the page + * @fsdata: Pointer to return fs data (unused by GFS2) * * Returns: errno */ -static int gfs2_prepare_write(struct file *file, struct page *page, - unsigned from, unsigned to) +static int gfs2_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned flags, + struct page **pagep, void **fsdata) { - struct gfs2_inode *ip = GFS2_I(page-mapping-host); - struct gfs2_sbd *sdp = GFS2_SB(page-mapping-host); + struct gfs2_inode *ip = GFS2_I(mapping-host); + struct gfs2_sbd *sdp = GFS2_SB(mapping-host); unsigned int data_blocks, ind_blocks, rblocks; int alloc_required; int error = 0; - loff_t pos = ((loff_t)page-index PAGE_CACHE_SHIFT) + from; - loff_t end = ((loff_t)page-index PAGE_CACHE_SHIFT) + to; struct gfs2_alloc *al; - unsigned int write_len = to - from; + pgoff_t index = pos PAGE_CACHE_SHIFT; + unsigned from = pos (PAGE_CACHE_SIZE - 1); + unsigned to = from + len; + struct page *page; - - gfs2_holder_init(ip-i_gl, 
LM_ST_EXCLUSIVE, GL_ATIME|LM_FLAG_TRY_1CB, ip-i_gh); + gfs2_holder_init(ip-i_gl, LM_ST_EXCLUSIVE, GL_ATIME, ip-i_gh); error = gfs2_glock_nq_atime(ip-i_gh); - if (unlikely(error)) { - if (error == GLR_TRYFAILED) { - unlock_page(page); - error = AOP_TRUNCATED_PAGE; - yield(); - } + if (unlikely(error)) goto out_uninit; - } - gfs2_write_calc_reserv(ip, write_len, data_blocks, ind_blocks); + error = -ENOMEM; + page = __grab_cache_page(mapping, index); + *pagep = page; + if (!page) + goto out_unlock; + + gfs2_write_calc_reserv(ip, len, data_blocks, ind_blocks); - error = gfs2_write_alloc_required(ip, pos, write_len, alloc_required); + error = gfs2_write_alloc_required(ip, pos, len, alloc_required); if (error) - goto out_unlock; + goto out_putpage; ip-i_alloc.al_requested = 0; @@ -418,7 +423,7 @@ static int gfs2_prepare_write(struct fil goto out; if (gfs2_is_stuffed(ip)) { - if (end sdp-sd_sb.sb_bsize - sizeof(struct gfs2_dinode)) { + if (pos + len sdp-sd_sb.sb_bsize - sizeof(struct gfs2_dinode)) { error = gfs2_unstuff_dinode(ip, page); if (error == 0) goto prepare_write; @@ -440,6 +445,10 @@ out_qunlock: out_alloc_put: gfs2_alloc_put(ip); } +out_putpage: + page_cache_release(page); + if (pos + len ip-i_inode.i_size) + vmtruncate(ip-i_inode, ip-i_inode.i_size); out_unlock: gfs2_glock_dq_m(1, ip-i_gh); out_uninit: @@ -450,96 +459,128 @@ out_uninit: } /** - * gfs2_commit_write - Commit write to a file + * gfs2_stuffed_write_end - Write end for stuffed files + * @inode: The inode + * @dibh: The buffer_head containing the on-disk inode + * @pos: The file position + * @len: The length of the write + * @copied: How much was actually copied by the VFS + * @page: The page + * + * This copies the data from the page into the inode block after + * the inode data structure itself. 
+ * + * Returns: errno + */ +static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh, + loff_t pos, unsigned len, unsigned copied, + struct page *page) +{ + struct gfs2_inode *ip = GFS2_I(inode); + struct gfs2_sbd *sdp = GFS2_SB(inode); + u64 to = pos + copied; +
[patch 37/41] ufs convert to new aops.
Cc: [EMAIL PROTECTED] Cc: Linux Filesystems linux-fsdevel@vger.kernel.org Signed-off-by: Nick Piggin [EMAIL PROTECTED] fs/ufs/dir.c | 50 +++--- fs/ufs/inode.c | 23 +++ 2 files changed, 50 insertions(+), 23 deletions(-) Index: linux-2.6/fs/ufs/inode.c === --- linux-2.6.orig/fs/ufs/inode.c +++ linux-2.6/fs/ufs/inode.c @@ -558,24 +558,39 @@ static int ufs_writepage(struct page *pa { return block_write_full_page(page,ufs_getfrag_block,wbc); } + static int ufs_readpage(struct file *file, struct page *page) { return block_read_full_page(page,ufs_getfrag_block); } -static int ufs_prepare_write(struct file *file, struct page *page, unsigned from, unsigned to) + +int __ufs_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned flags, + struct page **pagep, void **fsdata) { - return block_prepare_write(page,from,to,ufs_getfrag_block); + return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata, + ufs_getfrag_block); } + +static int ufs_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned flags, + struct page **pagep, void **fsdata) +{ + *pagep = NULL; + return __ufs_write_begin(file, mapping, pos, len, flags, pagep, fsdata); +} + static sector_t ufs_bmap(struct address_space *mapping, sector_t block) { return generic_block_bmap(mapping,block,ufs_getfrag_block); } + const struct address_space_operations ufs_aops = { .readpage = ufs_readpage, .writepage = ufs_writepage, .sync_page = block_sync_page, - .prepare_write = ufs_prepare_write, - .commit_write = generic_commit_write, + .write_begin = ufs_write_begin, + .write_end = generic_write_end, .bmap = ufs_bmap }; Index: linux-2.6/fs/ufs/dir.c === --- linux-2.6.orig/fs/ufs/dir.c +++ linux-2.6/fs/ufs/dir.c @@ -38,12 +38,14 @@ static inline int ufs_match(struct super return !memcmp(name, de-d_name, len); } -static int ufs_commit_chunk(struct page *page, unsigned from, unsigned to) +static int ufs_commit_chunk(struct page *page, 
loff_t pos, unsigned len) { - struct inode *dir = page-mapping-host; + struct address_space *mapping = page-mapping; + struct inode *dir = mapping-host; int err = 0; + dir-i_version++; - page-mapping-a_ops-commit_write(NULL, page, from, to); + block_write_end(NULL, mapping, pos, len, len, page, NULL); if (IS_DIRSYNC(dir)) err = write_one_page(page, 1); else @@ -81,16 +83,20 @@ ino_t ufs_inode_by_name(struct inode *di void ufs_set_link(struct inode *dir, struct ufs_dir_entry *de, struct page *page, struct inode *inode) { - unsigned from = (char *) de - (char *) page_address(page); - unsigned to = from + fs16_to_cpu(dir-i_sb, de-d_reclen); + loff_t pos = (page-index PAGE_CACHE_SHIFT) + + (char *) de - (char *) page_address(page); + unsigned len = fs16_to_cpu(dir-i_sb, de-d_reclen); int err; lock_page(page); - err = page-mapping-a_ops-prepare_write(NULL, page, from, to); + err = __ufs_write_begin(NULL, page-mapping, pos, len, + AOP_FLAG_UNINTERRUPTIBLE, page, NULL); BUG_ON(err); + de-d_ino = cpu_to_fs32(dir-i_sb, inode-i_ino); ufs_set_de_type(dir-i_sb, de, inode-i_mode); - err = ufs_commit_chunk(page, from, to); + + err = ufs_commit_chunk(page, pos, len); ufs_put_page(page); dir-i_mtime = dir-i_ctime = CURRENT_TIME_SEC; mark_inode_dirty(dir); @@ -312,7 +318,7 @@ int ufs_add_link(struct dentry *dentry, unsigned long npages = ufs_dir_pages(dir); unsigned long n; char *kaddr; - unsigned from, to; + loff_t pos; int err; UFSD(ENTER, name %s, namelen %u\n, name, namelen); @@ -367,9 +373,10 @@ int ufs_add_link(struct dentry *dentry, return -EINVAL; got_it: - from = (char*)de - (char*)page_address(page); - to = from + rec_len; - err = page-mapping-a_ops-prepare_write(NULL, page, from, to); + pos = (page-index PAGE_CACHE_SHIFT) + + (char*)de - (char*)page_address(page); + err = __ufs_write_begin(NULL, page-mapping, pos, rec_len, + AOP_FLAG_UNINTERRUPTIBLE, page, NULL); if (err) goto out_unlock; if (de-d_ino) { @@ -386,7 +393,7 @@ got_it: de-d_ino = cpu_to_fs32(sb, 
inode-i_ino); ufs_set_de_type(sb, de, inode-i_mode); - err = ufs_commit_chunk(page, from,
[patch 40/41] minix convert to new aops.
Cc: Andries Brouwer [EMAIL PROTECTED] Cc: Linux Filesystems linux-fsdevel@vger.kernel.org Signed-off-by: Nick Piggin [EMAIL PROTECTED] fs/minix/dir.c | 43 +-- fs/minix/inode.c | 23 +++ 2 files changed, 44 insertions(+), 22 deletions(-) Index: linux-2.6/fs/minix/inode.c === --- linux-2.6.orig/fs/minix/inode.c +++ linux-2.6/fs/minix/inode.c @@ -347,24 +347,39 @@ static int minix_writepage(struct page * { return block_write_full_page(page, minix_get_block, wbc); } + static int minix_readpage(struct file *file, struct page *page) { return block_read_full_page(page,minix_get_block); } -static int minix_prepare_write(struct file *file, struct page *page, unsigned from, unsigned to) + +int __minix_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned flags, + struct page **pagep, void **fsdata) { - return block_prepare_write(page,from,to,minix_get_block); + return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata, + minix_get_block); } + +static int minix_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned flags, + struct page **pagep, void **fsdata) +{ + *pagep = NULL; + return __minix_write_begin(file, mapping, pos, len, flags, pagep, fsdata); +} + static sector_t minix_bmap(struct address_space *mapping, sector_t block) { return generic_block_bmap(mapping,block,minix_get_block); } + static const struct address_space_operations minix_aops = { .readpage = minix_readpage, .writepage = minix_writepage, .sync_page = block_sync_page, - .prepare_write = minix_prepare_write, - .commit_write = generic_commit_write, + .write_begin = minix_write_begin, + .write_end = generic_write_end, .bmap = minix_bmap }; Index: linux-2.6/fs/minix/dir.c === --- linux-2.6.orig/fs/minix/dir.c +++ linux-2.6/fs/minix/dir.c @@ -9,6 +9,7 @@ */ #include minix.h +#include linux/buffer_head.h #include linux/highmem.h #include linux/smp_lock.h @@ -48,11 +49,12 @@ static inline unsigned long 
dir_pages(st return (inode-i_size+PAGE_CACHE_SIZE-1)PAGE_CACHE_SHIFT; } -static int dir_commit_chunk(struct page *page, unsigned from, unsigned to) +static int dir_commit_chunk(struct page *page, loff_t pos, unsigned len) { - struct inode *dir = (struct inode *)page-mapping-host; + struct address_space *mapping = page-mapping; + struct inode *dir = mapping-host; int err = 0; - page-mapping-a_ops-commit_write(NULL, page, from, to); + block_write_end(NULL, mapping, pos, len, len, page, NULL); if (IS_DIRSYNC(dir)) err = write_one_page(page, 1); else @@ -220,7 +222,7 @@ int minix_add_link(struct dentry *dentry char *kaddr, *p; minix_dirent *de; minix3_dirent *de3; - unsigned from, to; + loff_t pos; int err; char *namx = NULL; __u32 inumber; @@ -272,9 +274,9 @@ int minix_add_link(struct dentry *dentry return -EINVAL; got_it: - from = p - (char*)page_address(page); - to = from + sbi-s_dirsize; - err = page-mapping-a_ops-prepare_write(NULL, page, from, to); + pos = (page-index PAGE_CACHE_SHIFT) + p - (char*)page_address(page); + err = __minix_write_begin(NULL, page-mapping, pos, sbi-s_dirsize, + AOP_FLAG_UNINTERRUPTIBLE, page, NULL); if (err) goto out_unlock; memcpy (namx, name, namelen); @@ -285,7 +287,7 @@ got_it: memset (namx + namelen, 0, sbi-s_dirsize - namelen - 2); de-inode = inode-i_ino; } - err = dir_commit_chunk(page, from, to); + err = dir_commit_chunk(page, pos, sbi-s_dirsize); dir-i_mtime = dir-i_ctime = CURRENT_TIME_SEC; mark_inode_dirty(dir); out_put: @@ -302,15 +304,16 @@ int minix_delete_entry(struct minix_dir_ struct address_space *mapping = page-mapping; struct inode *inode = (struct inode*)mapping-host; char *kaddr = page_address(page); - unsigned from = (char*)de - kaddr; - unsigned to = from + minix_sb(inode-i_sb)-s_dirsize; + loff_t pos = (page-index PAGE_CACHE_SHIFT) + (char*)de - kaddr; + unsigned len = minix_sb(inode-i_sb)-s_dirsize; int err; lock_page(page); - err = mapping-a_ops-prepare_write(NULL, page, from, to); + err = 
__minix_write_begin(NULL, mapping, pos, len, + AOP_FLAG_UNINTERRUPTIBLE, page, NULL); if (err == 0) { de-inode = 0; -
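The ufs and minix directory code above no longer passes in-page from/to offsets around. It computes a file position from the page index plus the entry's offset inside the mapped page, and carries the entry length separately. A rough sketch of that calculation is below. sketch_entry_pos and SKETCH_PAGE_SHIFT are hypothetical names, 4 KiB pages are assumed, and the sketch widens the index before shifting, in the style of the (loff_t) cast visible in the ext3 patch, so the result does not depend on the width of unsigned long.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* File position of a directory entry located at `entry` inside a page. */
static long long sketch_entry_pos(unsigned long page_index,
				  const char *page_base, const char *entry)
{
	ptrdiff_t off = entry - page_base;

	assert(off >= 0 && off < (1L << SKETCH_PAGE_SHIFT));
	/* Widen first, then shift, so large indexes cannot wrap. */
	return ((long long)page_index << SKETCH_PAGE_SHIFT) + off;
}
```

For directories of realistic size the widening makes no difference; it only matters once the page index no longer fits in the bits left over after the shift.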
[patch 28/41] qnx4 convert to new aops.
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems linux-fsdevel@vger.kernel.org
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 fs/qnx4/inode.c |   21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

Index: linux-2.6/fs/qnx4/inode.c
===================================================================
--- linux-2.6.orig/fs/qnx4/inode.c
+++ linux-2.6/fs/qnx4/inode.c
@@ -433,16 +433,21 @@ static int qnx4_writepage(struct page *p
 {
 	return block_write_full_page(page,qnx4_get_block, wbc);
 }
+
 static int qnx4_readpage(struct file *file, struct page *page)
 {
 	return block_read_full_page(page,qnx4_get_block);
 }
-static int qnx4_prepare_write(struct file *file, struct page *page,
-			unsigned from, unsigned to)
-{
-	struct qnx4_inode_info *qnx4_inode = qnx4_i(page->mapping->host);
-	return cont_prepare_write(page, from, to, qnx4_get_block,
-				&qnx4_inode->mmu_private);
+
+static int qnx4_write_begin(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata)
+{
+	struct qnx4_inode_info *qnx4_inode = qnx4_i(mapping->host);
+	*pagep = NULL;
+	return cont_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+				qnx4_get_block,
+				&qnx4_inode->mmu_private);
 }
 static sector_t qnx4_bmap(struct address_space *mapping, sector_t block)
 {
@@ -452,8 +457,8 @@ static const struct address_space_operat
 	.readpage = qnx4_readpage,
 	.writepage = qnx4_writepage,
 	.sync_page = block_sync_page,
-	.prepare_write = qnx4_prepare_write,
-	.commit_write = generic_commit_write,
+	.write_begin= qnx4_write_begin,
+	.write_end = generic_write_end,
 	.bmap = qnx4_bmap
 };
 
[patch 35/41] hostfs convert to new aops.
This also gets rid of a lot of useless read_file stuff. And also
optimises the full page write case by marking a !uptodate page uptodate.

Cc: Jeff Dike [EMAIL PROTECTED]
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 fs/hostfs/hostfs_kern.c |   70
 1 file changed, 28 insertions(+), 42 deletions(-)

Index: linux-2.6/fs/hostfs/hostfs_kern.c
===================================================================
--- linux-2.6.orig/fs/hostfs/hostfs_kern.c
+++ linux-2.6/fs/hostfs/hostfs_kern.c
@@ -466,56 +466,42 @@ int hostfs_readpage(struct file *file, s
 	return err;
 }
 
-int hostfs_prepare_write(struct file *file, struct page *page,
-			 unsigned int from, unsigned int to)
+int hostfs_write_begin(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata)
 {
-	char *buffer;
-	long long start, tmp;
-	int err;
+	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
 
-	start = (long long) page->index << PAGE_CACHE_SHIFT;
-	buffer = kmap(page);
-	if(from != 0){
-		tmp = start;
-		err = read_file(FILE_HOSTFS_I(file)->fd, &tmp, buffer,
-				from);
-		if(err < 0) goto out;
-	}
-	if(to != PAGE_CACHE_SIZE){
-		start += to;
-		err = read_file(FILE_HOSTFS_I(file)->fd, &start, buffer + to,
-				PAGE_CACHE_SIZE - to);
-		if(err < 0) goto out;
-	}
-	err = 0;
- out:
-	kunmap(page);
-	return err;
+	*pagep = __grab_cache_page(mapping, index);
+	if (!*pagep)
+		return -ENOMEM;
+	return 0;
 }
 
-int hostfs_commit_write(struct file *file, struct page *page, unsigned from,
-		 unsigned to)
+int hostfs_write_end(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned copied,
+			struct page *page, void *fsdata)
 {
-	struct address_space *mapping = page->mapping;
 	struct inode *inode = mapping->host;
-	char *buffer;
-	long long start;
-	int err = 0;
+	void *buffer;
+	unsigned from = pos & (PAGE_CACHE_SIZE - 1);
+	int err;
 
-	start = (((long long) page->index) << PAGE_CACHE_SHIFT) + from;
 	buffer = kmap(page);
-	err = write_file(FILE_HOSTFS_I(file)->fd, &start, buffer + from,
-			 to - from);
-	if(err > 0) err = 0;
-
-	/* Actually, if !err, write_file has added to-from to start, so, despite
-	 * the appearance, we are comparing i_size against the _last_ written
-	 * location, as we should. */
+	err = write_file(FILE_HOSTFS_I(file)->fd, &pos, buffer + from, copied);
+	kunmap(page);
 
-	if(!err && (start > inode->i_size))
-		inode->i_size = start;
+	if (!PageUptodate(page) && err == PAGE_CACHE_SIZE)
+		SetPageUptodate(page);
+	unlock_page(page);
+	page_cache_release(page);
+
+	/* If err > 0, write_file has added err to pos, so we are comparing
+	 * i_size against the last byte written.
+	 */
+	if (err > 0 && (pos > inode->i_size))
+		inode->i_size = pos;
 
-	kunmap(page);
 	return err;
 }
 
@@ -523,8 +509,8 @@ static const struct address_space_operat
 	.writepage	= hostfs_writepage,
 	.readpage	= hostfs_readpage,
 	.set_page_dirty = __set_page_dirty_nobuffers,
-	.prepare_write	= hostfs_prepare_write,
-	.commit_write	= hostfs_commit_write
+	.write_begin	= hostfs_write_begin,
+	.write_end	= hostfs_write_end,
 };
 
 static int init_inode(struct inode *inode, struct dentry *dentry)
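The arithmetic in the hostfs conversion above (byte position to page index and in-page offset, the whole-page test, and the i_size update) can be sketched in userspace as follows. This is only an illustration: the 4 KiB page size and the helper names `locate_write`, `covers_whole_page` and `new_i_size` are assumptions made for the example, not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define PAGE_CACHE_SHIFT 12
#define PAGE_CACHE_SIZE  (1UL << PAGE_CACHE_SHIFT)

/* Hypothetical helper type: where a write at byte offset 'pos' lands. */
struct write_target {
	uint64_t index;	/* page index in the file */
	unsigned from;	/* byte offset inside that page */
};

static struct write_target locate_write(int64_t pos)
{
	struct write_target t;

	t.index = (uint64_t)pos >> PAGE_CACHE_SHIFT;
	t.from = (unsigned)(pos & (PAGE_CACHE_SIZE - 1));
	return t;
}

/* A page that was not uptodate may only be marked uptodate if the
 * write covered the whole page. */
static bool covers_whole_page(long written)
{
	return written == (long)PAGE_CACHE_SIZE;
}

/* i_size grows only if bytes were written and the write ended past the
 * old end of file; 'end_pos' is the position after the write. */
static int64_t new_i_size(int64_t i_size, int64_t end_pos, long written)
{
	if (written > 0 && end_pos > i_size)
		return end_pos;
	return i_size;
}
```

For instance, a write at position 4097 lands in page 1 at offset 1, and a failed write (negative return) leaves the file size unchanged.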
[patch 22/41] fat convert to new aops.
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 fs/fat/inode.c |   27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

Index: linux-2.6/fs/fat/inode.c
===================================================================
--- linux-2.6.orig/fs/fat/inode.c
+++ linux-2.6/fs/fat/inode.c
@@ -140,19 +140,24 @@ static int fat_readpages(struct file *fi
 	return mpage_readpages(mapping, pages, nr_pages, fat_get_block);
 }
 
-static int fat_prepare_write(struct file *file, struct page *page,
-			     unsigned from, unsigned to)
+static int fat_write_begin(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata)
 {
-	return cont_prepare_write(page, from, to, fat_get_block,
-				  &MSDOS_I(page->mapping->host)->mmu_private);
+	*pagep = NULL;
+	return cont_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+				fat_get_block,
+				&MSDOS_I(mapping->host)->mmu_private);
 }
 
-static int fat_commit_write(struct file *file, struct page *page,
-			    unsigned from, unsigned to)
+static int fat_write_end(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned copied,
+			struct page *pagep, void *fsdata)
 {
-	struct inode *inode = page->mapping->host;
-	int err = generic_commit_write(file, page, from, to);
-	if (!err && !(MSDOS_I(inode)->i_attrs & ATTR_ARCH)) {
+	struct inode *inode = mapping->host;
+	int err;
+	err = generic_write_end(file, mapping, pos, len, copied, pagep, fsdata);
+	if (!(err < 0) && !(MSDOS_I(inode)->i_attrs & ATTR_ARCH)) {
 		inode->i_mtime = inode->i_ctime = CURRENT_TIME_SEC;
 		MSDOS_I(inode)->i_attrs |= ATTR_ARCH;
 		mark_inode_dirty(inode);
@@ -201,8 +206,8 @@ static const struct address_space_operat
 	.writepage	= fat_writepage,
 	.writepages	= fat_writepages,
 	.sync_page	= block_sync_page,
-	.prepare_write	= fat_prepare_write,
-	.commit_write	= fat_commit_write,
+	.write_begin	= fat_write_begin,
+	.write_end	= fat_write_end,
 	.direct_IO	= fat_direct_IO,
 	.bmap		= _fat_bmap
 };
[patch 21/41] fs: new cont helpers
Rework the generic block cont routines to handle the new aops.
Keeping cont_prepare_write working would take quite a lot of code, so
remove it instead (later patches in the series convert all of its users
to the new cont helpers).

write_begin gets passed AOP_FLAG_CONT_EXPAND when called from
generic_cont_expand, so filesystems can avoid the old hacks they used.

Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 fs/buffer.c                 |  204
 include/linux/buffer_head.h |    5
 include/linux/fs.h          |    1
 mm/filemap.c                |    5
 4 files changed, 110 insertions(+), 105 deletions(-)

Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -2133,14 +2133,14 @@ int block_read_full_page(struct page *pa
 }
 
 /* utility function for filesystems that need to do work on expanding
- * truncates.  Uses prepare/commit_write to allow the filesystem to
+ * truncates.  Uses filesystem pagecache writes to allow the filesystem to
  * deal with the hole.
  */
-static int __generic_cont_expand(struct inode *inode, loff_t size,
-				 pgoff_t index, unsigned int offset)
+int generic_cont_expand_simple(struct inode *inode, loff_t size)
 {
 	struct address_space *mapping = inode->i_mapping;
 	struct page *page;
+	void *fsdata;
 	unsigned long limit;
 	int err;
 
@@ -2153,140 +2153,134 @@ static int __generic_cont_expand(struct
 	if (size > inode->i_sb->s_maxbytes)
 		goto out;
 
-	err = -ENOMEM;
-	page = grab_cache_page(mapping, index);
-	if (!page)
-		goto out;
-	err = mapping->a_ops->prepare_write(NULL, page, offset, offset);
-	if (err) {
-		/*
-		 * ->prepare_write() may have instantiated a few blocks
-		 * outside i_size.  Trim these off again.
-		 */
-		unlock_page(page);
-		page_cache_release(page);
-		vmtruncate(inode, inode->i_size);
+	err = pagecache_write_begin(NULL, mapping, size, 0,
+				AOP_FLAG_UNINTERRUPTIBLE|AOP_FLAG_CONT_EXPAND,
+				&page, &fsdata);
+	if (err)
 		goto out;
-	}
 
-	err = mapping->a_ops->commit_write(NULL, page, offset, offset);
+	err = pagecache_write_end(NULL, mapping, size, 0, 0, page, fsdata);
+	BUG_ON(err > 0);
 
-	unlock_page(page);
-	page_cache_release(page);
-	if (err > 0)
-		err = 0;
 out:
 	return err;
 }
 
 int generic_cont_expand(struct inode *inode, loff_t size)
 {
-	pgoff_t index;
 	unsigned int offset;
 
 	offset = (size & (PAGE_CACHE_SIZE - 1)); /* Within page */
 
 	/* ugh.  in prepare/commit_write, if from==to==start of block, we
-	** skip the prepare.  make sure we never send an offset for the start
-	** of a block
-	*/
+	 * skip the prepare.  make sure we never send an offset for the start
+	 * of a block.
+	 * XXX: actually, this should be handled in those filesystems by
+	 * checking for the AOP_FLAG_CONT_EXPAND flag.
+	 */
 	if ((offset & (inode->i_sb->s_blocksize - 1)) == 0) {
 		/* caller must handle this extra byte. */
-		offset++;
+		size++;
 	}
-	index = size >> PAGE_CACHE_SHIFT;
-
-	return __generic_cont_expand(inode, size, index, offset);
-}
-
-int generic_cont_expand_simple(struct inode *inode, loff_t size)
-{
-	loff_t pos = size - 1;
-	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
-	unsigned int offset = (pos & (PAGE_CACHE_SIZE - 1)) + 1;
-
-	/* prepare/commit_write can handle even if from==to==start of block. */
-	return __generic_cont_expand(inode, size, index, offset);
+	return generic_cont_expand_simple(inode, size);
 }
 
-/*
- * For moronic filesystems that do not allow holes in file.
- * We may have to extend the file.
- */
-
-int cont_prepare_write(struct page *page, unsigned offset,
-		unsigned to, get_block_t *get_block, loff_t *bytes)
+int cont_expand_zero(struct file *file, struct address_space *mapping,
+			loff_t pos, loff_t *bytes)
 {
-	struct address_space *mapping = page->mapping;
 	struct inode *inode = mapping->host;
-	struct page *new_page;
-	pgoff_t pgpos;
-	long status;
-	unsigned zerofrom;
 	unsigned blocksize = 1 << inode->i_blkbits;
+	struct page *page;
+	void *fsdata;
+	pgoff_t index, curidx;
+	loff_t curpos;
+	unsigned zerofrom, offset, len;
+	int err = 0;
 
-	while(page->index > (pgpos = *bytes>>PAGE_CACHE_SHIFT)) {
-		status = -ENOMEM;
-		new_page = grab_cache_page(mapping, pgpos);
[patch 25/41] hfsplus convert to new aops.
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 fs/hfsplus/extents.c |   21 +++++++++------------
 fs/hfsplus/inode.c   |   20 ++++++++++++--------
 2 files changed, 21 insertions(+), 20 deletions(-)

Index: linux-2.6/fs/hfsplus/inode.c
===================================================================
--- linux-2.6.orig/fs/hfsplus/inode.c
+++ linux-2.6/fs/hfsplus/inode.c
@@ -26,10 +26,14 @@ static int hfsplus_writepage(struct page
 	return block_write_full_page(page, hfsplus_get_block, wbc);
 }
 
-static int hfsplus_prepare_write(struct file *file, struct page *page, unsigned from, unsigned to)
-{
-	return cont_prepare_write(page, from, to, hfsplus_get_block,
-				  &HFSPLUS_I(page->mapping->host).phys_size);
+static int hfsplus_write_begin(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata)
+{
+	*pagep = NULL;
+	return cont_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+				hfsplus_get_block,
+				&HFSPLUS_I(mapping->host).phys_size);
 }
 
 static sector_t hfsplus_bmap(struct address_space *mapping, sector_t block)
@@ -113,8 +117,8 @@ const struct address_space_operations hf
 	.readpage	= hfsplus_readpage,
 	.writepage	= hfsplus_writepage,
 	.sync_page	= block_sync_page,
-	.prepare_write	= hfsplus_prepare_write,
-	.commit_write	= generic_commit_write,
+	.write_begin	= hfsplus_write_begin,
+	.write_end	= generic_write_end,
 	.bmap		= hfsplus_bmap,
 	.releasepage	= hfsplus_releasepage,
 };
@@ -123,8 +127,8 @@ const struct address_space_operations hf
 	.readpage	= hfsplus_readpage,
 	.writepage	= hfsplus_writepage,
 	.sync_page	= block_sync_page,
-	.prepare_write	= hfsplus_prepare_write,
-	.commit_write	= generic_commit_write,
+	.write_begin	= hfsplus_write_begin,
+	.write_end	= generic_write_end,
 	.bmap		= hfsplus_bmap,
 	.direct_IO	= hfsplus_direct_IO,
 	.writepages	= hfsplus_writepages,
Index: linux-2.6/fs/hfsplus/extents.c
===================================================================
--- linux-2.6.orig/fs/hfsplus/extents.c
+++ linux-2.6/fs/hfsplus/extents.c
@@ -443,21 +443,18 @@ void hfsplus_file_truncate(struct inode
 	if (inode->i_size > HFSPLUS_I(inode).phys_size) {
 		struct address_space *mapping = inode->i_mapping;
 		struct page *page;
-		u32 size = inode->i_size - 1;
+		void *fsdata;
+		u32 size = inode->i_size;
 		int res;
 
-		page = grab_cache_page(mapping, size >> PAGE_CACHE_SHIFT);
-		if (!page)
-			return;
-		size &= PAGE_CACHE_SIZE - 1;
-		size++;
-		res = mapping->a_ops->prepare_write(NULL, page, size, size);
-		if (!res)
-			res = mapping->a_ops->commit_write(NULL, page, size, size);
+		res = pagecache_write_begin(NULL, mapping, size, 0,
+						AOP_FLAG_UNINTERRUPTIBLE,
+						&page, &fsdata);
 		if (res)
-			inode->i_size = HFSPLUS_I(inode).phys_size;
-		unlock_page(page);
-		page_cache_release(page);
+			return;
+		res = pagecache_write_end(NULL, mapping, size, 0, 0, page, fsdata);
+		if (res < 0)
+			return;
 		mark_inode_dirty(inode);
 		return;
 	} else if (inode->i_size == HFSPLUS_I(inode).phys_size)
[patch 24/41] hfs convert to new aops.
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 fs/hfs/extent.c |   19 ++++++++-----------
 fs/hfs/inode.c  |   20 ++++++++++++--------
 2 files changed, 20 insertions(+), 19 deletions(-)

Index: linux-2.6/fs/hfs/inode.c
===================================================================
--- linux-2.6.orig/fs/hfs/inode.c
+++ linux-2.6/fs/hfs/inode.c
@@ -34,10 +34,14 @@ static int hfs_readpage(struct file *fil
 	return block_read_full_page(page, hfs_get_block);
 }
 
-static int hfs_prepare_write(struct file *file, struct page *page, unsigned from, unsigned to)
-{
-	return cont_prepare_write(page, from, to, hfs_get_block,
-				  &HFS_I(page->mapping->host)->phys_size);
+static int hfs_write_begin(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata)
+{
+	*pagep = NULL;
+	return cont_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+				hfs_get_block,
+				&HFS_I(mapping->host)->phys_size);
 }
 
 static sector_t hfs_bmap(struct address_space *mapping, sector_t block)
@@ -118,8 +122,8 @@ const struct address_space_operations hf
 	.readpage	= hfs_readpage,
 	.writepage	= hfs_writepage,
 	.sync_page	= block_sync_page,
-	.prepare_write	= hfs_prepare_write,
-	.commit_write	= generic_commit_write,
+	.write_begin	= hfs_write_begin,
+	.write_end	= generic_write_end,
 	.bmap		= hfs_bmap,
 	.releasepage	= hfs_releasepage,
 };
@@ -128,8 +132,8 @@ const struct address_space_operations hf
 	.readpage	= hfs_readpage,
 	.writepage	= hfs_writepage,
 	.sync_page	= block_sync_page,
-	.prepare_write	= hfs_prepare_write,
-	.commit_write	= generic_commit_write,
+	.write_begin	= hfs_write_begin,
+	.write_end	= generic_write_end,
 	.bmap		= hfs_bmap,
 	.direct_IO	= hfs_direct_IO,
 	.writepages	= hfs_writepages,
Index: linux-2.6/fs/hfs/extent.c
===================================================================
--- linux-2.6.orig/fs/hfs/extent.c
+++ linux-2.6/fs/hfs/extent.c
@@ -464,23 +464,20 @@ void hfs_file_truncate(struct inode *ino
 	       (long long)HFS_I(inode)->phys_size, inode->i_size);
 	if (inode->i_size > HFS_I(inode)->phys_size) {
 		struct address_space *mapping = inode->i_mapping;
+		void *fsdata;
 		struct page *page;
 		int res;
 
+		/* XXX: Can use generic_cont_expand? */
 		size = inode->i_size - 1;
-		page = grab_cache_page(mapping, size >> PAGE_CACHE_SHIFT);
-		if (!page)
-			return;
-		size &= PAGE_CACHE_SIZE - 1;
-		size++;
-		res = mapping->a_ops->prepare_write(NULL, page, size, size);
-		if (!res)
-			res = mapping->a_ops->commit_write(NULL, page, size, size);
+		res = pagecache_write_begin(NULL, mapping, size+1, 0,
+				AOP_FLAG_UNINTERRUPTIBLE, &page, &fsdata);
+		if (!res) {
+			res = pagecache_write_end(NULL, mapping, size+1, 0, 0,
+					page, fsdata);
+		}
 		if (res)
 			inode->i_size = HFS_I(inode)->phys_size;
-		unlock_page(page);
-		page_cache_release(page);
-		mark_inode_dirty(inode);
 		return;
 	} else if (inode->i_size == HFS_I(inode)->phys_size)
 		return;
[patch 39/41] sysv convert to new aops.
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 fs/sysv/dir.c   |   45
 fs/sysv/itree.c |   23
 2 files changed, 44 insertions(+), 24 deletions(-)

Index: linux-2.6/fs/sysv/itree.c
===================================================================
--- linux-2.6.orig/fs/sysv/itree.c
+++ linux-2.6/fs/sysv/itree.c
@@ -453,23 +453,38 @@ static int sysv_writepage(struct page *p
 {
 	return block_write_full_page(page,get_block,wbc);
 }
+
 static int sysv_readpage(struct file *file, struct page *page)
 {
 	return block_read_full_page(page,get_block);
 }
-static int sysv_prepare_write(struct file *file, struct page *page, unsigned from, unsigned to)
+
+int __sysv_write_begin(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata)
 {
-	return block_prepare_write(page,from,to,get_block);
+	return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+				get_block);
 }
+
+static int sysv_write_begin(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata)
+{
+	*pagep = NULL;
+	return __sysv_write_begin(file, mapping, pos, len, flags, pagep, fsdata);
+}
+
 static sector_t sysv_bmap(struct address_space *mapping, sector_t block)
 {
 	return generic_block_bmap(mapping,block,get_block);
 }
+
 const struct address_space_operations sysv_aops = {
 	.readpage = sysv_readpage,
 	.writepage = sysv_writepage,
 	.sync_page = block_sync_page,
-	.prepare_write = sysv_prepare_write,
-	.commit_write = generic_commit_write,
+	.write_begin = sysv_write_begin,
+	.write_end = generic_write_end,
 	.bmap = sysv_bmap
 };
Index: linux-2.6/fs/sysv/dir.c
===================================================================
--- linux-2.6.orig/fs/sysv/dir.c
+++ linux-2.6/fs/sysv/dir.c
@@ -37,12 +37,13 @@ static inline unsigned long dir_pages(st
 	return (inode->i_size+PAGE_CACHE_SIZE-1)>>PAGE_CACHE_SHIFT;
 }
 
-static int dir_commit_chunk(struct page *page, unsigned from, unsigned to)
+static int dir_commit_chunk(struct page *page, loff_t pos, unsigned len)
 {
-	struct inode *dir = (struct inode *)page->mapping->host;
+	struct address_space *mapping = page->mapping;
+	struct inode *dir = mapping->host;
 	int err = 0;
 
-	page->mapping->a_ops->commit_write(NULL, page, from, to);
+	block_write_end(NULL, mapping, pos, len, len, page, NULL);
 	if (IS_DIRSYNC(dir))
 		err = write_one_page(page, 1);
 	else
@@ -186,7 +187,7 @@ int sysv_add_link(struct dentry *dentry,
 	unsigned long npages = dir_pages(dir);
 	unsigned long n;
 	char *kaddr;
-	unsigned from, to;
+	loff_t pos;
 	int err;
 
 	/* We take care of directory expansion in the same loop */
@@ -212,16 +213,17 @@ int sysv_add_link(struct dentry *dentry,
 	return -EINVAL;
 
 got_it:
-	from = (char*)de - (char*)page_address(page);
-	to = from + SYSV_DIRSIZE;
+	pos = (page->index << PAGE_CACHE_SHIFT) +
+			(char*)de - (char*)page_address(page);
 	lock_page(page);
-	err = page->mapping->a_ops->prepare_write(NULL, page, from, to);
+	err = __sysv_write_begin(NULL, page->mapping, pos, SYSV_DIRSIZE,
+				AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
 	if (err)
 		goto out_unlock;
 	memcpy (de->name, name, namelen);
 	memset (de->name + namelen, 0, SYSV_DIRSIZE - namelen - 2);
 	de->inode = cpu_to_fs16(SYSV_SB(inode->i_sb), inode->i_ino);
-	err = dir_commit_chunk(page, from, to);
+	err = dir_commit_chunk(page, pos, SYSV_DIRSIZE);
 	dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
 	mark_inode_dirty(dir);
 out_page:
@@ -238,15 +240,15 @@ int sysv_delete_entry(struct sysv_dir_en
 	struct address_space *mapping = page->mapping;
 	struct inode *inode = (struct inode*)mapping->host;
 	char *kaddr = (char*)page_address(page);
-	unsigned from = (char*)de - kaddr;
-	unsigned to = from + SYSV_DIRSIZE;
+	loff_t pos = (page->index << PAGE_CACHE_SHIFT) + (char *)de - kaddr;
 	int err;
 
 	lock_page(page);
-	err = mapping->a_ops->prepare_write(NULL, page, from, to);
+	err = __sysv_write_begin(NULL, mapping, pos, SYSV_DIRSIZE,
+				AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
 	BUG_ON(err);
 	de->inode = 0;
-	err = dir_commit_chunk(page, from, to);
+	err = dir_commit_chunk(page, pos, SYSV_DIRSIZE);
 	dir_put_page(page);
 	inode->i_ctime = inode->i_mtime = CURRENT_TIME_SEC;
[patch 23/41] adfs convert to new aops.
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 fs/adfs/inode.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Index: linux-2.6/fs/adfs/inode.c
===================================================================
--- linux-2.6.orig/fs/adfs/inode.c
+++ linux-2.6/fs/adfs/inode.c
@@ -61,10 +61,14 @@ static int adfs_readpage(struct file *fi
 	return block_read_full_page(page, adfs_get_block);
 }
 
-static int adfs_prepare_write(struct file *file, struct page *page, unsigned int from, unsigned int to)
+static int adfs_write_begin(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata)
 {
-	return cont_prepare_write(page, from, to, adfs_get_block,
-		&ADFS_I(page->mapping->host)->mmu_private);
+	*pagep = NULL;
+	return cont_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+				adfs_get_block,
+				&ADFS_I(mapping->host)->mmu_private);
 }
 
 static sector_t _adfs_bmap(struct address_space *mapping, sector_t block)
@@ -76,8 +80,8 @@ static const struct address_space_operat
 	.readpage	= adfs_readpage,
 	.writepage	= adfs_writepage,
 	.sync_page	= block_sync_page,
-	.prepare_write	= adfs_prepare_write,
-	.commit_write	= generic_commit_write,
+	.write_begin	= adfs_write_begin,
+	.write_end	= generic_write_end,
 	.bmap		= _adfs_bmap
 };
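The cont_* conversions (adfs here, and fat, hfs, hfsplus and qnx4 elsewhere in the series) all pass a pointer to a per-inode "bytes initialised so far" counter (mmu_private or phys_size), because these filesystems cannot represent holes. A rough userspace sketch of the idea follows: before writing at `pos`, everything between the initialised size and `pos` has to be zero-filled. The type `zero_range` and the function `gap_to_zero` are hypothetical names for this illustration; the kernel helper works page by page and block by block and is considerably more involved.

```c
#include <stdint.h>

/* Hypothetical type for this sketch: a byte range that needs zeroing. */
struct zero_range {
	int64_t start;	/* first byte to zero */
	int64_t len;	/* number of bytes to zero, 0 if none */
};

/*
 * 'valid' plays the role of mmu_private/phys_size: the number of bytes
 * the filesystem has initialised so far. A write starting beyond it
 * would leave a hole, so the gap must be filled with zeroes first.
 */
static struct zero_range gap_to_zero(int64_t valid, int64_t pos)
{
	struct zero_range r = { valid, 0 };

	if (pos > valid)
		r.len = pos - valid;
	return r;
}
```

With 100 initialised bytes, a write at offset 300 would first need bytes 100..299 zeroed; a write inside the initialised region needs nothing.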
[patch 36/41] jffs2 convert to new aops.
Cc: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 fs/jffs2/file.c |  105
 1 file changed, 66 insertions(+), 39 deletions(-)

Index: linux-2.6/fs/jffs2/file.c
===================================================================
--- linux-2.6.orig/fs/jffs2/file.c
+++ linux-2.6/fs/jffs2/file.c
@@ -19,10 +19,12 @@
 #include <linux/jffs2.h>
 #include "nodelist.h"
 
-static int jffs2_commit_write (struct file *filp, struct page *pg,
-			       unsigned start, unsigned end);
-static int jffs2_prepare_write (struct file *filp, struct page *pg,
-				unsigned start, unsigned end);
+static int jffs2_write_end(struct file *filp, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned copied,
+			struct page *pg, void *fsdata);
+static int jffs2_write_begin(struct file *filp, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata);
 static int jffs2_readpage (struct file *filp, struct page *pg);
 
 int jffs2_fsync(struct file *filp, struct dentry *dentry, int datasync)
@@ -65,8 +67,8 @@ const struct inode_operations jffs2_file
 const struct address_space_operations jffs2_file_address_operations =
 {
 	.readpage =	jffs2_readpage,
-	.prepare_write =jffs2_prepare_write,
-	.commit_write =	jffs2_commit_write
+	.write_begin =	jffs2_write_begin,
+	.write_end =	jffs2_write_end,
 };
 
 static int jffs2_do_readpage_nolock (struct inode *inode, struct page *pg)
@@ -119,15 +121,23 @@ static int jffs2_readpage (struct file *
 	return ret;
 }
 
-static int jffs2_prepare_write (struct file *filp, struct page *pg,
-				unsigned start, unsigned end)
+static int jffs2_write_begin(struct file *filp, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata)
 {
-	struct inode *inode = pg->mapping->host;
+	struct page *pg;
+	struct inode *inode = mapping->host;
 	struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
-	uint32_t pageofs = pg->index << PAGE_CACHE_SHIFT;
+	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+	uint32_t pageofs = pos & (PAGE_CACHE_SIZE - 1);
 	int ret = 0;
 
-	D1(printk(KERN_DEBUG "jffs2_prepare_write()\n"));
+	pg = __grab_cache_page(mapping, index);
+	if (!pg)
+		return -ENOMEM;
+	*pagep = pg;
+
+	D1(printk(KERN_DEBUG "jffs2_write_begin()\n"));
 
 	if (pageofs > inode->i_size) {
 		/* Make new hole frag from old EOF to new page */
@@ -142,7 +152,7 @@ static int jffs2_prepare_write (struct f
 		ret = jffs2_reserve_space(c, sizeof(ri), &alloc_len,
 					  ALLOC_NORMAL, JFFS2_SUMMARY_INODE_SIZE);
 		if (ret)
-			return ret;
+			goto out_page;
 
 		down(&f->sem);
 		memset(&ri, 0, sizeof(ri));
@@ -172,7 +182,7 @@ static int jffs2_prepare_write (struct f
 			ret = PTR_ERR(fn);
 			jffs2_complete_reservation(c);
 			up(&f->sem);
-			return ret;
+			goto out_page;
 		}
 		ret = jffs2_add_full_dnode_to_inode(c, f, fn);
 		if (f->metadata) {
@@ -181,65 +191,79 @@ static int jffs2_prepare_write (struct f
 			f->metadata = NULL;
 		}
 		if (ret) {
-			D1(printk(KERN_DEBUG "Eep. add_full_dnode_to_inode() failed in prepare_write, returned %d\n", ret));
+			D1(printk(KERN_DEBUG "Eep. add_full_dnode_to_inode() failed in write_begin, returned %d\n", ret));
 			jffs2_mark_node_obsolete(c, fn->raw);
 			jffs2_free_full_dnode(fn);
 			jffs2_complete_reservation(c);
 			up(&f->sem);
-			return ret;
+			goto out_page;
 		}
 		jffs2_complete_reservation(c);
 		inode->i_size = pageofs;
 		up(&f->sem);
 	}
 
-	/* Read in the page if it wasn't already present, unless it's a whole page */
-	if (!PageUptodate(pg) && (start || end < PAGE_CACHE_SIZE)) {
+	/*
+	 * Read in the page if it wasn't already present. Cannot optimize away
+	 * the whole page write case until jffs2_write_end can handle the
+	 * case of a short-copy.
+	 */
+	if (!PageUptodate(pg)) {
 		down(&f->sem);
 		ret = jffs2_do_readpage_nolock(inode, pg);
 		up(&f->sem);
+		if (ret)
[RFC][PATCH 1/14] Add union mount documentation
From: Bharata B Rao [EMAIL PROTECTED]
Subject: Add union mount documentation.

This is an attempt to document some of the implementation details
and issues of union mount.

Signed-off-by: Bharata B Rao [EMAIL PROTECTED]
Signed-off-by: Jan Blunck [EMAIL PROTECTED]
---
 Documentation/union-mounts.txt |  538
 1 files changed, 538 insertions(+)

--- /dev/null
+++ b/Documentation/union-mounts.txt
@@ -0,0 +1,538 @@
+VFS BASED UNION MOUNT
+=====================
+
+1. Overview
+2. Union stack
+3. Lookup
+4. Readdir
+  4.1 Duplicate elimination
+  4.2 Preserving state
+  4.3 File offset problem
+  4.4 Altered lseek behaviour
+  4.5 TODO
+5. Copyup
+6. Whiteout
+  6.1. Creation and deletion
+  6.2. Whiteout filetype support
+  6.3. Directory renaming
+7. Usage
+8. State of the code
+9. Extracted (old)mail comments
+
+1. Overview
+-----------
+Union mount allows mounting of two or more filesystems transparently on
+a single mount point. The contents (files or directories) of all the
+filesystems become visible at the mount point after a union mount. If
+there are files of same name in multiple layers, only the topmost files remain
+visible in a union mount. However (currently) common named directories are
+again union-ed to present a unified view at the subdir level.
+
+In this approach of unioning filesystems, the layering information of
+different components of the union mount is maintained at the VFS layer.
+Hence we call this a VFS based union mount.
+
+2. Union stack
+--------------
+Union stack reflects the stacking of two or more filesystems of the
+union mount. The stacking or the layering information is maintained
+as part of dentry structures of the mountpoint and mount root.
+
+The union stack information in the dentry structure looks like this:
+
+struct dentry {
+	...
+
+#ifdef CONFIG_UNION_MOUNT
+	struct dentry *d_overlaid;	/* overlaid directory */
+	struct dentry *d_topmost;	/* topmost directory */
+	struct union_info *d_union;	/* union stack info */
+#endif
+	...
+};
+
+struct union_info {
+	struct mutex u_mutex;
+	atomic_t u_count;
+};
+
+There is one union_info shared by all dentries which are part of
+a union and u_count member holds the number of references to the union
+stack. When this reaches zero, the union stack ceases to exist and
+the union_info is freed.
+
+Union stack is essentially a singly linked list of dentries of the union
+with d_topmost as the head of the list and d_overlaid pointing
+to the next member of the stack. The walking of union stack is guarded by
+the u_mutex member.
+
+dget() references every dentry of the overlaid union stack to make sure
+that no dentry of the stack is discarded from memory while others are
+still in use. Since walking of union stack is protected by a mutex,
+dget() can now sleep.
+
+dput() also walks the union stack and releases references to all the
+dentries that are part of the union. If a dentry's reference count
+in a union stack reaches zero, it implies that the dentries above it
+in the stack must also be unused and the union stack can be safely
+destroyed at this point.
+
+Since dget() can sleep with union mount, it becomes necessary to
+fix many callers of dget() to release and re-acquire any spinlocks
+they are holding until they acquire the union lock (mutex).
+
+3. Lookup
+---------
+With union mount, it becomes necessary to lookup pathnames not only
+in the topmost filesystem but also in the underlying filesystems.
+
+In case of looking up a filename, the lookup routines as a rule return
+the match from the topmost layer. However if the file is not found
+in the topmost layer, the lookup routines have been modified to
+find the file in the underlying filesystems of the union stack.
+
+When looking up a directory under a union mount point, the lookup
+code has been modified to build a union stack (if necessary).
+
+When looking up a name in a union directory, it is necessary to
+guarantee that the returned union stack remains valid. Hence
+concurrent lookups are prevented by obtaining the mutex lock during
+lookups.
+
+4. Readdir
+----------
+The core functionality of union mount, viz., the merged view of
+multiple directories is provided by the readdir()/getdents() routines.
+This is achieved by reading the contents of every directory of the union
+stack and by merging the result.
+
+4.1 Duplicate elimination
+-------------------------
+The directory entries are read starting from the top layer and they
+are maintained in a cache. Subsequently when the entries from the bottom layers
+of the union stack are read, they are checked for duplicates (in the cache)
+before being passed out to the user space. Since there can be multiple
+readdir()/getdents() calls to read a single directory, the cache is made to
+persist across these calls. So we need to maintain this cache and the
+associated state across readdir calls.
+
+4.2
[RFC][PATCH 2/14] Add a new mount flag (MNT_UNION) for union mount
From: Jan Blunck [EMAIL PROTECTED]
Subject: Add a new mount flag (MNT_UNION) for union mount.

Introduce MNT_UNION, MS_UNION and FS_WHT flags. These are the flags
needed for doing

	mount /dev/hda3 /mnt -o union

You need additional patches for util-linux for that to work.

Signed-off-by: Jan Blunck [EMAIL PROTECTED]
Signed-off-by: Bharata B Rao [EMAIL PROTECTED]
---
 fs/namespace.c        |   14 +++++++++++++-
 include/linux/fs.h    |    2 ++
 include/linux/mount.h |    1 +
 3 files changed, 16 insertions(+), 1 deletion(-)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -442,6 +442,7 @@ static int show_vfsmnt(struct seq_file *
 		{ MNT_NODIRATIME, ",nodiratime" },
 		{ MNT_RELATIME, ",relatime" },
 		{ MNT_NOMNT, ",nomnt" },
+		{ MNT_UNION, ",union" },
 		{ 0, NULL }
 	};
 	struct proc_fs_info *fs_infop;
@@ -1256,6 +1257,14 @@ int do_add_mount(struct vfsmount *newmnt
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
+	/* Unions couldn't be writable if the filesystem
+	 * doesn't know about whiteouts */
+	err = -ENOTSUPP;
+	if ((mnt_flags & MNT_UNION) &&
+	    !(newmnt->mnt_sb->s_flags & MS_RDONLY) &&
+	    !(newmnt->mnt_sb->s_type->fs_flags & FS_WHT))
+		goto unlock;
+
 	/* some flags may have been set earlier */
 	newmnt->mnt_flags |= mnt_flags;
 	if ((err = graft_tree(newmnt, nd)))
@@ -1562,9 +1571,12 @@ long do_mount(char *dev_name, char *dir_
 		mnt_flags |= MNT_RELATIME;
 	if (flags & MS_NOMNT)
 		mnt_flags |= MNT_NOMNT;
+	if (flags & MS_UNION)
+		mnt_flags |= MNT_UNION;
 
 	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
-		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME | MS_NOMNT);
+		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME | MS_NOMNT |
+		   MS_UNION);
 
 	/* ... and get the mountpoint */
 	retval = path_lookup(dir_name, LOOKUP_FOLLOW, nd);
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -97,6 +97,7 @@ extern int dir_notify_enable;
 #define FS_BINARY_MOUNTDATA 2
 #define FS_HAS_SUBTYPE 4
 #define FS_SAFE 8	/* Safe to mount by unprivileged users */
+#define FS_WHT 16
 #define FS_REVAL_DOT	16384	/* Check the paths ".", ".." for staleness */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move()
 					 * during rename() internally.
@@ -113,6 +114,7 @@ extern int dir_notify_enable;
 #define MS_REMOUNT	32	/* Alter flags of a mounted FS */
 #define MS_MANDLOCK	64	/* Allow mandatory locks on an FS */
 #define MS_DIRSYNC	128	/* Directory modifications are synchronous */
+#define MS_UNION	256	/* Union mount */
 #define MS_NOATIME	1024	/* Do not update access times. */
 #define MS_NODIRATIME	2048	/* Do not update directory access times */
 #define MS_BIND		4096
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -36,6 +36,7 @@ struct mnt_namespace;
 #define MNT_SHARED	0x1000	/* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE	0x2000	/* if the vfsmount is a unbindable mount */
 #define MNT_PNODE_MASK	0x3000	/* propogation flag mask */
+#define MNT_UNION	0x4000	/* if the vfsmount is a union mount */
 
 struct vfsmount {
 	struct list_head mnt_hash;
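The new check in do_add_mount() can be read as a small predicate: a union mount is refused when the filesystem is mounted read-write but its type does not advertise whiteout support. The sketch below restates that rule in userspace. FS_WHT and MNT_UNION use the values from the patch, MS_RDONLY and ENOTSUPP use their usual kernel values, and the function name `union_mount_check` is made up for this example.

```c
/* Values taken from the patch above. */
#define FS_WHT     16
#define MNT_UNION  0x4000

/* Standard kernel value of the read-only superblock flag. */
#define MS_RDONLY  1

/* ENOTSUPP is a kernel-internal errno that userspace headers do not
 * normally provide, so define it here for the sketch. */
#ifndef ENOTSUPP
#define ENOTSUPP 524
#endif

/*
 * Hypothetical restatement of the check: returns 0 if the mount may
 * proceed, -ENOTSUPP if a writable union is requested on a filesystem
 * type that does not know about whiteouts.
 */
static int union_mount_check(int mnt_flags, unsigned long sb_flags,
			     int fs_flags)
{
	if ((mnt_flags & MNT_UNION) &&
	    !(sb_flags & MS_RDONLY) &&
	    !(fs_flags & FS_WHT))
		return -ENOTSUPP;
	return 0;
}
```

A read-only layer therefore never needs FS_WHT, which matches the comment in the patch: only writable unions depend on whiteouts.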
[RFC][PATCH 3/14] Add the whiteout file type
From: Jan Blunck [EMAIL PROTECTED]
Subject: Add the whiteout file type

A whiteout stops the VFS from looking any further for the whiteout's name
and makes the lookup return -ENOENT. This is the same behaviour as if the
filename isn't found. In combination with union mounts, this can be used
to virtually delete (white out) a file by creating a file of this type.

Signed-off-by: Jan Blunck [EMAIL PROTECTED]
Signed-off-by: Bharata B Rao [EMAIL PROTECTED]
---
 include/linux/stat.h |    2 ++
 1 files changed, 2 insertions(+)

--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -10,6 +10,7 @@
 #if defined(__KERNEL__) || !defined(__GLIBC__) || (__GLIBC__ < 2)
 
 #define S_IFMT  00170000
+#define S_IFWHT  0160000	/* whiteout */
 #define S_IFSOCK 0140000
 #define S_IFLNK	 0120000
 #define S_IFREG  0100000
@@ -28,6 +29,7 @@
 #define S_ISBLK(m)	(((m) & S_IFMT) == S_IFBLK)
 #define S_ISFIFO(m)	(((m) & S_IFMT) == S_IFIFO)
 #define S_ISSOCK(m)	(((m) & S_IFMT) == S_IFSOCK)
+#define S_ISWHT(m)	(((m) & S_IFMT) == S_IFWHT)
 
 #define S_IRWXU 00700
 #define S_IRUSR 00400
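To illustrate the lookup semantics described in this patch, here is a
small user-space sketch of a top-down search through union layers. It is
not the kernel implementation: the UM_/um_ names, the layer structures and
the linear search are all assumptions made for the example. Only the
file-type values mirror the ones in the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* File-type bits as in the patch; prefixed to avoid clashing with <sys/stat.h>. */
#define UM_S_IFMT	00170000
#define UM_S_IFWHT	0160000
#define UM_S_IFREG	0100000
#define UM_S_ISWHT(m)	(((m) & UM_S_IFMT) == UM_S_IFWHT)

/* Hypothetical directory model: each layer is a flat array of entries. */
struct um_entry {
	const char *name;
	unsigned int mode;
};

struct um_layer {
	const struct um_entry *entries;
	size_t count;
};

/*
 * Search the layers from the topmost (index 0) downwards. The first
 * entry with a matching name decides the result: a whiteout hides
 * everything below it and yields -ENOENT, anything else is a hit.
 */
int um_lookup(const struct um_layer *layers, size_t nlayers,
	      const char *name, unsigned int *mode_out)
{
	for (size_t i = 0; i < nlayers; i++) {
		for (size_t j = 0; j < layers[i].count; j++) {
			const struct um_entry *e = &layers[i].entries[j];

			if (strcmp(e->name, name) != 0)
				continue;
			if (UM_S_ISWHT(e->mode))
				return -ENOENT;
			*mode_out = e->mode;
			return 0;
		}
	}
	return -ENOENT;
}
```

With a whiteout named "a" in the top layer and a regular file "a" in a
lower layer, um_lookup() reports -ENOENT for "a", which is the "virtually
deleted" behaviour the patch description refers to.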
[RFC][PATCH 4/14] Add config options for union mount
From: Jan Blunck [EMAIL PROTECTED]
Subject: Add config options for union mount

Introduces two new config options for union mount:

  CONFIG_UNION_MOUNT       - Enables union mount
  CONFIG_UNION_MOUNT_DEBUG - Enables debugging support for union mount

Also adds debugging routines.

FIXME: this needs some work. printk'ing isn't the right method for getting
good debugging output.

Signed-off-by: Jan Blunck [EMAIL PROTECTED]
Signed-off-by: Bharata B Rao [EMAIL PROTECTED]
---
 fs/Kconfig                  |   16 +
 include/linux/union_debug.h |   76
 2 files changed, 92 insertions(+)

--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -551,6 +551,22 @@ config INOTIFY_USER
 
 	  If unsure, say Y.
 
+config UNION_MOUNT
+	bool "Union mount support (EXPERIMENTAL)"
+	depends on EXPERIMENTAL
+	---help---
+	  If you say Y here, you will be able to mount file systems as
+	  union mount stacks. This is a VFS based implementation and
+	  should work with all file systems. If unsure, say N.
+
+config UNION_MOUNT_DEBUG
+	bool "Union mount debugging output"
+	depends on UNION_MOUNT
+	---help---
+	  If you say Y here, the union mount debugging code will be
+	  compiled in. You have activate the appropriate UNION_MOUNT_DEBUG
+	  flags in file:include/linux/union.h, too.
+
 config QUOTA
 	bool "Quota support"
 	help
--- /dev/null
+++ b/include/linux/union_debug.h
@@ -0,0 +1,76 @@
+/*
+ * VFS based union mount for Linux
+ *
+ * Copyright © 2004-2007 IBM Corporation
+ *   Author(s): Jan Blunck ([EMAIL PROTECTED])
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+#ifndef __LINUX_UNION_DEBUG_H
+#define __LINUX_UNION_DEBUG_H
+
+#ifdef __KERNEL__
+
+#ifdef CONFIG_UNION_MOUNT_DEBUG
+
+#include <linux/sched.h>
+
+#ifndef UNION_MOUNT_DEBUG
+#define UNION_MOUNT_DEBUG 0
+#endif /* UNION_MOUNT_DEBUG */
+#ifndef UNION_MOUNT_DEBUG_DCACHE
+#define UNION_MOUNT_DEBUG_DCACHE 0
+#endif /* UNION_MOUNT_DEBUG_DCACHE */
+#ifndef UNION_MOUNT_DEBUG_LOCK
+#define UNION_MOUNT_DEBUG_LOCK 0
+#endif /* UNION_MOUNT_DEBUG_LOCK */
+#ifndef UNION_MOUNT_DEBUG_READDIR
+#define UNION_MOUNT_DEBUG_READDIR 0
+#endif /* UNION_MOUNT_DEBUG_READDIR */
+
+/*
+ * The really excessive debugging output is triggered by
+ * the user id () which is accessing the union stack
+ */
+#define UM_DEBUG(fmt, args...)						\
+do {									\
+	if (UNION_MOUNT_DEBUG)						\
+		printk(KERN_DEBUG "%s: " fmt, __FUNCTION__, ## args);	\
+} while (0)
+#define UM_DEBUG_UID(fmt, args...)					\
+do {									\
+	if (UNION_MOUNT_DEBUG && (current->uid == ))			\
+		printk(KERN_DEBUG "%s: " fmt, __FUNCTION__, ## args);	\
+} while (0)
+#define UM_DEBUG_DCACHE(fmt, args...)					\
+do {									\
+	if (UNION_MOUNT_DEBUG_DCACHE && (current->uid == ))		\
+		printk(KERN_DEBUG "%s: " fmt, __FUNCTION__, ## args);	\
+} while (0)
+#define UM_DEBUG_LOCK(fmt, args...)					\
+do {									\
+	if (UNION_MOUNT_DEBUG_LOCK && (current->uid == ))		\
+		printk(KERN_DEBUG "%s: " fmt, __FUNCTION__, ## args);	\
+} while (0)
+#define UM_DEBUG_READDIR(fmt, args...)					\
+do {									\
+	if (UNION_MOUNT_DEBUG_READDIR && (current->uid == ))		\
+		printk(KERN_DEBUG "%s: " fmt, __FUNCTION__, ## args);	\
+} while (0)
+
+#else /* CONFIG_UNION_MOUNT_DEBUG */
+
+#define UM_DEBUG(fmt, args...) do { /* empty */ } while (0)
+#define UM_DEBUG_UID(fmt, args...) do { /* empty */ } while (0)
+#define UM_DEBUG_DCACHE(fmt, args...) do { /* empty */ } while (0)
+#define UM_DEBUG_LOCK(fmt, args...) do { /* empty */ } while (0)
+#define UM_DEBUG_READDIR(fmt, args...) do { /* empty */ } while (0)
+
+#endif /* CONFIG_UNION_MOUNT_DEBUG */
+
+#endif /* __KERNEL__ */
+#endif /* __LINUX_UNION_DEBUG_H */
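The pattern used by the debug macros in this patch (a compile-time flag
tested inside a do { } while (0) wrapper, with the calling function's name
prefixed to the message) can be tried out in user space. The sketch below
is an approximation only: um_log() and um_last_msg are hypothetical
helpers standing in for printk(), the C99 __VA_ARGS__ form replaces the
GNU "args..." form, and the uid filter is left out.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Per-area switches, as in the patch; here one is on and one is off. */
#define UNION_MOUNT_DEBUG	1
#define UNION_MOUNT_DEBUG_LOCK	0

/* Hypothetical stand-in for printk(): keeps the last message and echoes it. */
static char um_last_msg[256];

static void um_log(const char *func, const char *fmt, ...)
{
	va_list ap;
	int n = snprintf(um_last_msg, sizeof(um_last_msg), "%s: ", func);

	if (n < 0 || (size_t)n >= sizeof(um_last_msg))
		return;
	va_start(ap, fmt);
	vsnprintf(um_last_msg + n, sizeof(um_last_msg) - (size_t)n, fmt, ap);
	va_end(ap);
	fputs(um_last_msg, stderr);
}

/* The constant condition lets the compiler drop disabled call sites. */
#define UM_DEBUG(...)						\
do {								\
	if (UNION_MOUNT_DEBUG)					\
		um_log(__func__, __VA_ARGS__);			\
} while (0)
#define UM_DEBUG_LOCK(...)					\
do {								\
	if (UNION_MOUNT_DEBUG_LOCK)				\
		um_log(__func__, __VA_ARGS__);			\
} while (0)
```

Because the arguments are still compiled even when a flag is 0, format
strings stay type-checked in both configurations, which the empty
"do { } while (0)" variants in the patch do not provide.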
[RFC][PATCH 5/14] Introduce union stack
From: Jan Blunck [EMAIL PROTECTED]
Subject: Introduce union stack.

Adds union stack infrastructure to the dentry structure and provides
locking routines to walk the union stack.

Signed-off-by: Jan Blunck [EMAIL PROTECTED]
Signed-off-by: Bharata B Rao [EMAIL PROTECTED]
---
 fs/Makefile                  |    2
 fs/dcache.c                  |    5
 fs/union.c                   |   53 +
 include/linux/dcache.h       |    6 +
 include/linux/dcache_union.h |  248 +++
 5 files changed, 314 insertions(+)

--- a/fs/Makefile
+++ b/fs/Makefile
@@ -49,6 +49,8 @@ obj-$(CONFIG_FS_POSIX_ACL)	+= posix_acl.
 obj-$(CONFIG_NFS_COMMON)	+= nfs_common/
 obj-$(CONFIG_GENERIC_ACL)	+= generic_acl.o
 
+obj-$(CONFIG_UNION_MOUNT)	+= union.o
+
 obj-$(CONFIG_QUOTA)		+= dquot.o
 obj-$(CONFIG_QFMT_V1)		+= quota_v1.o
 obj-$(CONFIG_QFMT_V2)		+= quota_v2.o
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -936,6 +936,11 @@ struct dentry *d_alloc(struct dentry * p
 #ifdef CONFIG_PROFILING
 	dentry->d_cookie = NULL;
 #endif
+#ifdef CONFIG_UNION_MOUNT
+	dentry->d_overlaid = NULL;
+	dentry->d_topmost = NULL;
+	dentry->d_union = NULL;
+#endif
 	INIT_HLIST_NODE(&dentry->d_hash);
 	INIT_LIST_HEAD(&dentry->d_lru);
 	INIT_LIST_HEAD(&dentry->d_subdirs);
--- /dev/null
+++ b/fs/union.c
@@ -0,0 +1,53 @@
+/*
+ * VFS based union mount for Linux
+ *
+ * Copyright © 2004-2007 IBM Corporation
+ *   Author(s): Jan Blunck ([EMAIL PROTECTED])
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/fs.h>
+
+struct union_info * union_alloc(void)
+{
+	struct union_info *info;
+
+	info = kmalloc(sizeof(*info), GFP_ATOMIC);
+	if (!info)
+		return NULL;
+
+	mutex_init(&info->u_mutex);
+	mutex_lock(&info->u_mutex);
+	atomic_set(&info->u_count, 1);
+	UM_DEBUG_LOCK("allocate union %p\n", info);
+	return info;
+}
+
+struct union_info * union_get(struct union_info *info)
+{
+	BUG_ON(!info);
+	BUG_ON(!atomic_read(&info->u_count));
+	atomic_inc(&info->u_count);
+	UM_DEBUG_LOCK("get union %p (count=%d)\n", info,
+		      atomic_read(&info->u_count));
+	return info;
+}
+
+void union_put(struct union_info *info)
+{
+	BUG_ON(!info);
+	UM_DEBUG_LOCK("put union %p (count=%d)\n", info,
+		      atomic_read(&info->u_count));
+	atomic_dec(&info->u_count);
+
+	if (!atomic_read(&info->u_count)) {
+		UM_DEBUG_LOCK("free union %p\n", info);
+		kfree(info);
+	}
+
+	return;
+}
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -93,6 +93,12 @@ struct dentry {
 	struct dentry *d_parent;	/* parent directory */
 	struct qstr d_name;
 
+#ifdef CONFIG_UNION_MOUNT
+	struct dentry *d_overlaid;	/* overlaid directory */
+	struct dentry *d_topmost;	/* topmost directory */
+	struct union_info *d_union;	/* union directory info */
+#endif
+
 	struct list_head d_lru;		/* LRU list */
 	/*
 	 * d_child and d_rcu can share memory
--- /dev/null
+++ b/include/linux/dcache_union.h
@@ -0,0 +1,248 @@
+/*
+ * VFS based union mount for Linux
+ *
+ * Copyright © 2004-2007 IBM Corporation
+ *   Author(s): Jan Blunck ([EMAIL PROTECTED])
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+#ifndef __LINUX_DCACHE_UNION_H
+#define __LINUX_DCACHE_UNION_H
+#ifdef __KERNEL__
+
+#include <linux/union_debug.h>
+#include <linux/fs_struct.h>
+#include <asm/atomic.h>
+#include <asm/semaphore.h>
+
+#ifdef CONFIG_UNION_MOUNT
+
+/*
+ * This is the union info object, that describes general information about this
+ * union directory
+ *
+ * u_mutex protects the union stack against modification. You can reach it
+ * through the d_union field in struct dentry. Hold it when you are walking
+ * or modifing the union stack !
+ */
+struct union_info {
+	atomic_t u_count;
+	struct mutex u_mutex;
+};
+
+/* allocate/de-allocate */
+extern struct union_info *union_alloc(void);
+extern struct union_info *union_get(struct union_info *);
+extern void union_put(struct union_info *);
+
+/*
+ * These are the functions for locking a dentry's union. When one
+ * want to acquire a denties union lock, use:
+ *
+ * - union_lock() when you can sleep,
+ * - union_lock_spinlock() when you are holding a spinlock (that
+ *   you CAN savely give up and reacquire again)
+ * - union_lock_readlock() when you are holding a readlock (that
+ *   you CAN savely give up and
[RFC][PATCH 7/14] Union-mount mounting
From: Jan Blunck [EMAIL PROTECTED]
Subject: Union-mount mounting

Adds union mount support to mount() and umount() system calls.
Sets up the union stack during mount and destroys it during unmount.

TODO: bind and move mounts aren't yet supported with union mounts.

Signed-off-by: Jan Blunck [EMAIL PROTECTED]
Signed-off-by: Bharata B Rao [EMAIL PROTECTED]
---
 fs/namespace.c        |   90 ++
 fs/union.c            |   71 +++
 include/linux/fs.h    |    3 +
 include/linux/union.h |   33 ++
 4 files changed, 190 insertions(+), 7 deletions(-)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -169,7 +169,7 @@ void mnt_set_mountpoint(struct vfsmount
 			struct vfsmount *child_mnt)
 {
 	child_mnt->mnt_parent = mntget(mnt);
-	child_mnt->mnt_mountpoint = dget(dentry);
+	child_mnt->mnt_mountpoint = __dget(dentry);
 	dentry->d_mounted++;
 }
 
@@ -294,6 +294,10 @@ static struct vfsmount *clone_mnt(struct
 	if (!mnt)
 		goto alloc_failed;
 
+	/*
+	 * As of now, cloning of union mounted mnt isn't permitted.
+	 */
+	BUG_ON(mnt->mnt_flags & MNT_UNION);
 	mnt->mnt_flags = old->mnt_flags;
 	atomic_inc(&sb->s_active);
 	mnt->mnt_sb = sb;
@@ -579,16 +583,20 @@ void release_mounts(struct list_head *he
 		mnt = list_first_entry(head, struct vfsmount, mnt_hash);
 		list_del_init(&mnt->mnt_hash);
 		if (mnt->mnt_parent != mnt) {
-			struct dentry *dentry;
-			struct vfsmount *m;
+			struct path old_nd;
 			spin_lock(&vfsmount_lock);
-			dentry = mnt->mnt_mountpoint;
-			m = mnt->mnt_parent;
+			old_nd.dentry = mnt->mnt_mountpoint;
+			old_nd.mnt = mnt->mnt_parent;
 			mnt->mnt_mountpoint = mnt->mnt_root;
 			mnt->mnt_parent = mnt;
+			detach_mnt_union(mnt, &old_nd);
 			spin_unlock(&vfsmount_lock);
-			dput(dentry);
-			mntput(m);
+			if (mnt->mnt_flags & MNT_UNION) {
+				UM_DEBUG("shrink the mountpoint's dcache\n");
+				shrink_dcache_sb(old_nd.dentry->d_sb);
+			}
+			__dput(old_nd.dentry);
+			mntput(old_nd.mnt);
 		}
 		mntput(mnt);
 	}
@@ -621,6 +629,9 @@ static int do_umount(struct vfsmount *mn
 	struct super_block *sb = mnt->mnt_sb;
 	int retval;
 	LIST_HEAD(umount_list);
+#ifdef CONFIG_UNION_MOUNT
+	struct union_info *uinfo = NULL;
+#endif
 
 	retval = security_sb_umount(mnt, flags);
 	if (retval)
@@ -685,6 +696,14 @@ static int do_umount(struct vfsmount *mn
 	}
 
 	down_write(&namespace_sem);
+#ifdef CONFIG_UNION_MOUNT
+	/*
+	 * Grab a reference to the union_info which gets detached
+	 * from the dentries in release_mounts().
+	 */
+	if (mnt->mnt_flags & MNT_UNION)
+		uinfo = union_lock_and_get(mnt->mnt_root);
+#endif
 	spin_lock(&vfsmount_lock);
 	event++;
 
@@ -699,6 +718,15 @@ static int do_umount(struct vfsmount *mn
 		security_sb_umount_busy(mnt);
 	up_write(&namespace_sem);
 	release_mounts(&umount_list);
+#ifdef CONFIG_UNION_MOUNT
+	if (uinfo) {
+		if (atomic_read(&uinfo->u_count) == 1)
+			/* We are the last user of this union_info */
+			union_release(uinfo);
+		else
+			union_put_and_unlock(uinfo);
+	}
+#endif
 	return retval;
 }
 
@@ -941,6 +969,9 @@ static int attach_recursive_mnt(struct v
 			set_mnt_shared(p);
 	}
 
+	if (source_mnt->mnt_flags & MNT_UNION)
+		union_alloc_dentry(nd->dentry);
+
 	spin_lock(&vfsmount_lock);
 	if (parent_nd) {
 		detach_mnt(source_mnt, parent_nd);
@@ -948,6 +979,7 @@ static int attach_recursive_mnt(struct v
 		touch_mnt_namespace(current->nsproxy->mnt_ns);
 	} else {
 		mnt_set_mountpoint(dest_mnt, dest_dentry, source_mnt);
+		attach_mnt_union(source_mnt, nd);
 		commit_tree(source_mnt);
 	}
 
@@ -956,6 +988,7 @@ static int attach_recursive_mnt(struct v
 		commit_tree(child);
 	}
 	spin_unlock(&vfsmount_lock);
+	union_unlock(nd->dentry);
 	return 0;
 }
 
@@ -1003,6 +1036,12 @@ static int do_change_type(struct nameida
 	if (nd->dentry != nd->mnt->mnt_root)
 		return -EINVAL;
 
+	/*
+	 * Don't change the type of union mounts
+	 */
+	if (nd->mnt->mnt_flags & MNT_UNION)
+		return -EINVAL;
+
[RFC][PATCH 8/14] Union-mount lookup
From: Jan Blunck [EMAIL PROTECTED]
Subject: Union-mount lookup

Modifies the VFS lookup routines to work with union mounted directories.

The existing lookup routines generally look up a pathname only in the
topmost or given directory. The changed versions of the lookup routines
search for the pathname in the entire union mounted stack. They have also
been modified to set up the union stack during lookup from the dcache and
from real_lookup().

Signed-off-by: Jan Blunck [EMAIL PROTECTED]
Signed-off-by: Bharata B Rao [EMAIL PROTECTED]
---
 fs/dcache.c            |   16 +
 fs/namei.c             |   78 +-
 fs/namespace.c         |   35 ++
 fs/union.c             |  598 +
 include/linux/dcache.h |   17 +
 include/linux/namei.h  |    4
 include/linux/union.h  |   49
 7 files changed, 786 insertions(+), 11 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1286,7 +1286,7 @@ struct dentry * d_lookup(struct dentry *
 	return dentry;
 }
 
-struct dentry * __d_lookup(struct dentry * parent, struct qstr * name)
+struct dentry * __d_lookup_single(struct dentry *parent, struct qstr *name)
 {
 	unsigned int len = name->len;
 	unsigned int hash = name->hash;
@@ -1371,6 +1371,20 @@ out:
 	return dentry;
 }
 
+struct dentry * d_lookup_single(struct dentry *parent, struct qstr *name)
+{
+	struct dentry *dentry;
+	unsigned long seq;
+
+        do {
+                seq = read_seqbegin(&rename_lock);
+                dentry = __d_lookup_single(parent, name);
+                if (dentry)
+			break;
+	} while (read_seqretry(&rename_lock, seq));
+	return dentry;
+}
+
 /**
  * d_validate - verify dentry provided from insecure source
  * @dentry:	The dentry alleged to be valid child of @dparent
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -374,6 +374,33 @@ void release_open_intent(struct nameidat
 }
 
 static inline struct dentry *
+do_revalidate_single(struct dentry *dentry, struct nameidata *nd)
+{
+	int status = dentry->d_op->d_revalidate(dentry, nd);
+	if (unlikely(status <= 0)) {
+		/*
+		 * The dentry failed validation.
+		 * If d_revalidate returned 0 attempt to invalidate
+		 * the dentry otherwise d_revalidate is asking us
+		 * to return a fail status.
+		 */
+		if (!status) {
+			if (!d_invalidate(dentry)) {
+				__dput_single(dentry);
+				dentry = NULL;
+			}
+		} else {
+			__dput_single(dentry);
+			dentry = ERR_PTR(status);
+		}
+	}
+	return dentry;
+}
+
+/*
+ * FIXME: We need a union aware revalidate here!
+ */
+static inline struct dentry *
 do_revalidate(struct dentry *dentry, struct nameidata *nd)
 {
 	int status = dentry->d_op->d_revalidate(dentry, nd);
@@ -403,16 +430,16 @@ do_revalidate(struct dentry *dentry, str
  */
 static struct dentry * cached_lookup(struct dentry * parent, struct qstr * name, struct nameidata *nd)
 {
-	struct dentry * dentry = __d_lookup(parent, name);
+	struct dentry *dentry = __d_lookup_single(parent, name);
 
 	/* lockess __d_lookup may fail due to concurrent d_move()
 	 * in some unrelated directory, so try with d_lookup
 	 */
 	if (!dentry)
-		dentry = d_lookup(parent, name);
+		dentry = d_lookup_single(parent, name);
 
 	if (dentry && dentry->d_op && dentry->d_op->d_revalidate)
-		dentry = do_revalidate(dentry, nd);
+		dentry = do_revalidate_single(dentry, nd);
 
 	return dentry;
 }
@@ -465,7 +492,7 @@ ok:
  * make sure that nobody added the entry to the dcache in the meantime..
  * SMP-safe
  */
-static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, struct nameidata *nd)
+struct dentry * real_lookup_single(struct dentry *parent, struct qstr *name, struct nameidata *nd)
 {
 	struct dentry * result;
 	struct inode *dir = parent->d_inode;
@@ -485,7 +512,7 @@ static struct dentry * real_lookup(struc
 	 *
 	 * so doing d_lookup() (with seqlock), instead of lockfree __d_lookup
 	 */
-	result = d_lookup(parent, name);
+	result = d_lookup_single(parent, name);
 	if (!result) {
 		struct dentry * dentry = d_alloc(parent, name);
 		result = ERR_PTR(-ENOMEM);
@@ -506,7 +533,7 @@ static struct dentry * real_lookup(struc
 	 */
 	mutex_unlock(&dir->i_mutex);
 	if (result->d_op && result->d_op->d_revalidate) {
-		result = do_revalidate(result, nd);
+		result = do_revalidate_single(result, nd);
 		if (!result)
 			result = ERR_PTR(-ENOENT);
 	}
@@ -699,7 +726,7 @@ static int __follow_mount(struct path *p
 	return res;
 }
 
-static
[RFC][PATCH 9/14] Union-mount readdir
From: Bharata B Rao [EMAIL PROTECTED]
Subject: Union mount readdir

This modifies the readdir()/getdents() routines to read directory entries
from the top level and the lower directories of a union and present a
merged view. The directory entries are read starting from the top layer
and they are maintained in a cache. Subsequently, when the entries from
the bottom layers of the union stack are read, they are checked for
duplicates (in the cache) before being passed out to user space.

There can be multiple calls to the readdir/getdents routines for reading
the entries of a single directory, and the union directory cache is
maintained across these calls.

Signed-off-by: Bharata B Rao [EMAIL PROTECTED]
Signed-off-by: Jan Blunck [EMAIL PROTECTED]
---
 fs/aio.c                     |    8
 fs/file_table.c              |   14 -
 fs/read_write.c              |    7
 fs/readdir.c                 |    2
 fs/union.c                   |  404 +++
 include/linux/dcache_union.h |   27 ++
 include/linux/union.h        |   22 ++
 7 files changed, 475 insertions(+), 9 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -21,6 +21,7 @@
 
 #include <linux/sched.h>
 #include <linux/fs.h>
+#include <linux/mount.h>
 #include <linux/file.h>
 #include <linux/mm.h>
 #include <linux/mman.h>
@@ -486,6 +487,13 @@ static void aio_fput_routine(struct work
 		/* Complete the fput */
 		__fput(req->ki_filp);
 
+		/*
+		 * __fput no longer releases the dentry and vfsmnt, thanks to
+		 * to union mount. Hence do this manually.
+		 */
+		dput(req->ki_filp->f_path.dentry);
+		mntput(req->ki_filp->f_path.mnt);
+
 		/* Link the iocb into the context's free list */
 		spin_lock_irq(&ctx->ctx_lock);
 		really_put_req(ctx, req);
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -141,8 +141,14 @@ EXPORT_SYMBOL(get_empty_filp);
 
 void fastcall fput(struct file *file)
 {
-	if (atomic_dec_and_test(&file->f_count))
+	struct dentry *dentry = file->f_path.dentry;
+	struct vfsmount *mnt = file->f_path.mnt;
+
+	if (atomic_dec_and_test(&file->f_count)) {
 		__fput(file);
+		dput(dentry);
+		mntput(mnt);
+	}
 }
 
 EXPORT_SYMBOL(fput);
@@ -152,9 +158,7 @@ EXPORT_SYMBOL(fput);
  */
 void fastcall __fput(struct file *file)
 {
-	struct dentry *dentry = file->f_path.dentry;
-	struct vfsmount *mnt = file->f_path.mnt;
-	struct inode *inode = dentry->d_inode;
+	struct inode *inode = file->f_path.dentry->d_inode;
 
 	might_sleep();
 
@@ -180,8 +184,6 @@ void fastcall __fput(struct file *file)
 	file->f_path.dentry = NULL;
 	file->f_path.mnt = NULL;
 	file_free(file);
-	dput(dentry);
-	mntput(mnt);
 }
 
 struct file fastcall *fget(unsigned int fd)
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/syscalls.h>
 #include <linux/pagemap.h>
+#include <linux/union.h>
 #include "read_write.h"
 
 #include <asm/uaccess.h>
@@ -123,6 +124,12 @@ loff_t vfs_llseek(struct file *file, lof
 		if (file->f_op && file->f_op->llseek)
 			fn = file->f_op->llseek;
 	}
+
+#ifdef CONFIG_UNION_MOUNT
+	if (S_ISDIR(file->f_path.dentry->d_inode->i_mode) &&
+	    unlikely(file->f_path.dentry->d_overlaid))
+		return union_dir_llseek(file, offset, origin);
+#endif
 	return fn(file, offset, origin);
 }
 EXPORT_SYMBOL(vfs_llseek);
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -33,7 +33,7 @@ int vfs_readdir(struct file *file, filld
 	mutex_lock(&inode->i_mutex);
 	res = -ENOENT;
 	if (!IS_DEADDIR(inode)) {
-		res = file->f_op->readdir(file, buf, filler);
+		res = do_readdir(file, buf, filler);
 		file_accessed(file);
 	}
 	mutex_unlock(&inode->i_mutex);
--- a/fs/union.c
+++ b/fs/union.c
@@ -14,6 +14,7 @@
 #include <linux/namei.h>
 #include <linux/module.h>
 #include <linux/mount.h>
+#include <linux/file.h>
 
 struct union_info * union_alloc(void)
 {
@@ -26,6 +27,8 @@ struct union_info * union_alloc(void)
 	mutex_init(&info->u_mutex);
 	mutex_lock(&info->u_mutex);
 	atomic_set(&info->u_count, 1);
+	INIT_LIST_HEAD(&info->u_rdcache);
+	info->u_cookie = 0;
 	UM_DEBUG_LOCK("allocate union %p\n", info);
 	return info;
 }
@@ -40,6 +43,7 @@ struct union_info * union_get(struct uni
 	return info;
 }
 
+static void release_rdstates(struct union_info *info);
 void union_put(struct union_info *info)
 {
 	BUG_ON(!info);
@@ -49,6 +53,7 @@ void union_put(struct union_info *info)
 
 	if (!atomic_read(&info->u_count)) {
 		UM_DEBUG_LOCK("free union %p\n", info);
+		release_rdstates(info);
 		kfree(info);
 	}
 
@@ -968,3 +973,402 @@ int follow_union_mount(struct vfsmount *
 
 	return res;
 }
+
+/*
+ *
[RFC][PATCH 10/14] In-kernel file copy between union mounted filesystems
From: Jan Blunck [EMAIL PROTECTED]
Subject: In-kernel file copy between union mounted filesystems

This patch introduces in-kernel file copy between union mounted
filesystems. When a file is opened for writing but resides on a lower (thus
read-only) layer of the union stack it is copied to the topmost union layer
first.

This patch uses the do_splice_direct() for doing the in-kernel file copy.

Signed-off-by: Bharata B Rao [EMAIL PROTECTED]
Signed-off-by: Jan Blunck [EMAIL PROTECTED]
---
 fs/namei.c            |   46 +
 fs/union.c            |  415 ++
 include/linux/namei.h |    2
 include/linux/union.h |   14 +
 4 files changed, 476 insertions(+), 1 deletion(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -830,8 +830,17 @@ done:
 	path->mnt = mnt;
 	path->dentry = dentry;
 
-	if (nd->dentry->d_sb != dentry->d_sb)
+	/*
+	 * This should be checked after the following of unions.
+	 * Otherwise we might run into trouble creating directories
+	 * on mountpoints. :(
+	 * But maybe we shouldn't set the LAST_LOWLEVEL flag here
+	 * at all ... */
+	if (nd->dentry->d_sb != dentry->d_sb) {
 		path->mnt = find_mnt(dentry);
+		UM_DEBUG_UID("Setting LAST_LOWLEVEL for %s\n", name->name);
+		nd->um_flags |= LAST_LOWLEVEL;
+	}
 
 	__follow_mount(path);
 	follow_union_mount(path-mnt, path-dentry);
@@ -950,6 +959,14 @@ static fastcall int __link_path_walk(con
 		if (err)
 			break;
 
+		if ((nd->flags & LOOKUP_TOPMOST) &&
+		    (nd->um_flags & LAST_LOWLEVEL)) {
+			err = union_create_topdir(nd,next.dentry,next.mnt);
+			if (err)
+				goto out_dput;
+			nd->um_flags &= ~LAST_LOWLEVEL;
+		}
+
 		err = -ENOENT;
 		inode = next.dentry->d_inode;
 		if (!inode)
@@ -1005,6 +1022,15 @@ last_component:
 		err = do_lookup(nd, &this, &next);
 		if (err)
 			break;
+
+		if ((nd->flags & LOOKUP_TOPMOST) &&
+		    (nd->um_flags & LAST_LOWLEVEL)) {
+			err = union_create_topdir(nd,next.dentry,next.mnt);
+			if (err)
+				goto out_dput;
+			nd->um_flags &= ~LAST_LOWLEVEL;
+		}
+
 		inode = next.dentry->d_inode;
 		if ((lookup_flags & LOOKUP_FOLLOW)
 		    && inode && inode->i_op && inode->i_op->follow_link) {
@@ -1177,6 +1203,7 @@ static int fastcall do_path_lookup(int d
 
 	nd->last_type = LAST_ROOT; /* if there are only slashes... */
 	nd->flags = flags;
+	nd->um_flags = 0;
 	nd->depth = 0;
 
 	if (*name=='/') {
@@ -1756,9 +1783,18 @@ int open_namei(int dfd, const char *path
 					 nd, flag);
 		if (error)
 			return error;
+		/* test for WRONLY and RDWR - flag's special lower bits */
+		if (flag & 0x2) {
+			UM_DEBUG_UID("\"%s\" opened for writing\n", pathname);
+			error = union_copyup(nd, flag);
+			if (error)
+				return error;
+		}
 		goto ok;
 	}
 
+	UM_DEBUG_UID("open called with O_CREATE\n");
+
 	/*
 	 * Create - we need to know the parent.
 	 */
@@ -1775,6 +1811,8 @@ int open_namei(int dfd, const char *path
 	if (nd->last_type != LAST_NORM || nd->last.name[nd->last.len])
 		goto exit;
 
+	UM_DEBUG_UID("do_last now\n");
+
 	dir = nd->dentry;
 	nd->flags &= ~LOOKUP_PARENT;
 	mutex_lock(&dir->d_inode->i_mutex);
@@ -1828,6 +1866,12 @@ do_last:
 	error = -EISDIR;
 	if (path.dentry->d_inode && S_ISDIR(path.dentry->d_inode->i_mode))
 		goto exit;
+
+	if (flag & 0x2) {
+		error = union_copyup(nd, flag);
+		if (error)
+			goto exit;
+	}
 ok:
 	error = may_open(nd, acc_mode, flag);
 	if (error)
--- a/fs/union.c
+++ b/fs/union.c
@@ -15,6 +15,11 @@
 #include <linux/module.h>
 #include <linux/mount.h>
 #include <linux/file.h>
+#include <linux/mm.h>
+#include <linux/quotaops.h>
+#include <linux/dnotify.h>
+#include <linux/security.h>
+#include <linux/pipe_fs_i.h>
 
 struct union_info * union_alloc(void)
 {
@@ -305,6 +310,53 @@ void __dput_union(struct dentry *dentry)
 	return;
 }
 
+/*
+ * union_relookup_topmost - lookup and create the topmost path to dentry
+ * @nd: pointer to nameidata
+ * @flags: lookup flags
+ */
+int union_relookup_topmost(struct nameidata *nd, int flags)
+{
+	int err;
+	char *kbuf, *name;
+	struct nameidata this;
+
+	UM_DEBUG_UID("relookup the topmost dir for %s\n",
+		     nd->dentry->d_name.name);
+
+
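The "flag & 0x2" test in the open_namei() hunks depends on the encoding
that the 2.6-era open path applies to the access mode before calling
open_namei(): the flags are incremented so that bit 0 means "read" and
bit 1 means "write". The user-space sketch below reproduces that
conversion to show which open modes would trigger a copy-up. The sk_
function names are hypothetical, and union_copyup() itself is not
modelled.

```c
#include <fcntl.h>

/*
 * Same transformation the kernel applies before open_namei():
 * O_RDONLY (0) becomes 1, O_WRONLY (1) becomes 2, O_RDWR (2) becomes 3.
 */
int sk_namei_flags(int open_flags)
{
	int flag = open_flags;

	if ((flag + 1) & O_ACCMODE)
		flag++;
	return flag;
}

/* Non-zero when the converted flags have the write bit (0x2) set. */
int sk_needs_copyup(int open_flags)
{
	return (sk_namei_flags(open_flags) & 0x2) != 0;
}
```

So a file on a lower layer is copied up for O_WRONLY and O_RDWR opens,
while a plain O_RDONLY open leaves it where it is.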
[RFC][PATCH 11/14] VFS whiteout handling
From: Jan Blunck [EMAIL PROTECTED]
Subject: VFS whiteout handling

Introduce white-out handling in the VFS.

Signed-off-by: Jan Blunck [EMAIL PROTECTED]
Signed-off-by: Bharata B Rao [EMAIL PROTECTED]
---
 fs/inode.c            |   17 +
 fs/namei.c            |  476 --
 fs/readdir.c          |   10 +
 fs/union.c            |  104 ++
 include/linux/fs.h    |    4
 include/linux/union.h |    6
 6 files changed, 605 insertions(+), 12 deletions(-)

--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1421,6 +1421,21 @@ void __init inode_init(unsigned long mem
 		INIT_HLIST_HEAD(&inode_hashtable[loop]);
 }
 
+/*
+ * Dummy default file-operations:
+ * Never open a whiteout. This is always a bug.
+ */
+static int whiteout_no_open(struct inode *irrelevant, struct file *dontcare)
+{
+	printk("Attemp to open a whiteout!\n");
+	WARN_ON(1);
+	return -ENXIO;
+}
+
+static struct file_operations def_wht_fops = {
+	.open		= whiteout_no_open,
+};
+
 void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev)
 {
 	inode->i_mode = mode;
@@ -1434,6 +1449,8 @@ void init_special_inode(struct inode *in
 		inode->i_fop = &def_fifo_fops;
 	else if (S_ISSOCK(mode))
 		inode->i_fop = &bad_sock_fops;
+	else if (S_ISWHT(mode))
+		inode->i_fop = &def_wht_fops;
 	else
 		printk(KERN_DEBUG "init_special_inode: bogus i_mode (%o)\n",
 		       mode);
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -969,7 +969,7 @@ static fastcall int __link_path_walk(con
 
 		err = -ENOENT;
 		inode = next.dentry->d_inode;
-		if (!inode)
+		if (!inode || S_ISWHT(inode->i_mode))
 			goto out_dput;
 		err = -ENOTDIR;
 		if (!inode->i_op)
@@ -1043,6 +1043,12 @@ last_component:
 		err = -ENOENT;
 		if (!inode)
 			break;
+		if (S_ISWHT(inode->i_mode)) {
+			UM_DEBUG_UID("found a whiteout\n");
+			break;
+			//if (!(nd->flags & LOOKUP_WHT))
+			//	break;
+		}
 		if (lookup_flags & LOOKUP_DIRECTORY) {
 			err = -ENOTDIR;
 			if (!inode->i_op || !inode->i_op->lookup)
@@ -1556,7 +1562,7 @@ static int may_delete(struct inode *dir,
 static inline int may_create(struct inode *dir, struct dentry *child,
 			     struct nameidata *nd)
 {
-	if (child->d_inode)
+	if (child->d_inode && !S_ISWHT(child->d_inode->i_mode))
 		return -EEXIST;
 	if (IS_DEADDIR(dir))
 		return -ENOENT;
@@ -1623,6 +1629,82 @@ void unlock_rename(struct dentry *p1, st
 	}
 }
 
+/*
+ * __vfs_unlink_whiteout - Unlink a single whiteout from the system
+ * @dir: parent directory
+ * @dentry: the whiteout itself
+ *
+ * This is for unlinking a single whiteout. Don't use vfs_unlink() because we
+ * don't want any notification stuff etc. but basically it is the same stuff.
+ */
+static int
+__vfs_unlink_whiteout(struct inode *dir, struct dentry *dentry)
+{
+	int error = may_delete(dir, dentry, 0);
+
+	if (error)
+		return error;
+
+	if (!dir->i_op || !dir->i_op->unlink)
+		return -EPERM;
+
+	DQUOT_INIT(dir);
+
+	mutex_lock(&dentry->d_inode->i_mutex);
+	if (d_mountpoint(dentry))
+		error = -EBUSY;
+	else {
+		error = security_inode_unlink(dir, dentry);
+		if (!error)
+			error = dir->i_op->unlink(dir, dentry);
+	}
+	mutex_unlock(&dentry->d_inode->i_mutex);
+
+	/* We don't d_delete() NFS sillyrenamed files--they still exist. */
+	if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED)) {
+		d_delete(dentry);
+		//inode_dir_notify(dir, DN_DELETE);
+	}
+	return error;
+}
+
+/*
+ * vfs_unlink_whiteout - unlink and relookup the whiteout
+ *
+ * This is what you want to call from vfs_* functions to remove a whiteout. It
+ * unlinks the whiteout dentry and relookups it afterwards.
+ */
+static int
+vfs_unlink_whiteout(struct inode *dir, struct dentry **dp)
+{
+	struct dentry *dentry = *dp;
+	struct dentry *parent = dentry->d_parent;
+	struct qstr name;
+	int error;
+
+	BUG_ON(dir != parent->d_inode);
+
+	error = -ENOMEM;
+	name.name = kmalloc(dentry->d_name.len, GFP_KERNEL);
+	if (!name.name)
+		goto out;
+	strncpy((char *)name.name, dentry->d_name.name, dentry->d_name.len);
+	name.len = dentry->d_name.len;
+	name.hash = dentry->d_name.hash;
+
+	error = __vfs_unlink_whiteout(dir, dentry);
+	if (error)
+		goto out_freename;
+
+	__dput_single(dentry);
+	*dp = __lookup_hash_single(&name, parent, NULL);
+
[RFC][PATCH 12/14] ext2 whiteout support
From: Jan Blunck [EMAIL PROTECTED]
Subject: ext2 whiteout support

Introduce whiteout support to ext2.

Signed-off-by: Jan Blunck [EMAIL PROTECTED]
Signed-off-by: Bharata B Rao [EMAIL PROTECTED]
---
 fs/ext2/dir.c           |    2 ++
 fs/ext2/namei.c         |   17 +++++++++++++++++
 fs/ext2/super.c         |   11 ++++++++++-
 include/linux/ext2_fs.h |    4 ++++
 4 files changed, 33 insertions(+), 1 deletion(-)

--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -218,6 +218,7 @@ static unsigned char ext2_filetype_table
 	[EXT2_FT_FIFO]		= DT_FIFO,
 	[EXT2_FT_SOCK]		= DT_SOCK,
 	[EXT2_FT_SYMLINK]	= DT_LNK,
+	[EXT2_FT_WHT]		= DT_WHT,
 };
 
 #define S_SHIFT 12
@@ -229,6 +230,7 @@ static unsigned char ext2_type_by_mode[S
 	[S_IFIFO >> S_SHIFT]	= EXT2_FT_FIFO,
 	[S_IFSOCK >> S_SHIFT]	= EXT2_FT_SOCK,
 	[S_IFLNK >> S_SHIFT]	= EXT2_FT_SYMLINK,
+	[S_IFWHT >> S_SHIFT]	= EXT2_FT_WHT,
 };
 
 static inline void ext2_set_de_type(ext2_dirent *de, struct inode *inode)
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -288,6 +288,22 @@ static int ext2_rmdir (struct inode * di
 	return err;
 }
 
+static int ext2_whiteout(struct inode *dir, struct dentry *dentry)
+{
+	struct inode *inode;
+	int err;
+
+	inode = ext2_new_inode (dir, S_IFWHT | S_IRUGO);
+	err = PTR_ERR(inode);
+	if (IS_ERR(inode))
+		goto out;
+
+	mark_inode_dirty(inode);
+	err = ext2_add_nondir(dentry, inode);
+out:
+	return err;
+}
+
 static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry,
 	struct inode * new_dir,	struct dentry * new_dentry )
 {
@@ -382,6 +398,7 @@ const struct inode_operations ext2_dir_i
 	.mkdir		= ext2_mkdir,
 	.rmdir		= ext2_rmdir,
 	.mknod		= ext2_mknod,
+	.whiteout	= ext2_whiteout,
 	.rename		= ext2_rename,
 #ifdef CONFIG_EXT2_FS_XATTR
 	.setxattr	= generic_setxattr,
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -754,6 +754,15 @@ static int ext2_fill_super(struct super_
 	ext2_xip_verify_sb(sb); /* see if bdev supports xip, unset
 				    EXT2_MOUNT_XIP if not */
 
+	if ((sb->s_flags & MS_UNION) && !(sb->s_flags & MS_RDONLY)) {
+		if (!EXT2_HAS_INCOMPAT_FEATURE(sb,
+					       EXT2_FEATURE_INCOMPAT_WHITEOUT)) {
+			sb->s_flags |= MS_RDONLY;
+			ext2_warning(sb, __FUNCTION__,
+				"no whiteout support, mounting filesystem read-only");
+		}
+	}
+
 	if (le32_to_cpu(es->s_rev_level) == EXT2_GOOD_OLD_REV &&
 	    (EXT2_HAS_COMPAT_FEATURE(sb, ~0U) ||
 	     EXT2_HAS_RO_COMPAT_FEATURE(sb, ~0U) ||
@@ -1292,7 +1301,7 @@ static struct file_system_type ext2_fs_t
 	.name		= "ext2",
 	.get_sb		= ext2_get_sb,
 	.kill_sb	= kill_block_super,
-	.fs_flags	= FS_REQUIRES_DEV,
+	.fs_flags	= FS_REQUIRES_DEV | FS_WHT,
 };
 
 static int __init init_ext2_fs(void)
--- a/include/linux/ext2_fs.h
+++ b/include/linux/ext2_fs.h
@@ -61,6 +61,7 @@
 #define EXT2_ROOT_INO		 2	/* Root inode */
 #define EXT2_BOOT_LOADER_INO	 5	/* Boot loader inode */
 #define EXT2_UNDEL_DIR_INO	 6	/* Undelete directory inode */
+#define EXT2_WHT_INO		 7	/* Whiteout inode */
 
 /* First non-reserved inode for old ext2 filesystems */
 #define EXT2_GOOD_OLD_FIRST_INO	11
@@ -479,10 +480,12 @@ struct ext2_super_block {
 #define EXT3_FEATURE_INCOMPAT_RECOVER		0x0004
 #define EXT3_FEATURE_INCOMPAT_JOURNAL_DEV	0x0008
 #define EXT2_FEATURE_INCOMPAT_META_BG		0x0010
+#define EXT2_FEATURE_INCOMPAT_WHITEOUT		0x0020
 #define EXT2_FEATURE_INCOMPAT_ANY		0xffffffff
 
 #define EXT2_FEATURE_COMPAT_SUPP	EXT2_FEATURE_COMPAT_EXT_ATTR
 #define EXT2_FEATURE_INCOMPAT_SUPP	(EXT2_FEATURE_INCOMPAT_FILETYPE| \
+					 EXT2_FEATURE_INCOMPAT_WHITEOUT| \
 					 EXT2_FEATURE_INCOMPAT_META_BG)
 #define EXT2_FEATURE_RO_COMPAT_SUPP	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER| \
 					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
@@ -549,6 +552,7 @@ enum {
 	EXT2_FT_FIFO,
 	EXT2_FT_SOCK,
 	EXT2_FT_SYMLINK,
+	EXT2_FT_WHT,
 	EXT2_FT_MAX
 };
 
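The ext2_type_by_mode[] addition works because the file-type bits of
i_mode, shifted right by S_SHIFT (12), give a small table index; for
S_IFWHT that index is 14. The following user-space sketch shows the
mapping. The SK_/sk_ names are hypothetical, and the table is sized one
entry larger than in ext2 (16 instead of 15) so that every possible
4-bit index stays in bounds in this standalone example.

```c
/* File-type bits of i_mode; S_IFWHT as introduced by this series. */
#define SK_S_IFMT	00170000
#define SK_S_IFWHT	0160000
#define SK_S_IFSOCK	0140000
#define SK_S_IFLNK	0120000
#define SK_S_IFREG	0100000
#define SK_S_IFBLK	0060000
#define SK_S_IFDIR	0040000
#define SK_S_IFCHR	0020000
#define SK_S_IFIFO	0010000

#define SK_S_SHIFT	12

/* Same ordering as the EXT2_FT_* enum, with the whiteout type appended. */
enum {
	SK_FT_UNKNOWN,
	SK_FT_REG_FILE,
	SK_FT_DIR,
	SK_FT_CHRDEV,
	SK_FT_BLKDEV,
	SK_FT_FIFO,
	SK_FT_SOCK,
	SK_FT_SYMLINK,
	SK_FT_WHT,
	SK_FT_MAX
};

/* Indexed by (mode & S_IFMT) >> S_SHIFT; unlisted slots are SK_FT_UNKNOWN. */
static const unsigned char sk_type_by_mode[(SK_S_IFMT >> SK_S_SHIFT) + 1] = {
	[SK_S_IFREG >> SK_S_SHIFT]	= SK_FT_REG_FILE,
	[SK_S_IFDIR >> SK_S_SHIFT]	= SK_FT_DIR,
	[SK_S_IFCHR >> SK_S_SHIFT]	= SK_FT_CHRDEV,
	[SK_S_IFBLK >> SK_S_SHIFT]	= SK_FT_BLKDEV,
	[SK_S_IFIFO >> SK_S_SHIFT]	= SK_FT_FIFO,
	[SK_S_IFSOCK >> SK_S_SHIFT]	= SK_FT_SOCK,
	[SK_S_IFLNK >> SK_S_SHIFT]	= SK_FT_SYMLINK,
	[SK_S_IFWHT >> SK_S_SHIFT]	= SK_FT_WHT,
};

/* Directory-entry file type for a given inode mode. */
unsigned char sk_de_type(unsigned int mode)
{
	return sk_type_by_mode[(mode & SK_S_IFMT) >> SK_S_SHIFT];
}
```

For example, sk_de_type(SK_S_IFWHT | 0444) yields SK_FT_WHT, which is the
directory-entry type the readdir path then reports as DT_WHT.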
[RFC][PATCH 13/14] ext3 whiteout support
From: Bharata B Rao [EMAIL PROTECTED] Subject: ext3 whiteout support Introduce whiteout support for ext3. Signed-off-by: Bharata B Rao [EMAIL PROTECTED] Signed-off-by: Jan Blunck [EMAIL PROTECTED] --- fs/ext3/dir.c |2 - fs/ext3/namei.c | 62 fs/ext3/super.c | 11 +++- include/linux/ext3_fs.h |5 +++ 4 files changed, 72 insertions(+), 8 deletions(-) --- a/fs/ext3/dir.c +++ b/fs/ext3/dir.c @@ -29,7 +29,7 @@ #include linux/rbtree.h static unsigned char ext3_filetype_table[] = { - DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK + DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK, DT_WHT }; static int ext3_readdir(struct file *, void *, filldir_t); --- a/fs/ext3/namei.c +++ b/fs/ext3/namei.c @@ -1071,6 +1071,7 @@ static unsigned char ext3_type_by_mode[S [S_IFIFO S_SHIFT]= EXT3_FT_FIFO, [S_IFSOCK S_SHIFT] = EXT3_FT_SOCK, [S_IFLNK S_SHIFT]= EXT3_FT_SYMLINK, + [S_IFWHT S_SHIFT]= EXT3_FT_WHT, }; static inline void ext3_set_de_type(struct super_block *sb, @@ -1786,7 +1787,7 @@ out_stop: /* * routine to check that the specified directory is empty (for rmdir) */ -static int empty_dir (struct inode * inode) +static int empty_dir (handle_t *handle, struct inode * inode) { unsigned long offset; struct buffer_head * bh; @@ -1848,8 +1849,28 @@ static int empty_dir (struct inode * ino continue; } if (le32_to_cpu(de-inode)) { - brelse (bh); - return 0; + /* If this is a whiteout, remove it */ + if (de-file_type == EXT3_FT_WHT) { + unsigned long ino = le32_to_cpu(de-inode); + struct inode *tmp_inode = iget(inode-i_sb, ino); + if (!tmp_inode) { + brelse (bh); + return 0; + } + + if (ext3_delete_entry(handle, inode, de, bh)) { + iput(tmp_inode); + brelse (bh); + return 0; + } + + tmp_inode-i_ctime = inode-i_ctime; + tmp_inode-i_nlink--; + iput(tmp_inode); + } else { + brelse (bh); + return 0; + } } offset += le16_to_cpu(de-rec_len); de = (struct ext3_dir_entry_2 *) @@ -2031,7 +2052,7 @@ static int ext3_rmdir (struct inode * di goto end_rmdir; retval 
= -ENOTEMPTY; - if (!empty_dir (inode)) + if (!empty_dir (handle, inode)) goto end_rmdir; retval = ext3_delete_entry(handle, dir, de, bh); @@ -2060,6 +2081,36 @@ end_rmdir: return retval; } +static int ext3_whiteout(struct inode *dir, struct dentry *dentry) +{ + struct inode * inode; + int err, retries = 0; + handle_t *handle; + +retry: + handle = ext3_journal_start(dir, EXT3_DATA_TRANS_BLOCKS(dir-i_sb) + + EXT3_INDEX_EXTRA_TRANS_BLOCKS + 3 + + 2*EXT3_QUOTA_INIT_BLOCKS(dir-i_sb)); + if (IS_ERR(handle)) + return PTR_ERR(handle); + + if (IS_DIRSYNC(dir)) + handle-h_sync = 1; + + inode = ext3_new_inode (handle, dir, S_IFWHT | S_IRUGO); + err = PTR_ERR(inode); + if (IS_ERR(inode)) + goto out_stop; + + err = ext3_add_nondir(handle, dentry, inode); + +out_stop: + ext3_journal_stop(handle); + if (err == -ENOSPC ext3_should_retry_alloc(dir-i_sb, retries)) + goto retry; + return err; +} + static int ext3_unlink(struct inode * dir, struct dentry *dentry) { int retval; @@ -2261,7 +2312,7 @@ static int ext3_rename (struct inode * o if (S_ISDIR(old_inode-i_mode)) { if (new_inode) { retval = -ENOTEMPTY; - if (!empty_dir (new_inode)) + if (!empty_dir (handle, new_inode)) goto end_rename; } retval = -EIO; @@ -2377,6 +2428,7 @@ const struct inode_operations ext3_dir_i .mkdir = ext3_mkdir, .rmdir = ext3_rmdir, .mknod = ext3_mknod, + .whiteout = ext3_whiteout, .rename = ext3_rename, .setattr= ext3_setattr, #ifdef CONFIG_EXT3_FS_XATTR --- a/fs/ext3/super.c +++ b/fs/ext3/super.c @@ -1492,6
[RFC][PATCH 14/14] tmpfs whiteout support
From: Jan Blunck [EMAIL PROTECTED]
Subject: tmpfs whiteout support

Introduce whiteout support to tmpfs.

Signed-off-by: Jan Blunck [EMAIL PROTECTED]
Signed-off-by: Bharata B Rao [EMAIL PROTECTED]
---
 mm/shmem.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletion(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -74,7 +74,7 @@
 #define LATENCY_LIMIT 64
 
 /* Pretend that each entry is of this size in directory's i_size */
-#define BOGO_DIRENT_SIZE 20
+#define BOGO_DIRENT_SIZE 1
 
 /* Flag allocation requirements to shmem_getpage and shmem_swp_alloc */
 enum sgp_type {
@@ -1772,6 +1772,11 @@ static int shmem_create(struct inode *di
 	return shmem_mknod(dir, dentry, mode | S_IFREG, 0);
 }
 
+static int shmem_whiteout(struct inode *dir, struct dentry *dentry)
+{
+	return shmem_mknod(dir, dentry, S_IRUGO | S_IWUGO | S_IFWHT, 0);
+}
+
 /*
  * Link a file..
  */
@@ -2399,6 +2404,7 @@ static const struct inode_operations shm
 	.rmdir		= shmem_rmdir,
 	.mknod		= shmem_mknod,
 	.rename		= shmem_rename,
+	.whiteout	= shmem_whiteout,
 #endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	.setattr	= shmem_notify_change,
@@ -2453,6 +2459,7 @@ static struct file_system_type tmpfs_fs_
 	.name		= "tmpfs",
 	.get_sb		= shmem_get_sb,
 	.kill_sb	= kill_litter_super,
+	.fs_flags	= FS_WHT,
 };
 
 static struct vfsmount *shm_mnt;
Re: [RFC][PATCH 9/14] Union-mount readdir
On 5/14/07, Bharata B Rao [EMAIL PROTECTED] wrote:
> +/* This is a copy from fs/readdir.c */
> +struct getdents_callback {
> +	struct linux_dirent __user *current_dir;
> +	struct linux_dirent __user *previous;
> +	int count;
> +	int error;
> +};

This should go into a header file.

> +static int union_cache_find_entry(struct list_head *uc_list,
> +				  const char *name, int namelen)
> +{
> +	struct union_cache_entry *p;
> +	int ret = 0;
> +
> +	list_for_each_entry(p, uc_list, list) {
> +		if (p->name.len != namelen)
> +			continue;
> +		if (strncmp(p->name.name, name, namelen) == 0) {
> +			ret = 1;
> +			break;
> +		}
> +	}
> +	return ret;
> +}

Why not use strlen instead of having both string and length as parameter?

> +static struct file * __dentry_open_read(struct dentry *dentry,
> +					struct vfsmount *mnt, int flags)
> +{
> +	struct file *f;
> +	struct inode *inode;
> +	int error;
> +
> +	error = -ENFILE;
> +	f = get_empty_filp();
> +	if (!f)
> +		goto out;

This is the only case where error is not explicitly set to a different value
before hitting out/cleanup => consider setting conditionally.

so long,
Carsten
Re: [RFC][PATCH 9/14] Union-mount readdir
On Mon, May 14, 2007 at 12:43:43PM +0200, Carsten Otte wrote:
> On 5/14/07, Bharata B Rao [EMAIL PROTECTED] wrote:
> > +/* This is a copy from fs/readdir.c */
> > +struct getdents_callback {
> > +	struct linux_dirent __user *current_dir;
> > +	struct linux_dirent __user *previous;
> > +	int count;
> > +	int error;
> > +};
> This should go into a header file.

Yes, ideally. As the comment above says, it is copied from fs/readdir.c and
we should be using the definition from there. But that needs touching
additional files, and we wanted to avoid that for this initial RFC post.

> > +static int union_cache_find_entry(struct list_head *uc_list,
> > +				  const char *name, int namelen)
> > +{
> > +	struct union_cache_entry *p;
> > +	int ret = 0;
> > +
> > +	list_for_each_entry(p, uc_list, list) {
> > +		if (p->name.len != namelen)
> > +			continue;
> > +		if (strncmp(p->name.name, name, namelen) == 0) {
> > +			ret = 1;
> > +			break;
> > +		}
> > +	}
> > +	return ret;
> > +}
> Why not use strlen instead of having both string and length as parameter?

None of the generic filldir routines in fs/readdir.c (filldir, fillonedir
and filldir64) depend on dirent->d_name being NUL-terminated; they put a 0
at the end themselves. Hence we do not depend on the name string being
NUL-terminated either.

> > +static struct file * __dentry_open_read(struct dentry *dentry,
> > +					struct vfsmount *mnt, int flags)
> > +{
> > +	struct file *f;
> > +	struct inode *inode;
> > +	int error;
> > +
> > +	error = -ENFILE;
> > +	f = get_empty_filp();
> > +	if (!f)
> > +		goto out;
> This is the only case where error is not explicitly set to a different value
> before hitting out/cleanup => consider setting conditionally.

Sure, that can be done. Again, this routine is copied from dentry_open() and
hence it is like that at the moment.

Thanks for your review.

Regards,
Bharata.
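The point about names that are not NUL-terminated can be illustrated with a small userspace sketch. It is not the patch's code: struct cached_name and cache_has_name() are hypothetical stand-ins for the union cache entry and its lookup, and memcmp() is used here in place of the strncmp() call in the patch, which behaves the same once the lengths are known to be equal and the names contain no embedded NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cached directory entry name. */
struct cached_name {
	const char *name;	/* NOT necessarily NUL-terminated */
	size_t len;		/* number of valid bytes in name */
};

/*
 * Return 1 if some entry equals the first namelen bytes of name, else 0.
 * Lengths are compared first, so no byte past either name is ever read
 * and strlen() is never needed.
 */
int cache_has_name(const struct cached_name *entries, size_t count,
		   const char *name, size_t namelen)
{
	size_t i;

	assert(entries != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (entries[i].len != namelen)
			continue;
		if (memcmp(entries[i].name, name, namelen) == 0)
			return 1;
	}
	return 0;
}
```

With two entries that point into one unterminated buffer holding "foobar" (lengths 3 and 3), looking up "foo" or "bar" succeeds, while calling strlen() on either entry would run past the end of the name.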
[AppArmor 45/45] Fix file_permission()
We cannot easily switch from file_permission() to vfs_permission()
everywhere, so fix file_permission() to not use a NULL nameidata for the
remaining users.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 fs/namei.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -296,7 +296,13 @@ int vfs_permission(struct nameidata *nd,
  */
 int file_permission(struct file *file, int mask)
 {
-	return permission(file->f_path.dentry->d_inode, mask, NULL);
+	struct nameidata nd;
+
+	nd.dentry = file->f_path.dentry;
+	nd.mnt = file->f_path.mnt;
+	nd.flags = LOOKUP_ACCESS;
+
+	return permission(nd.dentry->d_inode, mask, &nd);
 }
 
 /*
[AppArmor 24/45] Pass struct vfsmount to the inode_getxattr LSM hook
This is needed for computing pathnames in the AppArmor LSM. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/xattr.c |2 +- include/linux/security.h | 13 - security/dummy.c |3 ++- security/selinux/hooks.c |3 ++- 4 files changed, 13 insertions(+), 8 deletions(-) --- a/fs/xattr.c +++ b/fs/xattr.c @@ -116,7 +116,7 @@ vfs_getxattr(struct dentry *dentry, stru if (error) return error; - error = security_inode_getxattr(dentry, name); + error = security_inode_getxattr(dentry, mnt, name); if (error) return error; --- a/include/linux/security.h +++ b/include/linux/security.h @@ -391,7 +391,7 @@ struct request_sock; * @value identified by @name for @dentry and @mnt. * @inode_getxattr: * Check permission before obtaining the extended attributes - * identified by @name for @dentry. + * identified by @name for @dentry and @mnt. * Return 0 if permission is granted. * @inode_listxattr: * Check permission before obtaining the list of extended attribute @@ -1248,7 +1248,8 @@ struct security_operations { struct vfsmount *mnt, char *name, void *value, size_t size, int flags); - int (*inode_getxattr) (struct dentry *dentry, char *name); + int (*inode_getxattr) (struct dentry *dentry, struct vfsmount *mnt, + char *name); int (*inode_listxattr) (struct dentry *dentry); int (*inode_removexattr) (struct dentry *dentry, char *name); const char *(*inode_xattr_getsuffix) (void); @@ -1782,11 +1783,12 @@ static inline void security_inode_post_s security_ops-inode_post_setxattr (dentry, mnt, name, value, size, flags); } -static inline int security_inode_getxattr (struct dentry *dentry, char *name) +static inline int security_inode_getxattr (struct dentry *dentry, + struct vfsmount *mnt, char *name) { if (unlikely (IS_PRIVATE (dentry-d_inode))) return 0; - return security_ops-inode_getxattr (dentry, name); + return security_ops-inode_getxattr (dentry, mnt, name); } static inline int 
security_inode_listxattr (struct dentry *dentry) @@ -2487,7 +2489,8 @@ static inline void security_inode_post_s int flags) { } -static inline int security_inode_getxattr (struct dentry *dentry, char *name) +static inline int security_inode_getxattr (struct dentry *dentry, + struct vfsmount *mnt, char *name) { return 0; } --- a/security/dummy.c +++ b/security/dummy.c @@ -368,7 +368,8 @@ static void dummy_inode_post_setxattr (s { } -static int dummy_inode_getxattr (struct dentry *dentry, char *name) +static int dummy_inode_getxattr (struct dentry *dentry, + struct vfsmount *mnt, char *name) { return 0; } --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2393,7 +2393,8 @@ static void selinux_inode_post_setxattr( return; } -static int selinux_inode_getxattr (struct dentry *dentry, char *name) +static int selinux_inode_getxattr (struct dentry *dentry, struct vfsmount *mnt, + char *name) { return dentry_has_perm(current, NULL, dentry, FILE__GETATTR); } -- - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[AppArmor 42/45] AppArmor: add lock subtyping so lockdep does not report false dependencies
AppArmor uses lock subtyping to avoid false positives from lockdep. The profile lock is often taken nested, but it is guaranteed to be in a lock safe order and not the same lock when done, so it is safe. A third lock type (aa_lock_task_release) is given to the profile lock when it is taken in soft irq context during task release (aa_release). This is to avoid a false positive between the task lock and the profile lock. In task context the profile lock wraps the task lock with irqs off, but the kernel takes the task lock with irqs enabled. This won't ever result in a deadlock because aa_release doesn't need to take the task lock of the dead task that is released. Signed-off-by: John Johansen [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Cc: Ingo Molnar [EMAIL PROTECTED] --- security/apparmor/apparmor.h |7 +++ security/apparmor/inline.h| 25 ++--- security/apparmor/locking.txt | 21 +++-- security/apparmor/main.c |6 +++--- 4 files changed, 43 insertions(+), 16 deletions(-) --- a/security/apparmor/apparmor.h +++ b/security/apparmor/apparmor.h @@ -185,6 +185,13 @@ struct aa_audit { #define AA_CHECK_DIR 2 /* file type is directory */ #define AA_CHECK_MANGLE4 /* leave extra room for name mangling */ +/* lock subtypes so lockdep does not raise false dependencies */ +enum aa_lock_class { + aa_lock_normal, + aa_lock_nested, + aa_lock_task_release +}; + /* main.c */ extern int alloc_null_complain_profile(void); extern void free_null_complain_profile(void); --- a/security/apparmor/inline.h +++ b/security/apparmor/inline.h @@ -99,7 +99,8 @@ static inline void aa_free_task_context( * While the profile is locked, local interrupts are disabled. This also * gives us RCU reader safety. */ -static inline void lock_profile(struct aa_profile *profile) +static inline void lock_profile_nested(struct aa_profile *profile, + enum aa_lock_class lock_class) { /* We always lock top-level profiles instead of children. 
*/ if (profile) @@ -112,7 +113,13 @@ static inline void lock_profile(struct a * the task_free_security hook, which may run in RCU context. */ if (profile) - spin_lock_irqsave(profile-lock, profile-int_flags); + spin_lock_irqsave_nested(profile-lock, profile-int_flags, +lock_class); +} + +static inline void lock_profile(struct aa_profile *profile) +{ + lock_profile_nested(profile, aa_lock_normal); } /** @@ -161,17 +168,21 @@ static inline void lock_both_profiles(st */ if (!profile1 || profile1 == profile2) { if (profile2) - spin_lock_irqsave(profile2-lock, profile2-int_flags); + spin_lock_irqsave_nested(profile2-lock, +profile2-int_flags, +aa_lock_normal); } else if (profile1 profile2) { /* profile1 cannot be NULL here. */ - spin_lock_irqsave(profile1-lock, profile1-int_flags); + spin_lock_irqsave_nested(profile1-lock, profile1-int_flags, +aa_lock_normal); if (profile2) - spin_lock(profile2-lock); + spin_lock_nested(profile2-lock, aa_lock_nested); } else { /* profile2 cannot be NULL here. */ - spin_lock_irqsave(profile2-lock, profile2-int_flags); - spin_lock(profile1-lock); + spin_lock_irqsave_nested(profile2-lock, profile2-int_flags, +aa_lock_normal); + spin_lock_nested(profile1-lock, aa_lock_nested); } } --- a/security/apparmor/locking.txt +++ b/security/apparmor/locking.txt @@ -51,9 +51,18 @@ list, and can sleep. This ensures that p won't race with itself. We release the profile_list_lock as soon as possible to avoid stalling exec during profile loading/replacement/removal. -lock_dep reports a false 'possible irq lock inversion dependency detected' -when the profile lock is taken in aa_release. This is due to that the -task_lock is often taken inside the profile lock but other kernel code -takes the task_lock with interrupts enabled. A deadlock will not actually -occur because apparmor does not take the task_lock in hard_irq or soft_irq -context. +AppArmor uses lock subtyping to avoid false positives from lockdep. 
The +profile lock is often taken nested, but it is guaranteed to be in a lock +safe order and not the same lock when done, so it is safe. + +A third lock type (aa_lock_task_release) is given to the profile lock +when it is taken in soft irq context during task release (aa_release). +This is to avoid a false positive between the task lock and the profile
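The "lock safe order" that makes nested profile locking safe in lock_both_profiles() can be sketched in userspace as below. This is an assumption-laden illustration, not AppArmor code: lock_order() is a hypothetical helper, and because the comparison operator in the archived patch text was lost, the sketch simply assumes "lower address first". Any fixed order works as long as every caller uses the same one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide the order in which up to two locks should be
 * taken. NULL and duplicate pointers are skipped, mirroring the NULL and
 * "profile1 == profile2" cases in the patch. The result is written to
 * order[] and the number of locks to take (0, 1 or 2) is returned.
 *
 * Taking the first entry with the "normal" class and the second with the
 * "nested" class corresponds to aa_lock_normal / aa_lock_nested above.
 */
size_t lock_order(void *a, void *b, void *order[2])
{
	size_t n = 0;

	if (a == NULL || a == b) {
		if (b != NULL)
			order[n++] = b;
	} else if (b == NULL) {
		order[n++] = a;
	} else if ((uintptr_t)a < (uintptr_t)b) {
		order[n++] = a;
		order[n++] = b;
	} else {
		order[n++] = b;
		order[n++] = a;
	}
	return n;
}
```

Because two callers that pass the same pair in opposite argument order still get the same acquisition order, they cannot deadlock against each other; the lockdep subclasses only tell the validator that the second acquisition is intentional.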
[RFD Patch 3/4] Dont use a NULL nameidata in xattr_permission()
Create a nameidata2 struct in xattr_permission() so that it does not pass
NULL to permission().

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 fs/xattr.c |   18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -25,8 +25,16 @@
  * because different namespaces have very different rules.
  */
 static int
-xattr_permission(struct inode *inode, const char *name, int mask)
+xattr_permission(struct dentry *dentry, struct vfsmount *mnt, const char *name,
+		 int mask)
 {
+	struct inode *inode = dentry->d_inode;
+	struct nameidata2 nd;
+
+	nd.dentry = dentry;
+	nd.mnt = mnt;
+	nd.flags = 0;
+
 	/*
 	 * We can never set or remove an extended attribute on a read-only
 	 * filesystem or on an immutable / append-only inode.
@@ -65,7 +73,7 @@ xattr_permission(struct inode *inode, co
 			return -EPERM;
 	}
 
-	return permission(inode, mask, NULL);
+	return permission(inode, mask, &nd);
 }
 
 int
@@ -75,7 +83,7 @@ vfs_setxattr(struct dentry *dentry, stru
 	struct inode *inode = dentry->d_inode;
 	int error;
 
-	error = xattr_permission(inode, name, MAY_WRITE);
+	error = xattr_permission(dentry, mnt, name, MAY_WRITE);
 	if (error)
 		return error;
 
@@ -112,7 +120,7 @@ vfs_getxattr(struct dentry *dentry, stru
 	struct inode *inode = dentry->d_inode;
 	int error;
 
-	error = xattr_permission(inode, name, MAY_READ);
+	error = xattr_permission(dentry, mnt, name, MAY_READ);
 	if (error)
 		return error;
 
@@ -174,7 +182,7 @@ vfs_removexattr(struct dentry *dentry, s
 	if (!inode->i_op->removexattr)
 		return -EOPNOTSUPP;
 
-	error = xattr_permission(inode, name, MAY_WRITE);
+	error = xattr_permission(dentry, mnt, name, MAY_WRITE);
 	if (error)
 		return error;
 
[AppArmor 32/45] Enable LSM hooks to distinguish operations on file descriptors from operations on pathnames
Struct iattr already contains ia_file since commit cc4e69de from Miklos (which is related to commit befc649c). Use this to pass struct file down the setattr hooks. This allows LSMs to distinguish operations on file descriptors from operations on paths. Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] Cc: Miklos Szeredi [EMAIL PROTECTED] --- fs/nfsd/vfs.c | 12 +++- fs/open.c | 16 +++- include/linux/fs.h |3 +++ 3 files changed, 21 insertions(+), 10 deletions(-) --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -383,7 +383,7 @@ static ssize_t nfsd_getxattr(struct dent { ssize_t buflen; - buflen = vfs_getxattr(dentry, mnt, key, NULL, 0); + buflen = vfs_getxattr(dentry, mnt, key, NULL, 0, NULL); if (buflen = 0) return buflen; @@ -391,7 +391,7 @@ static ssize_t nfsd_getxattr(struct dent if (!*buf) return -ENOMEM; - return vfs_getxattr(dentry, mnt, key, *buf, buflen); + return vfs_getxattr(dentry, mnt, key, *buf, buflen, NULL); } #endif @@ -417,7 +417,7 @@ set_nfsv4_acl_one(struct dentry *dentry, goto out; } - error = vfs_setxattr(dentry, mnt, key, buf, len, 0); + error = vfs_setxattr(dentry, mnt, key, buf, len, 0, NULL); out: kfree(buf); return error; @@ -1992,12 +1992,14 @@ nfsd_set_posix_acl(struct svc_fh *fhp, i mnt = fhp-fh_export-ex_mnt; if (size) - error = vfs_setxattr(fhp-fh_dentry, mnt, name, value, size,0); + error = vfs_setxattr(fhp-fh_dentry, mnt, name, value, size, 0, +NULL); else { if (!S_ISDIR(inode-i_mode) type == ACL_TYPE_DEFAULT) error = 0; else { - error = vfs_removexattr(fhp-fh_dentry, mnt, name); + error = vfs_removexattr(fhp-fh_dentry, mnt, name, + NULL); if (error == -ENODATA) error = 0; } --- a/fs/open.c +++ b/fs/open.c @@ -522,6 +522,8 @@ asmlinkage long sys_fchmod(unsigned int mode = inode-i_mode; newattrs.ia_mode = (mode S_IALLUGO) | (inode-i_mode ~S_IALLUGO); newattrs.ia_valid = ATTR_MODE | ATTR_CTIME; + newattrs.ia_valid |= ATTR_FILE; + newattrs.ia_file = file; err = notify_change(dentry, 
file-f_path.mnt, newattrs); mutex_unlock(inode-i_mutex); @@ -572,7 +574,7 @@ asmlinkage long sys_chmod(const char __u } static int chown_common(struct dentry * dentry, struct vfsmount *mnt, - uid_t user, gid_t group) + uid_t user, gid_t group, struct file *file) { struct inode * inode; int error; @@ -600,6 +602,10 @@ static int chown_common(struct dentry * } if (!S_ISDIR(inode-i_mode)) newattrs.ia_valid |= ATTR_KILL_SUID|ATTR_KILL_SGID; + if (file) { + newattrs.ia_file = file; + newattrs.ia_valid |= ATTR_FILE; + } mutex_lock(inode-i_mutex); error = notify_change(dentry, mnt, newattrs); mutex_unlock(inode-i_mutex); @@ -615,7 +621,7 @@ asmlinkage long sys_chown(const char __u error = user_path_walk(filename, nd); if (error) goto out; - error = chown_common(nd.dentry, nd.mnt, user, group); + error = chown_common(nd.dentry, nd.mnt, user, group, NULL); path_release(nd); out: return error; @@ -635,7 +641,7 @@ asmlinkage long sys_fchownat(int dfd, co error = __user_walk_fd(dfd, filename, follow, nd); if (error) goto out; - error = chown_common(nd.dentry, nd.mnt, user, group); + error = chown_common(nd.dentry, nd.mnt, user, group, NULL); path_release(nd); out: return error; @@ -649,7 +655,7 @@ asmlinkage long sys_lchown(const char __ error = user_path_walk_link(filename, nd); if (error) goto out; - error = chown_common(nd.dentry, nd.mnt, user, group); + error = chown_common(nd.dentry, nd.mnt, user, group, NULL); path_release(nd); out: return error; @@ -668,7 +674,7 @@ asmlinkage long sys_fchown(unsigned int dentry = file-f_path.dentry; audit_inode(NULL, dentry-d_inode); - error = chown_common(dentry, file-f_path.mnt, user, group); + error = chown_common(dentry, file-f_path.mnt, user, group, file); fput(file); out: return error; --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -351,6 +351,9 @@ struct iattr { * Not an attribute, but an auxilary info for filesystems wanting to * implement an ftruncate() like method. 
NOTE: filesystem should * check for (ia_valid ATTR_FILE), and not for
[AppArmor 06/45] Pass struct vfsmount to the inode_mkdir LSM hook
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c               |    2 +-
 include/linux/security.h |    8 ++++++--
 security/dummy.c         |    2 +-
 security/selinux/hooks.c |    3 ++-
 4 files changed, 10 insertions(+), 5 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1946,7 +1946,7 @@ int vfs_mkdir(struct inode *dir, struct
 		return -EPERM;
 
 	mode &= (S_IRWXUGO|S_ISVTX);
-	error = security_inode_mkdir(dir, dentry, mode);
+	error = security_inode_mkdir(dir, dentry, mnt, mode);
 	if (error)
 		return error;
 
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -308,6 +308,7 @@ struct request_sock;
  *	associated with inode strcture @dir.
  *	@dir containst the inode structure of parent of the directory to be created.
  *	@dentry contains the dentry structure of new directory.
+ *	@mnt is the vfsmount corresponding to @dentry (may be NULL).
  *	@mode contains the mode of new directory.
  *	Return 0 if permission is granted.
  * @inode_rmdir:
@@ -1213,7 +1214,8 @@ struct security_operations {
 	int (*inode_unlink) (struct inode *dir, struct dentry *dentry);
 	int (*inode_symlink) (struct inode *dir,
 	                      struct dentry *dentry, const char *old_name);
-	int (*inode_mkdir) (struct inode *dir, struct dentry *dentry, int mode);
+	int (*inode_mkdir) (struct inode *dir, struct dentry *dentry,
+			    struct vfsmount *mnt, int mode);
 	int (*inode_rmdir) (struct inode *dir, struct dentry *dentry);
 	int (*inode_mknod) (struct inode *dir, struct dentry *dentry,
 	                    int mode, dev_t dev);
@@ -1650,11 +1652,12 @@ static inline int security_inode_symlink
 
 static inline int security_inode_mkdir (struct inode *dir,
 					struct dentry *dentry,
+					struct vfsmount *mnt,
 					int mode)
 {
 	if (unlikely (IS_PRIVATE (dir)))
 		return 0;
-	return security_ops->inode_mkdir (dir, dentry, mode);
+	return security_ops->inode_mkdir (dir, dentry, mnt, mode);
 }
 
 static inline int security_inode_rmdir (struct inode *dir,
@@ -2371,6 +2374,7 @@ static inline int security_inode_symlink
 
 static inline int security_inode_mkdir (struct inode *dir,
 					struct dentry *dentry,
+					struct vfsmount *mnt,
 					int mode)
 {
 	return 0;
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -288,7 +288,7 @@ static int dummy_inode_symlink (struct i
 }
 
 static int dummy_inode_mkdir (struct inode *inode, struct dentry *dentry,
-			      int mask)
+			      struct vfsmount *mnt, int mask)
 {
 	return 0;
 }
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2207,7 +2207,8 @@ static int selinux_inode_symlink(struct
 	return may_create(dir, dentry, SECCLASS_LNK_FILE);
 }
 
-static int selinux_inode_mkdir(struct inode *dir, struct dentry *dentry, int mask)
+static int selinux_inode_mkdir(struct inode *dir, struct dentry *dentry,
+			       struct vfsmount *mnt, int mask)
 {
 	return may_create(dir, dentry, SECCLASS_DIR);
 }
[AppArmor 20/45] Pass struct vfsmount to the inode_rename LSM hook
This is needed for computing pathnames in the AppArmor LSM. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/namei.c |6 -- include/linux/security.h | 18 +- security/dummy.c |4 +++- security/selinux/hooks.c |8 ++-- 4 files changed, 26 insertions(+), 10 deletions(-) --- a/fs/namei.c +++ b/fs/namei.c @@ -2417,7 +2417,8 @@ static int vfs_rename_dir(struct inode * return error; } - error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry); + error = security_inode_rename(old_dir, old_dentry, old_mnt, + new_dir, new_dentry, new_mnt); if (error) return error; @@ -2451,7 +2452,8 @@ static int vfs_rename_other(struct inode struct inode *target; int error; - error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry); + error = security_inode_rename(old_dir, old_dentry, old_mnt, + new_dir, new_dentry, new_mnt); if (error) return error; --- a/include/linux/security.h +++ b/include/linux/security.h @@ -336,8 +336,10 @@ struct request_sock; * Check for permission to rename a file or directory. * @old_dir contains the inode structure for parent of the old link. * @old_dentry contains the dentry structure of the old link. + * @old_mnt is the vfsmount corresponding to @old_dentry (may be NULL). * @new_dir contains the inode structure for parent of the new link. * @new_dentry contains the dentry structure of the new link. + * @new_mnt is the vfsmount corresponding to @new_dentry (may be NULL). * Return 0 if permission is granted. * @inode_readlink: * Check the permission to read the symbolic link. 
@@ -1230,7 +1232,9 @@ struct security_operations { int (*inode_mknod) (struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, int mode, dev_t dev); int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry, -struct inode *new_dir, struct dentry *new_dentry); +struct vfsmount *old_mnt, +struct inode *new_dir, struct dentry *new_dentry, +struct vfsmount *new_mnt); int (*inode_readlink) (struct dentry *dentry, struct vfsmount *mnt); int (*inode_follow_link) (struct dentry *dentry, struct nameidata *nd); int (*inode_permission) (struct inode *inode, int mask, struct nameidata *nd); @@ -1696,14 +1700,16 @@ static inline int security_inode_mknod ( static inline int security_inode_rename (struct inode *old_dir, struct dentry *old_dentry, +struct vfsmount *old_mnt, struct inode *new_dir, -struct dentry *new_dentry) +struct dentry *new_dentry, +struct vfsmount *new_mnt) { if (unlikely (IS_PRIVATE (old_dentry-d_inode) || (new_dentry-d_inode IS_PRIVATE (new_dentry-d_inode return 0; - return security_ops-inode_rename (old_dir, old_dentry, - new_dir, new_dentry); + return security_ops-inode_rename (old_dir, old_dentry, old_mnt, + new_dir, new_dentry, new_mnt); } static inline int security_inode_readlink (struct dentry *dentry, @@ -2419,8 +2425,10 @@ static inline int security_inode_mknod ( static inline int security_inode_rename (struct inode *old_dir, struct dentry *old_dentry, +struct vfsmount *old_mnt, struct inode *new_dir, -struct dentry *new_dentry) +struct dentry *new_dentry, +struct vfsmount *new_mnt) { return 0; } --- a/security/dummy.c +++ b/security/dummy.c @@ -310,8 +310,10 @@ static int dummy_inode_mknod (struct ino static int dummy_inode_rename (struct inode *old_inode, struct dentry *old_dentry, + struct vfsmount *old_mnt, struct inode *new_inode, - struct dentry *new_dentry) + struct dentry *new_dentry, + struct vfsmount *new_mnt) { return 0; } --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2238,8 +2238,12 @@ static int 
selinux_inode_mknod(struct in
[AppArmor 14/45] Add a struct vfsmount parameter to vfs_rmdir()
The vfsmount will be passed down to the LSM hook so that LSMs can compute pathnames. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/ecryptfs/inode.c |4 +++- fs/namei.c|4 ++-- fs/nfsd/nfs4recover.c |2 +- fs/nfsd/vfs.c |8 +--- fs/reiserfs/xattr.c |2 +- include/linux/fs.h|2 +- 6 files changed, 13 insertions(+), 9 deletions(-) --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -542,14 +542,16 @@ out: static int ecryptfs_rmdir(struct inode *dir, struct dentry *dentry) { struct dentry *lower_dentry; + struct vfsmount *lower_mnt; struct dentry *lower_dir_dentry; int rc; lower_dentry = ecryptfs_dentry_to_lower(dentry); + lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry); dget(dentry); lower_dir_dentry = lock_parent(lower_dentry); dget(lower_dentry); - rc = vfs_rmdir(lower_dir_dentry-d_inode, lower_dentry); + rc = vfs_rmdir(lower_dir_dentry-d_inode, lower_dentry, lower_mnt); dput(lower_dentry); if (!rc) d_delete(lower_dentry); --- a/fs/namei.c +++ b/fs/namei.c @@ -2024,7 +2024,7 @@ void dentry_unhash(struct dentry *dentry spin_unlock(dcache_lock); } -int vfs_rmdir(struct inode *dir, struct dentry *dentry) +int vfs_rmdir(struct inode *dir, struct dentry *dentry,struct vfsmount *mnt) { int error = may_delete(dir, dentry, 1); @@ -2088,7 +2088,7 @@ static long do_rmdir(int dfd, const char error = PTR_ERR(dentry); if (IS_ERR(dentry)) goto exit2; - error = vfs_rmdir(nd.dentry-d_inode, dentry); + error = vfs_rmdir(nd.dentry-d_inode, dentry, nd.mnt); dput(dentry); exit2: mutex_unlock(nd.dentry-d_inode-i_mutex); --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -276,7 +276,7 @@ nfsd4_clear_clid_dir(struct dentry *dir, * a kernel from the future */ nfsd4_list_rec_dir(dentry, nfsd4_remove_clid_file); mutex_lock_nested(dir-d_inode-i_mutex, I_MUTEX_PARENT); - status = vfs_rmdir(dir-d_inode, dentry); + status = vfs_rmdir(dir-d_inode, dentry, rec_dir.mnt); 
mutex_unlock(dir-d_inode-i_mutex); return status; } --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1666,6 +1666,7 @@ nfsd_unlink(struct svc_rqst *rqstp, stru char *fname, int flen) { struct dentry *dentry, *rdentry; + struct svc_export *exp; struct inode*dirp; __be32 err; int host_err; @@ -1680,6 +1681,7 @@ nfsd_unlink(struct svc_rqst *rqstp, stru fh_lock_nested(fhp, I_MUTEX_PARENT); dentry = fhp-fh_dentry; dirp = dentry-d_inode; + exp = fhp-fh_export; rdentry = lookup_one_len(fname, dentry, flen); host_err = PTR_ERR(rdentry); @@ -1697,21 +1699,21 @@ nfsd_unlink(struct svc_rqst *rqstp, stru if (type != S_IFDIR) { /* It's UNLINK */ #ifdef MSNFS - if ((fhp-fh_export-ex_flags NFSEXP_MSNFS) + if ((exp-ex_flags NFSEXP_MSNFS) (atomic_read(rdentry-d_count) 1)) { host_err = -EPERM; } else #endif host_err = vfs_unlink(dirp, rdentry); } else { /* It's RMDIR */ - host_err = vfs_rmdir(dirp, rdentry); + host_err = vfs_rmdir(dirp, rdentry, exp-ex_mnt); } dput(rdentry); if (host_err) goto out_nfserr; - if (EX_ISSYNC(fhp-fh_export)) + if (EX_ISSYNC(exp)) host_err = nfsd_sync_dir(dentry); out_nfserr: --- a/fs/reiserfs/xattr.c +++ b/fs/reiserfs/xattr.c @@ -775,7 +775,7 @@ int reiserfs_delete_xattrs(struct inode if (dir-d_inode-i_nlink = 2) { root = get_xa_root(inode-i_sb, XATTR_REPLACE); reiserfs_write_lock_xattrs(inode-i_sb); - err = vfs_rmdir(root-d_inode, dir); + err = vfs_rmdir(root-d_inode, dir, NULL); reiserfs_write_unlock_xattrs(inode-i_sb); dput(root); } else { --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -996,7 +996,7 @@ extern int vfs_mkdir(struct inode *, str extern int vfs_mknod(struct inode *, struct dentry *, struct vfsmount *, int, dev_t); extern int vfs_symlink(struct inode *, struct dentry *, struct vfsmount *, const char *, int); extern int vfs_link(struct dentry *, struct vfsmount *, struct inode *, struct dentry *, struct vfsmount *); -extern int vfs_rmdir(struct inode *, struct dentry *); +extern int vfs_rmdir(struct inode *, struct dentry *, struct 
vfsmount *); extern int vfs_unlink(struct inode *, struct dentry *); extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *); -- - To unsubscribe from this list: send
[AppArmor 22/45] Pass struct vfsmount to the inode_setxattr LSM hook
This is needed for computing pathnames in the AppArmor LSM. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/xattr.c |4 ++-- include/linux/security.h | 40 +--- security/commoncap.c |4 ++-- security/dummy.c |9 ++--- security/selinux/hooks.c |8 ++-- 5 files changed, 41 insertions(+), 24 deletions(-) --- a/fs/xattr.c +++ b/fs/xattr.c @@ -80,7 +80,7 @@ vfs_setxattr(struct dentry *dentry, stru return error; mutex_lock(inode-i_mutex); - error = security_inode_setxattr(dentry, name, value, size, flags); + error = security_inode_setxattr(dentry, mnt, name, value, size, flags); if (error) goto out; error = -EOPNOTSUPP; @@ -88,7 +88,7 @@ vfs_setxattr(struct dentry *dentry, stru error = inode-i_op-setxattr(dentry, name, value, size, flags); if (!error) { fsnotify_xattr(dentry); - security_inode_post_setxattr(dentry, name, value, + security_inode_post_setxattr(dentry, mnt, name, value, size, flags); } } else if (!strncmp(name, XATTR_SECURITY_PREFIX, --- a/include/linux/security.h +++ b/include/linux/security.h @@ -49,7 +49,7 @@ extern void cap_capset_set (struct task_ extern int cap_bprm_set_security (struct linux_binprm *bprm); extern void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe); extern int cap_bprm_secureexec(struct linux_binprm *bprm); -extern int cap_inode_setxattr(struct dentry *dentry, char *name, void *value, size_t size, int flags); +extern int cap_inode_setxattr(struct dentry *dentry, struct vfsmount *mnt, char *name, void *value, size_t size, int flags); extern int cap_inode_removexattr(struct dentry *dentry, char *name); extern int cap_task_post_setuid (uid_t old_ruid, uid_t old_euid, uid_t old_suid, int flags); extern void cap_task_reparent_to_init (struct task_struct *p); @@ -384,11 +384,11 @@ struct request_sock; * inode. * @inode_setxattr: * Check permission before setting the extended attributes - * @value identified by @name for @dentry. 
+ * @value identified by @name for @dentry and @mnt. * Return 0 if permission is granted. * @inode_post_setxattr: * Update inode security field after successful setxattr operation. - * @value identified by @name for @dentry. + * @value identified by @name for @dentry and @mnt. * @inode_getxattr: * Check permission before obtaining the extended attributes * identified by @name for @dentry. @@ -1242,9 +1242,11 @@ struct security_operations { struct iattr *attr); int (*inode_getattr) (struct vfsmount *mnt, struct dentry *dentry); void (*inode_delete) (struct inode *inode); - int (*inode_setxattr) (struct dentry *dentry, char *name, void *value, - size_t size, int flags); - void (*inode_post_setxattr) (struct dentry *dentry, char *name, void *value, + int (*inode_setxattr) (struct dentry *dentry, struct vfsmount *mnt, + char *name, void *value, size_t size, int flags); + void (*inode_post_setxattr) (struct dentry *dentry, +struct vfsmount *mnt, +char *name, void *value, size_t size, int flags); int (*inode_getxattr) (struct dentry *dentry, char *name); int (*inode_listxattr) (struct dentry *dentry); @@ -1760,20 +1762,24 @@ static inline void security_inode_delete security_ops-inode_delete (inode); } -static inline int security_inode_setxattr (struct dentry *dentry, char *name, +static inline int security_inode_setxattr (struct dentry *dentry, + struct vfsmount *mnt, char *name, void *value, size_t size, int flags) { if (unlikely (IS_PRIVATE (dentry-d_inode))) return 0; - return security_ops-inode_setxattr (dentry, name, value, size, flags); + return security_ops-inode_setxattr (dentry, mnt, name, value, size, +flags); } -static inline void security_inode_post_setxattr (struct dentry *dentry, char *name, - void *value, size_t size, int flags) +static inline void security_inode_post_setxattr (struct dentry *dentry, +struct vfsmount *mnt, +char *name, void *value, +size_t size, int flags) { if (unlikely (IS_PRIVATE
[AppArmor 19/45] Add struct vfsmount parameters to vfs_rename()
The vfsmount will be passed down to the LSM hook so that LSMs can compute pathnames. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/ecryptfs/inode.c |7 ++- fs/namei.c | 19 --- fs/nfsd/vfs.c |3 ++- include/linux/fs.h |2 +- 4 files changed, 21 insertions(+), 10 deletions(-) --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -598,19 +598,24 @@ ecryptfs_rename(struct inode *old_dir, s { int rc; struct dentry *lower_old_dentry; + struct vfsmount *lower_old_mnt; struct dentry *lower_new_dentry; + struct vfsmount *lower_new_mnt; struct dentry *lower_old_dir_dentry; struct dentry *lower_new_dir_dentry; lower_old_dentry = ecryptfs_dentry_to_lower(old_dentry); + lower_old_mnt = ecryptfs_dentry_to_lower_mnt(old_dentry); lower_new_dentry = ecryptfs_dentry_to_lower(new_dentry); + lower_new_mnt = ecryptfs_dentry_to_lower_mnt(new_dentry); dget(lower_old_dentry); dget(lower_new_dentry); lower_old_dir_dentry = dget_parent(lower_old_dentry); lower_new_dir_dentry = dget_parent(lower_new_dentry); lock_rename(lower_old_dir_dentry, lower_new_dir_dentry); rc = vfs_rename(lower_old_dir_dentry-d_inode, lower_old_dentry, - lower_new_dir_dentry-d_inode, lower_new_dentry); + lower_old_mnt, lower_new_dir_dentry-d_inode, + lower_new_dentry, lower_new_mnt); if (rc) goto out_lock; fsstack_copy_attr_all(new_dir, lower_new_dir_dentry-d_inode, NULL); --- a/fs/namei.c +++ b/fs/namei.c @@ -2401,7 +2401,8 @@ asmlinkage long sys_link(const char __us *locking]. 
*/ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry, - struct inode *new_dir, struct dentry *new_dentry) + struct vfsmount *old_mnt, struct inode *new_dir, + struct dentry *new_dentry, struct vfsmount *new_mnt) { int error = 0; struct inode *target; @@ -2444,7 +2445,8 @@ static int vfs_rename_dir(struct inode * } static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry, - struct inode *new_dir, struct dentry *new_dentry) + struct vfsmount *old_mnt, struct inode *new_dir, + struct dentry *new_dentry, struct vfsmount *new_mnt) { struct inode *target; int error; @@ -2472,7 +2474,8 @@ static int vfs_rename_other(struct inode } int vfs_rename(struct inode *old_dir, struct dentry *old_dentry, - struct inode *new_dir, struct dentry *new_dentry) + struct vfsmount *old_mnt, struct inode *new_dir, + struct dentry *new_dentry, struct vfsmount *new_mnt) { int error; int is_dir = S_ISDIR(old_dentry-d_inode-i_mode); @@ -2501,9 +2504,11 @@ int vfs_rename(struct inode *old_dir, st old_name = fsnotify_oldname_init(old_dentry-d_name.name); if (is_dir) - error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry); + error = vfs_rename_dir(old_dir, old_dentry, old_mnt, + new_dir, new_dentry, new_mnt); else - error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry); + error = vfs_rename_other(old_dir, old_dentry, old_mnt, +new_dir, new_dentry, new_mnt); if (!error) { const char *new_name = old_dentry-d_name.name; fsnotify_move(old_dir, new_dir, old_name, new_name, is_dir, @@ -2575,8 +2580,8 @@ static int do_rename(int olddfd, const c if (new_dentry == trap) goto exit5; - error = vfs_rename(old_dir-d_inode, old_dentry, - new_dir-d_inode, new_dentry); + error = vfs_rename(old_dir-d_inode, old_dentry, oldnd.mnt, + new_dir-d_inode, new_dentry, newnd.mnt); exit5: dput(new_dentry); exit4: --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1630,7 +1630,8 @@ nfsd_rename(struct svc_rqst *rqstp, stru host_err = -EPERM; } else #endif - host_err = 
vfs_rename(fdir, odentry, tdir, ndentry); + host_err = vfs_rename(fdir, odentry, ffhp-fh_export-ex_mnt, + tdir, ndentry, tfhp-fh_export-ex_mnt); if (!host_err EX_ISSYNC(tfhp-fh_export)) { host_err = nfsd_sync_dir(tdentry); if (!host_err) --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -998,7 +998,7 @@ extern int vfs_symlink(struct inode *, s extern int vfs_link(struct dentry *, struct vfsmount *, struct inode *, struct dentry *, struct vfsmount *); extern int
[RFD Patch 0/4] AppArmor - Don't pass NULL nameidata to vfs_create/lookup/permission IOPs
lkml-explanatory.txt
[AppArmor 16/45] Call lsm hook before unhashing dentry in vfs_rmdir()
If we unhash the dentry before calling the security_inode_rmdir hook,
we cannot compute the file's pathname in the hook anymore. AppArmor
needs to know the filename in order to decide whether a file may be
deleted, though.

Signed-off-by: John Johansen [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 fs/namei.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2034,6 +2034,10 @@ int vfs_rmdir(struct inode *dir, struct
 	if (!dir->i_op || !dir->i_op->rmdir)
 		return -EPERM;
 
+	error = security_inode_rmdir(dir, dentry, mnt);
+	if (error)
+		return error;
+
 	DQUOT_INIT(dir);
 
 	mutex_lock(&dentry->d_inode->i_mutex);
@@ -2041,12 +2045,9 @@ int vfs_rmdir(struct inode *dir, struct
 	if (d_mountpoint(dentry))
 		error = -EBUSY;
 	else {
-		error = security_inode_rmdir(dir, dentry, mnt);
-		if (!error) {
-			error = dir->i_op->rmdir(dir, dentry);
-			if (!error)
-				dentry->d_inode->i_flags |= S_DEAD;
-		}
+		error = dir->i_op->rmdir(dir, dentry);
+		if (!error)
+			dentry->d_inode->i_flags |= S_DEAD;
 	}
 	mutex_unlock(&dentry->d_inode->i_mutex);
 	if (!error) {
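The ordering issue described in this patch can be illustrated with a small userspace sketch. Everything here is hypothetical: `toy_dentry`, `toy_rmdir_hook` and the two `toy_rmdir_*` helpers are stand-ins invented for illustration, not kernel code, and the toy unhashes unconditionally, which the real dentry_unhash() does not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a dentry: only what the illustration needs. */
struct toy_dentry {
	const char *name;
	bool hashed;	/* false once the entry has been "unhashed" */
};

/*
 * Hypothetical path-based hook. It can only make a decision while the
 * name is still reachable, mirroring an LSM that needs the pathname.
 * Returns 0 to allow, -1 to deny, -2 if the name cannot be computed.
 */
static int toy_rmdir_hook(const struct toy_dentry *d, const char *denied)
{
	if (!d->hashed)
		return -2;
	return strcmp(d->name, denied) == 0 ? -1 : 0;
}

/* Ordering after the patch: call the hook first, unhash afterwards. */
static int toy_rmdir_hook_first(struct toy_dentry *d, const char *denied)
{
	int error = toy_rmdir_hook(d, denied);

	if (error)
		return error;
	d->hashed = false;
	return 0;
}

/* Ordering before the patch: unhash first, then call the hook. */
static int toy_rmdir_unhash_first(struct toy_dentry *d, const char *denied)
{
	d->hashed = false;
	return toy_rmdir_hook(d, denied);
}
```

With the hook-first ordering the toy policy sees the name and can deny by path; with the unhash-first ordering the hook has nothing left to match against.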
[AppArmor 30/45] Make d_path() consistent across mount operations
The path that __d_path() computes can become slightly inconsistent when
it races with mount operations: it grabs the vfsmount_lock when
traversing mount points but immediately drops it again, only to re-grab
it when it reaches the next mount point. The result is that the
filename computed is not always consistent, and the file may never have
had that name. (This is unlikely, but still possible.)

Fix this by grabbing the vfsmount_lock when the first mount point is
reached, and holding onto it until the d_cache lookup is completed.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 fs/dcache.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1783,7 +1783,7 @@ static char *__d_path(struct dentry *den
 		      struct dentry *root, struct vfsmount *rootmnt,
 		      char *buffer, int buflen, int fail_deleted)
 {
-	int namelen, is_slash;
+	int namelen, is_slash, vfsmount_locked = 0;
 
 	if (buflen < 2)
 		return ERR_PTR(-ENAMETOOLONG);
@@ -1806,14 +1806,14 @@ static char *__d_path(struct dentry *den
 		struct dentry * parent;
 
 		if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
-			spin_lock(&vfsmount_lock);
-			if (vfsmnt->mnt_parent == vfsmnt) {
-				spin_unlock(&vfsmount_lock);
-				goto global_root;
+			if (!vfsmount_locked) {
+				spin_lock(&vfsmount_lock);
+				vfsmount_locked = 1;
 			}
+			if (vfsmnt->mnt_parent == vfsmnt)
+				goto global_root;
 			dentry = vfsmnt->mnt_mountpoint;
 			vfsmnt = vfsmnt->mnt_parent;
-			spin_unlock(&vfsmount_lock);
 			continue;
 		}
 		parent = dentry->d_parent;
@@ -1832,6 +1832,8 @@ static char *__d_path(struct dentry *den
 		*--buffer = '/';
 
 out:
+	if (vfsmount_locked)
+		spin_unlock(&vfsmount_lock);
 	spin_unlock(&dcache_lock);
 	return buffer;
 
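The locking change in this patch follows a "take the lock at the first boundary, release it once on the way out" pattern. A minimal userspace sketch of that pattern is given below, assuming a toy lock that merely records its state; `toy_lock`, `toy_step` and `toy_walk` are hypothetical names and none of this is the kernel's spinlock API.

```c
#include <stddef.h>

/* Toy lock that only records its state; not a real spinlock. */
struct toy_lock {
	int held;
	int acquisitions;
};

/* One step of an upward path walk; some steps cross a mount point. */
struct toy_step {
	int is_mount_boundary;
};

/*
 * Walk all steps. The lock is taken when the first mount boundary is
 * reached and then kept until the walk is finished, so that every
 * later boundary is seen under the same lock hold.
 * Returns 1 if the lock was needed, 0 otherwise.
 */
static int toy_walk(struct toy_lock *lock, const struct toy_step *steps,
		    size_t n)
{
	int locked = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (steps[i].is_mount_boundary && !locked) {
			lock->held = 1;
			lock->acquisitions++;
			locked = 1;
		}
		/* A real walk would prepend one path component here. */
	}

	/* Single exit point: drop the lock only if it was ever taken. */
	if (locked)
		lock->held = 0;
	return locked;
}
```

A walk that crosses three mount boundaries acquires the toy lock once rather than three times, and a walk that crosses none never touches it.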
[AppArmor 05/45] Add struct vfsmount parameter to vfs_mkdir()
The vfsmount will be passed down to the LSM hook so that LSMs can compute pathnames. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/ecryptfs/inode.c |5 - fs/namei.c|5 +++-- fs/nfsd/nfs4recover.c |3 ++- fs/nfsd/vfs.c |8 +--- include/linux/fs.h|2 +- 5 files changed, 15 insertions(+), 8 deletions(-) --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -509,11 +509,14 @@ static int ecryptfs_mkdir(struct inode * { int rc; struct dentry *lower_dentry; + struct vfsmount *lower_mnt; struct dentry *lower_dir_dentry; lower_dentry = ecryptfs_dentry_to_lower(dentry); + lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry); lower_dir_dentry = lock_parent(lower_dentry); - rc = vfs_mkdir(lower_dir_dentry-d_inode, lower_dentry, mode); + rc = vfs_mkdir(lower_dir_dentry-d_inode, lower_dentry, lower_mnt, + mode); if (rc || !lower_dentry-d_inode) goto out; rc = ecryptfs_interpose(lower_dentry, dentry, dir-i_sb, 0); --- a/fs/namei.c +++ b/fs/namei.c @@ -1934,7 +1934,8 @@ asmlinkage long sys_mknod(const char __u return sys_mknodat(AT_FDCWD, filename, mode, dev); } -int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode) +int vfs_mkdir(struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, + int mode) { int error = may_create(dir, dentry, NULL); @@ -1978,7 +1979,7 @@ asmlinkage long sys_mkdirat(int dfd, con if (!IS_POSIXACL(nd.dentry-d_inode)) mode = ~current-fs-umask; - error = vfs_mkdir(nd.dentry-d_inode, dentry, mode); + error = vfs_mkdir(nd.dentry-d_inode, dentry, nd.mnt, mode); dput(dentry); out_unlock: mutex_unlock(nd.dentry-d_inode-i_mutex); --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -156,7 +156,8 @@ nfsd4_create_clid_dir(struct nfs4_client dprintk(NFSD: nfsd4_create_clid_dir: DIRECTORY EXISTS\n); goto out_put; } - status = vfs_mkdir(rec_dir.dentry-d_inode, dentry, S_IRWXU); + status = vfs_mkdir(rec_dir.dentry-d_inode, dentry, rec_dir.mnt, + 
S_IRWXU); out_put: dput(dentry); out_unlock: --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1115,6 +1115,7 @@ nfsd_create(struct svc_rqst *rqstp, stru int type, dev_t rdev, struct svc_fh *resfhp) { struct dentry *dentry, *dchild = NULL; + struct svc_export *exp; struct inode*dirp; __be32 err; int host_err; @@ -1131,6 +1132,7 @@ nfsd_create(struct svc_rqst *rqstp, stru goto out; dentry = fhp-fh_dentry; + exp = fhp-fh_export; dirp = dentry-d_inode; err = nfserr_notdir; @@ -1147,7 +1149,7 @@ nfsd_create(struct svc_rqst *rqstp, stru host_err = PTR_ERR(dchild); if (IS_ERR(dchild)) goto out_nfserr; - err = fh_compose(resfhp, fhp-fh_export, dchild, fhp); + err = fh_compose(resfhp, exp, dchild, fhp); if (err) goto out; } else { @@ -1186,7 +1188,7 @@ nfsd_create(struct svc_rqst *rqstp, stru host_err = vfs_create(dirp, dchild, iap-ia_mode, NULL); break; case S_IFDIR: - host_err = vfs_mkdir(dirp, dchild, iap-ia_mode); + host_err = vfs_mkdir(dirp, dchild, exp-ex_mnt, iap-ia_mode); break; case S_IFCHR: case S_IFBLK: @@ -1201,7 +1203,7 @@ nfsd_create(struct svc_rqst *rqstp, stru if (host_err 0) goto out_nfserr; - if (EX_ISSYNC(fhp-fh_export)) { + if (EX_ISSYNC(exp)) { err = nfserrno(nfsd_sync_dir(dentry)); write_inode_now(dchild-d_inode, 1); } --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -992,7 +992,7 @@ extern void unlock_super(struct super_bl */ extern int vfs_permission(struct nameidata *, int); extern int vfs_create(struct inode *, struct dentry *, int, struct nameidata *); -extern int vfs_mkdir(struct inode *, struct dentry *, int); +extern int vfs_mkdir(struct inode *, struct dentry *, struct vfsmount *, int); extern int vfs_mknod(struct inode *, struct dentry *, int, dev_t); extern int vfs_symlink(struct inode *, struct dentry *, const char *, int); extern int vfs_link(struct dentry *, struct inode *, struct dentry *); -- - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at 
http://vger.kernel.org/majordomo-info.html
[AppArmor 15/45] Pass struct vfsmount to the inode_rmdir LSM hook
This is needed for computing pathnames in the AppArmor LSM. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/namei.c |2 +- include/linux/security.h | 12 security/dummy.c |3 ++- security/selinux/hooks.c |3 ++- 4 files changed, 13 insertions(+), 7 deletions(-) --- a/fs/namei.c +++ b/fs/namei.c @@ -2041,7 +2041,7 @@ int vfs_rmdir(struct inode *dir, struct if (d_mountpoint(dentry)) error = -EBUSY; else { - error = security_inode_rmdir(dir, dentry); + error = security_inode_rmdir(dir, dentry, mnt); if (!error) { error = dir-i_op-rmdir(dir, dentry); if (!error) --- a/include/linux/security.h +++ b/include/linux/security.h @@ -318,6 +318,7 @@ struct request_sock; * Check the permission to remove a directory. * @dir contains the inode structure of parent of the directory to be removed. * @dentry contains the dentry structure of directory to be removed. + * @mnt is the vfsmount corresponding to @dentry (may be NULL). * Return 0 if permission is granted. 
* @inode_mknod: * Check permissions when creating a special file (or a socket or a fifo @@ -1222,7 +1223,8 @@ struct security_operations { struct vfsmount *mnt, const char *old_name); int (*inode_mkdir) (struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, int mode); - int (*inode_rmdir) (struct inode *dir, struct dentry *dentry); + int (*inode_rmdir) (struct inode *dir, struct dentry *dentry, + struct vfsmount *mnt); int (*inode_mknod) (struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, int mode, dev_t dev); int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry, @@ -1671,11 +1673,12 @@ static inline int security_inode_mkdir ( } static inline int security_inode_rmdir (struct inode *dir, - struct dentry *dentry) + struct dentry *dentry, + struct vfsmount *mnt) { if (unlikely (IS_PRIVATE (dentry-d_inode))) return 0; - return security_ops-inode_rmdir (dir, dentry); + return security_ops-inode_rmdir (dir, dentry, mnt); } static inline int security_inode_mknod (struct inode *dir, @@ -2396,7 +2399,8 @@ static inline int security_inode_mkdir ( } static inline int security_inode_rmdir (struct inode *dir, - struct dentry *dentry) + struct dentry *dentry, + struct vfsmount *mnt) { return 0; } --- a/security/dummy.c +++ b/security/dummy.c @@ -295,7 +295,8 @@ static int dummy_inode_mkdir (struct ino return 0; } -static int dummy_inode_rmdir (struct inode *inode, struct dentry *dentry) +static int dummy_inode_rmdir (struct inode *inode, struct dentry *dentry, + struct vfsmount *mnt) { return 0; } --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2219,7 +2219,8 @@ static int selinux_inode_mkdir(struct in return may_create(dir, dentry, SECCLASS_DIR); } -static int selinux_inode_rmdir(struct inode *dir, struct dentry *dentry) +static int selinux_inode_rmdir(struct inode *dir, struct dentry *dentry, + struct vfsmount *mnt) { return may_link(dir, dentry, MAY_RMDIR); } -- - To unsubscribe from this list: send the line 
unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[AppArmor 01/45] Pass struct vfsmount to the inode_create LSM hook
This is needed for computing pathnames in the AppArmor LSM. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/namei.c |2 +- include/linux/security.h |9 ++--- security/dummy.c |2 +- security/selinux/hooks.c |3 ++- 4 files changed, 10 insertions(+), 6 deletions(-) --- a/fs/namei.c +++ b/fs/namei.c @@ -1521,7 +1521,7 @@ int vfs_create(struct inode *dir, struct return -EACCES; /* shouldn't it be ENOSYS? */ mode = S_IALLUGO; mode |= S_IFREG; - error = security_inode_create(dir, dentry, mode); + error = security_inode_create(dir, dentry, nd ? nd-mnt : NULL, mode); if (error) return error; DQUOT_INIT(dir); --- a/include/linux/security.h +++ b/include/linux/security.h @@ -283,6 +283,7 @@ struct request_sock; * Check permission to create a regular file. * @dir contains inode structure of the parent of the new file. * @dentry contains the dentry structure for the file to be created. + * @mnt is the vfsmount corresponding to @dentry (may be NULL). * @mode contains the file mode of the file to be created. * Return 0 if permission is granted. 
* @inode_link: @@ -1204,8 +1205,8 @@ struct security_operations { void (*inode_free_security) (struct inode *inode); int (*inode_init_security) (struct inode *inode, struct inode *dir, char **name, void **value, size_t *len); - int (*inode_create) (struct inode *dir, -struct dentry *dentry, int mode); + int (*inode_create) (struct inode *dir, struct dentry *dentry, +struct vfsmount *mnt, int mode); int (*inode_link) (struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry); int (*inode_unlink) (struct inode *dir, struct dentry *dentry); @@ -1611,11 +1612,12 @@ static inline int security_inode_init_se static inline int security_inode_create (struct inode *dir, struct dentry *dentry, +struct vfsmount *mnt, int mode) { if (unlikely (IS_PRIVATE (dir))) return 0; - return security_ops-inode_create (dir, dentry, mode); + return security_ops-inode_create (dir, dentry, mnt, mode); } static inline int security_inode_link (struct dentry *old_dentry, @@ -2338,6 +2340,7 @@ static inline int security_inode_init_se static inline int security_inode_create (struct inode *dir, struct dentry *dentry, +struct vfsmount *mnt, int mode) { return 0; --- a/security/dummy.c +++ b/security/dummy.c @@ -265,7 +265,7 @@ static int dummy_inode_init_security (st } static int dummy_inode_create (struct inode *inode, struct dentry *dentry, - int mask) + struct vfsmount *mnt, int mask) { return 0; } --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2176,7 +2176,8 @@ static int selinux_inode_init_security(s return 0; } -static int selinux_inode_create(struct inode *dir, struct dentry *dentry, int mask) +static int selinux_inode_create(struct inode *dir, struct dentry *dentry, +struct vfsmount *mnt, int mask) { return may_create(dir, dentry, SECCLASS_FILE); } -- - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
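Because vfs_create() can be called with a NULL nameidata, the hook documented above may receive a NULL @mnt. The following userspace sketch shows one way a hook could cope with that. All types and functions are hypothetical stand-ins, and the choice to allow the operation when no mount is available is an assumption made for this illustration, not a description of what AppArmor or SELinux do.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for struct vfsmount and struct nameidata. */
struct toy_vfsmount {
	const char *mountpoint;
};

struct toy_nameidata {
	struct toy_vfsmount *mnt;
};

/*
 * Hypothetical create hook: returns 0 to grant permission.
 * Assumption: with no mount context the toy policy abstains and
 * allows; otherwise it denies creation below the "/ro" mount.
 */
static int toy_inode_create_hook(const struct toy_vfsmount *mnt)
{
	if (!mnt)
		return 0;
	return strcmp(mnt->mountpoint, "/ro") == 0 ? -1 : 0;
}

/* Mirrors the caller in the patch: pass nd->mnt, or NULL without nd. */
static int toy_vfs_create(const struct toy_nameidata *nd)
{
	return toy_inode_create_hook(nd ? nd->mnt : NULL);
}
```

The NULL branch is where a path-based policy loses information, which is why callers that can supply a mount should do so.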
[AppArmor 33/45] Pass struct file down the inode_*xattr security LSM hooks
This allows LSMs to also distinguish between file descriptor and path access for the xattr operations. (The other relevant operations are covered by the setattr hook.) Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/xattr.c | 58 --- include/linux/security.h | 53 +- include/linux/xattr.h|8 +++--- security/commoncap.c |4 +-- security/dummy.c | 10 security/selinux/hooks.c | 10 6 files changed, 80 insertions(+), 63 deletions(-) --- a/fs/xattr.c +++ b/fs/xattr.c @@ -70,7 +70,7 @@ xattr_permission(struct inode *inode, co int vfs_setxattr(struct dentry *dentry, struct vfsmount *mnt, char *name, -void *value, size_t size, int flags) +void *value, size_t size, int flags, struct file *file) { struct inode *inode = dentry-d_inode; int error; @@ -80,7 +80,7 @@ vfs_setxattr(struct dentry *dentry, stru return error; mutex_lock(inode-i_mutex); - error = security_inode_setxattr(dentry, mnt, name, value, size, flags); + error = security_inode_setxattr(dentry, mnt, name, value, size, flags, file); if (error) goto out; error = -EOPNOTSUPP; @@ -107,7 +107,7 @@ EXPORT_SYMBOL_GPL(vfs_setxattr); ssize_t vfs_getxattr(struct dentry *dentry, struct vfsmount *mnt, char *name, -void *value, size_t size) +void *value, size_t size, struct file *file) { struct inode *inode = dentry-d_inode; int error; @@ -116,7 +116,7 @@ vfs_getxattr(struct dentry *dentry, stru if (error) return error; - error = security_inode_getxattr(dentry, mnt, name); + error = security_inode_getxattr(dentry, mnt, name, file); if (error) return error; @@ -144,12 +144,12 @@ EXPORT_SYMBOL_GPL(vfs_getxattr); ssize_t vfs_listxattr(struct dentry *dentry, struct vfsmount *mnt, char *list, - size_t size) + size_t size, struct file *file) { struct inode *inode = dentry-d_inode; ssize_t error; - error = security_inode_listxattr(dentry, mnt); + error = security_inode_listxattr(dentry, mnt, file); if (error) return error; error = -EOPNOTSUPP; @@ -165,7 +165,8 @@ 
vfs_listxattr(struct dentry *dentry, str EXPORT_SYMBOL_GPL(vfs_listxattr); int -vfs_removexattr(struct dentry *dentry, struct vfsmount *mnt, char *name) +vfs_removexattr(struct dentry *dentry, struct vfsmount *mnt, char *name, + struct file *file) { struct inode *inode = dentry-d_inode; int error; @@ -177,7 +178,7 @@ vfs_removexattr(struct dentry *dentry, s if (error) return error; - error = security_inode_removexattr(dentry, mnt, name); + error = security_inode_removexattr(dentry, mnt, name, file); if (error) return error; @@ -197,7 +198,7 @@ EXPORT_SYMBOL_GPL(vfs_removexattr); */ static long setxattr(struct dentry *dentry, struct vfsmount *mnt, char __user *name, -void __user *value, size_t size, int flags) +void __user *value, size_t size, int flags, struct file *file) { int error; void *kvalue = NULL; @@ -224,7 +225,7 @@ setxattr(struct dentry *dentry, struct v } } - error = vfs_setxattr(dentry, mnt, kname, kvalue, size, flags); + error = vfs_setxattr(dentry, mnt, kname, kvalue, size, flags, file); kfree(kvalue); return error; } @@ -239,7 +240,7 @@ sys_setxattr(char __user *path, char __u error = user_path_walk(path, nd); if (error) return error; - error = setxattr(nd.dentry, nd.mnt, name, value, size, flags); + error = setxattr(nd.dentry, nd.mnt, name, value, size, flags, NULL); path_release(nd); return error; } @@ -254,7 +255,7 @@ sys_lsetxattr(char __user *path, char __ error = user_path_walk_link(path, nd); if (error) return error; - error = setxattr(nd.dentry, nd.mnt, name, value, size, flags); + error = setxattr(nd.dentry, nd.mnt, name, value, size, flags, NULL); path_release(nd); return error; } @@ -272,7 +273,7 @@ sys_fsetxattr(int fd, char __user *name, return error; dentry = f-f_path.dentry; audit_inode(NULL, dentry-d_inode); - error = setxattr(dentry, f-f_vfsmnt, name, value, size, flags); + error = setxattr(dentry, f-f_vfsmnt, name, value, size, flags, f); fput(f); return error; } @@ -282,7 +283,7 @@ sys_fsetxattr(int fd, char __user *name, */ 
static ssize_t getxattr(struct dentry *dentry, struct vfsmount *mnt, char __user *name, -void __user *value, size_t
[AppArmor 00/45] AppArmor security module overview
lkml-explanatory.txt
[AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
Pathname matching, transition table loading, profile loading and manipulation. Signed-off-by: John Johansen [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] --- security/apparmor/match.c| 232 security/apparmor/match.h| 83 security/apparmor/module_interface.c | 643 +++ 3 files changed, 958 insertions(+) --- /dev/null +++ b/security/apparmor/match.c @@ -0,0 +1,232 @@ +/* + * Copyright (C) 2007 Novell/SUSE + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation, version 2 of the + * License. + * + * Regular expression transition table matching + */ + +#include linux/kernel.h +#include linux/slab.h +#include linux/errno.h +#include match.h + +static struct table_header *unpack_table(void *blob, size_t bsize) +{ + struct table_header *table = NULL; + struct table_header th; + size_t tsize; + + if (bsize sizeof(struct table_header)) + goto out; + + th.td_id = be16_to_cpu(*(u16 *) (blob)); + th.td_flags = be16_to_cpu(*(u16 *) (blob + 2)); + th.td_lolen = be32_to_cpu(*(u32 *) (blob + 8)); + blob += sizeof(struct table_header); + + if (!(th.td_flags == YYTD_DATA16 || th.td_flags == YYTD_DATA32 || + th.td_flags == YYTD_DATA8)) + goto out; + + tsize = table_size(th.td_lolen, th.td_flags); + if (bsize tsize) + goto out; + + table = kmalloc(tsize, GFP_KERNEL); + if (table) { + *table = th; + if (th.td_flags == YYTD_DATA8) + UNPACK_ARRAY(table-td_data, blob, th.td_lolen, +u8, byte_to_byte); + else if (th.td_flags == YYTD_DATA16) + UNPACK_ARRAY(table-td_data, blob, th.td_lolen, +u16, be16_to_cpu); + else + UNPACK_ARRAY(table-td_data, blob, th.td_lolen, +u32, be32_to_cpu); + } + +out: + return table; +} + +int unpack_dfa(struct aa_dfa *dfa, void *blob, size_t size) +{ + int hsize, i; + int error = -ENOMEM; + + /* get dfa table set header */ + if (size sizeof(struct table_set_header)) + goto fail; + + if (ntohl(*(u32 *)blob) != 
YYTH_MAGIC) + goto fail; + + hsize = ntohl(*(u32 *)(blob + 4)); + if (size hsize) + goto fail; + + blob += hsize; + size -= hsize; + + error = -EPROTO; + while (size 0) { + struct table_header *table; + table = unpack_table(blob, size); + if (!table) + goto fail; + + switch(table-td_id) { + case YYTD_ID_ACCEPT: + case YYTD_ID_BASE: + dfa-tables[table-td_id - 1] = table; + if (table-td_flags != YYTD_DATA32) + goto fail; + break; + case YYTD_ID_DEF: + case YYTD_ID_NXT: + case YYTD_ID_CHK: + dfa-tables[table-td_id - 1] = table; + if (table-td_flags != YYTD_DATA16) + goto fail; + break; + case YYTD_ID_EC: + dfa-tables[table-td_id - 1] = table; + if (table-td_flags != YYTD_DATA8) + goto fail; + break; + default: + kfree(table); + goto fail; + } + + blob += table_size(table-td_lolen, table-td_flags); + size -= table_size(table-td_lolen, table-td_flags); + } + + return 0; + +fail: + for (i = 0; i ARRAY_SIZE(dfa-tables); i++) { + if (dfa-tables[i]) { + kfree(dfa-tables[i]); + dfa-tables[i] = NULL; + } + } + return error; +} + +/** + * verify_dfa - verify that all the transitions and states in the dfa tables + * are in bounds. + * @dfa: dfa to test + * + * assumes dfa has gone through the verification done by unpacking + */ +int verify_dfa(struct aa_dfa *dfa) +{ + size_t i, state_count, trans_count; + int error = -EPROTO; + + /* check that required tables exist */ + if (!(dfa-tables[YYTD_ID_ACCEPT -1 ] + dfa-tables[YYTD_ID_DEF - 1] + dfa-tables[YYTD_ID_BASE - 1] + dfa-tables[YYTD_ID_NXT - 1] + dfa-tables[YYTD_ID_CHK - 1])) + goto out; + + /* accept.size == default.size == base.size */ + state_count =
[AppArmor 29/45] Fix __d_path() for lazy unmounts and make it unambiguous
First, when __d_path() hits a lazily unmounted mount point, it tries to
prepend the name of the lazily unmounted dentry to the path name. It gets
this wrong, and also overwrites the slash that separates the name from the
following pathname component. This patch fixes that; if a process was in
directory /foo/bar and /foo got lazily unmounted, the old result was
``foobar'' (note the missing slash), while the new result with this patch
is ``foo/bar''.

Second, it isn't always possible to tell from the __d_path() result whether
the specified root and rootmnt (i.e., the chroot) was reached. We need an
unambiguous result for AppArmor at least though, so we make sure that paths
will only start with a slash if the path leads all the way up to the root.

We also add a @fail_deleted argument, which allows us to get rid of some of
the mess in sys_getcwd().

This patch leaves getcwd() and d_path() as they were before for everything
except for bind-mounted directories; for them, it reports ``/foo/bar''
instead of ``foobar'' in the example described above.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Acked-by: Alan Cox [EMAIL PROTECTED]

---
 fs/dcache.c |  169 ++--
 1 file changed, 98 insertions(+), 71 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1761,52 +1761,51 @@ shouldnt_be_hashed:
 }
 
 /**
- * d_path - return the path of a dentry
+ * __d_path - return the path of a dentry
  * @dentry: dentry to report
  * @vfsmnt: vfsmnt to which the dentry belongs
  * @root: root dentry
  * @rootmnt: vfsmnt to which the root dentry belongs
  * @buffer: buffer to return value in
  * @buflen: buffer length
+ * @fail_deleted: what to return for deleted files
  *
- * Convert a dentry into an ASCII path name. If the entry has been deleted
+ * Convert a dentry into an ASCII path name. If the entry has been deleted,
+ * then if @fail_deleted is true, ERR_PTR(-ENOENT) is returned. Otherwise,
  * the string " (deleted)" is appended. Note that this is ambiguous.
  *
- * Returns the buffer or an error code if the path was too long.
+ * If @dentry is not connected to @root, the path returned will be relative
+ * (i.e., it will not start with a slash).
  *
- * "buflen" should be positive. Caller holds the dcache_lock.
+ * Returns the buffer or an error code.
  */
-static char * __d_path( struct dentry *dentry, struct vfsmount *vfsmnt,
-			struct dentry *root, struct vfsmount *rootmnt,
-			char *buffer, int buflen)
-{
-	char * end = buffer+buflen;
-	char * retval;
-	int namelen;
+static char *__d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
+		      struct dentry *root, struct vfsmount *rootmnt,
+		      char *buffer, int buflen, int fail_deleted)
+{
+	int namelen, is_slash;
+
+	if (buflen < 2)
+		return ERR_PTR(-ENAMETOOLONG);
+	buffer += --buflen;
+	*buffer = '\0';
 
-	*--end = '\0';
-	buflen--;
+	spin_lock(&dcache_lock);
 	if (!IS_ROOT(dentry) && d_unhashed(dentry)) {
-		buflen -= 10;
-		end -= 10;
-		if (buflen < 0)
+		if (fail_deleted) {
+			buffer = ERR_PTR(-ENOENT);
+			goto out;
+		}
+		if (buflen < 10)
 			goto Elong;
-		memcpy(end, " (deleted)", 10);
+		buflen -= 10;
+		buffer -= 10;
+		memcpy(buffer, " (deleted)", 10);
 	}
-
-	if (buflen < 1)
-		goto Elong;
-	/* Get '/' right */
-	retval = end-1;
-	*retval = '/';
-
-	for (;;) {
+	while (dentry != root || vfsmnt != rootmnt) {
 		struct dentry * parent;
 
-		if (dentry == root && vfsmnt == rootmnt)
-			break;
 		if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
-			/* Global root? */
 			spin_lock(&vfsmount_lock);
 			if (vfsmnt->mnt_parent == vfsmnt) {
 				spin_unlock(&vfsmount_lock);
@@ -1820,33 +1819,72 @@ static char * __d_path( struct dentry *d
 		parent = dentry->d_parent;
 		prefetch(parent);
 		namelen = dentry->d_name.len;
-		buflen -= namelen + 1;
-		if (buflen < 0)
+		if (buflen < namelen + 1)
 			goto Elong;
-		end -= namelen;
-		memcpy(end, dentry->d_name.name, namelen);
-		*--end = '/';
-		retval = end;
+		buflen -= namelen + 1;
+		buffer -= namelen;
+		memcpy(buffer, dentry->d_name.name, namelen);
+		*--buffer = '/';
 		dentry = parent;
 	}
+	/* Get '/' right. */
+	if (*buffer != '/')
+		*--buffer = '/';
 
-	return retval;
+out:
+
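The slash rule in the description can be tried out in userspace. What follows is only a minimal sketch, not the kernel code: struct node and build_path() are hypothetical stand-ins for a dentry and for __d_path(), there is no locking and no mount handling, and a "detached" tree is modelled simply as a node whose parent is NULL and which is not the requested root.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry: a name and a parent link. */
struct node {
	const char *name;
	const struct node *parent;	/* NULL at the top of a tree */
};

/*
 * Build the path of @n right to left at the end of @buf.
 * Returns a pointer into @buf, or NULL if the buffer is too small.
 * The result starts with '/' only if the walk really reached @root;
 * otherwise it is relative and starts with the name of the detached top.
 */
static char *build_path(const struct node *n, const struct node *root,
			char *buf, size_t buflen)
{
	char *p;
	size_t len;

	assert(n != NULL && root != NULL && buf != NULL);
	if (buflen < 2)
		return NULL;
	p = buf + --buflen;
	*p = '\0';

	while (n != root && n->parent != NULL) {
		len = strlen(n->name);
		if (buflen < len + 1)
			return NULL;
		buflen -= len + 1;
		p -= len;
		memcpy(p, n->name, len);
		*--p = '/';		/* separator is never overwritten */
		n = n->parent;
	}

	if (n == root) {
		/* Connected: a bare root must still yield "/". */
		if (*p != '/')
			*--p = '/';
		return p;
	}

	/* Detached top: prepend its name, with no leading slash. */
	len = strlen(n->name);
	if (buflen < len)
		return NULL;
	p -= len;
	memcpy(p, n->name, len);
	return p;
}
```

With a connected chain root -> foo -> bar this gives an absolute path, and with a detached chain foo -> bar it gives the relative ``foo/bar'' from the example above, so a caller can tell the two cases apart by looking at the first character.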
[AppArmor 02/45] Pass struct path down to remove_suid and children
Required by a later patch that adds a struct vfsmount parameter to
notify_change().

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/ntfs/file.c             |    2 +-
 fs/reiserfs/file.c         |    2 +-
 fs/splice.c                |    4 ++--
 fs/xfs/linux-2.6/xfs_lrw.c |    2 +-
 include/linux/fs.h         |    4 ++--
 mm/filemap.c               |   12 ++--
 mm/filemap_xip.c           |    2 +-
 mm/shmem.c                 |    2 +-
 8 files changed, 15 insertions(+), 15 deletions(-)

--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -2121,7 +2121,7 @@ static ssize_t ntfs_file_aio_write_noloc
 		goto out;
 	if (!count)
 		goto out;
-	err = remove_suid(file->f_path.dentry);
+	err = remove_suid(&file->f_path);
 	if (err)
 		goto out;
 	file_update_time(file);
--- a/fs/reiserfs/file.c
+++ b/fs/reiserfs/file.c
@@ -1335,7 +1335,7 @@ static ssize_t reiserfs_file_write(struc
 	if (count == 0)
 		goto out;
 
-	res = remove_suid(file->f_path.dentry);
+	res = remove_suid(&file->f_path);
 	if (res)
 		goto out;
 
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -796,7 +796,7 @@ generic_file_splice_write_nolock(struct 
 	ssize_t ret;
 	int err;
 
-	err = remove_suid(out->f_path.dentry);
+	err = remove_suid(&out->f_path);
 	if (unlikely(err))
 		return err;
 
@@ -845,7 +845,7 @@ generic_file_splice_write(struct pipe_in
 	err = should_remove_suid(out->f_path.dentry);
 	if (unlikely(err)) {
 		mutex_lock(&inode->i_mutex);
-		err = __remove_suid(out->f_path.dentry, err);
+		err = __remove_suid(&out->f_path, err);
 		mutex_unlock(&inode->i_mutex);
 		if (err)
 			return err;
--- a/fs/xfs/linux-2.6/xfs_lrw.c
+++ b/fs/xfs/linux-2.6/xfs_lrw.c
@@ -798,7 +798,7 @@ start:
 	     !capable(CAP_FSETID)) {
 		error = xfs_write_clear_setuid(xip);
 		if (likely(!error))
-			error = -remove_suid(file->f_path.dentry);
+			error = -remove_suid(&file->f_path);
 		if (unlikely(error)) {
 			goto out_unlock_internal;
 		}
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1697,9 +1697,9 @@ extern void __iget(struct inode * inode)
 extern void clear_inode(struct inode *);
 extern void destroy_inode(struct inode *);
 extern struct inode *new_inode(struct super_block *);
-extern int __remove_suid(struct dentry *, int);
+extern int __remove_suid(struct path *, int);
 extern int should_remove_suid(struct dentry *);
-extern int remove_suid(struct dentry *);
+extern int remove_suid(struct path *);
 
 extern void __insert_inode_hash(struct inode *, unsigned long hashval);
 extern void remove_inode_hash(struct inode *);
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1905,20 +1905,20 @@ int should_remove_suid(struct dentry *de
 }
 EXPORT_SYMBOL(should_remove_suid);
 
-int __remove_suid(struct dentry *dentry, int kill)
+int __remove_suid(struct path *path, int kill)
 {
 	struct iattr newattrs;
 
 	newattrs.ia_valid = ATTR_FORCE | kill;
-	return notify_change(dentry, &newattrs);
+	return notify_change(path->dentry, &newattrs);
 }
 
-int remove_suid(struct dentry *dentry)
+int remove_suid(struct path *path)
 {
-	int kill = should_remove_suid(dentry);
+	int kill = should_remove_suid(path->dentry);
 
 	if (unlikely(kill))
-		return __remove_suid(dentry, kill);
+		return __remove_suid(path, kill);
 
 	return 0;
 }
@@ -2269,7 +2269,7 @@ __generic_file_aio_write_nolock(struct k
 	if (count == 0)
 		goto out;
 
-	err = remove_suid(file->f_path.dentry);
+	err = remove_suid(&file->f_path);
 	if (err)
 		goto out;
 
--- a/mm/filemap_xip.c
+++ b/mm/filemap_xip.c
@@ -405,7 +405,7 @@ xip_file_write(struct file *filp, const 
 	if (count == 0)
 		goto out_backing;
 
-	ret = remove_suid(filp->f_path.dentry);
+	ret = remove_suid(&filp->f_path);
 	if (ret)
 		goto out_backing;
 
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1516,7 +1516,7 @@ shmem_file_write(struct file *file, cons
 	if (err || !count)
 		goto out;
 
-	err = remove_suid(file->f_path.dentry);
+	err = remove_suid(&file->f_path);
 	if (err)
 		goto out;
 
[RFD Patch 2/4] Never pass a NULL nameidata to vfs_create()
Create a nameidata2 struct in nfsd and mqueue so that vfs_create does not
need to conditionally pass the vfsmnt.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 fs/namei.c    |    2 +-
 fs/nfsd/vfs.c |   42 +-
 ipc/mqueue.c  |    7 ++-
 3 files changed, 32 insertions(+), 19 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1532,7 +1532,7 @@ int vfs_create(struct inode *dir, struct
 		return -EACCES;	/* shouldn't it be ENOSYS? */
 	mode &= S_IALLUGO;
 	mode |= S_IFREG;
-	error = security_inode_create(dir, dentry, nd ? nd->mnt : NULL, mode);
+	error = security_inode_create(dir, dentry, nd->mnt, mode);
 	if (error)
 		return error;
 	DQUOT_INIT(dir);
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1121,7 +1121,8 @@ nfsd_create(struct svc_rqst *rqstp, stru
 		char *fname, int flen, struct iattr *iap,
 		int type, dev_t rdev, struct svc_fh *resfhp)
 {
-	struct dentry	*dentry, *dchild = NULL;
+	struct nameidata2 nd;
+	struct dentry	*dchild = NULL;
 	struct svc_export *exp;
 	struct inode	*dirp;
 	__be32		err;
@@ -1138,9 +1139,11 @@ nfsd_create(struct svc_rqst *rqstp, stru
 	if (err)
 		goto out;
 
-	dentry = fhp->fh_dentry;
+	nd.dentry = fhp->fh_dentry;
 	exp = fhp->fh_export;
-	dirp = dentry->d_inode;
+	nd.mnt = exp->ex_mnt;
+	nd.flags = 0;
+	dirp = nd.dentry->d_inode;
 
 	err = nfserr_notdir;
 	if(!dirp->i_op || !dirp->i_op->lookup)
@@ -1152,7 +1155,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
 	if (!resfhp->fh_dentry) {
 		/* called from nfsd_proc_mkdir, or possibly nfsd3_proc_create */
 		fh_lock_nested(fhp, I_MUTEX_PARENT);
-		dchild = lookup_one_len(fname, dentry, flen);
+		dchild = lookup_one_len(fname, nd.dentry, flen);
 		host_err = PTR_ERR(dchild);
 		if (IS_ERR(dchild))
 			goto out_nfserr;
@@ -1166,8 +1169,8 @@ nfsd_create(struct svc_rqst *rqstp, stru
 			/* not actually possible */
 			printk(KERN_ERR
 				"nfsd_create: parent %s/%s not locked!\n",
-				dentry->d_parent->d_name.name,
-				dentry->d_name.name);
+				nd.dentry->d_parent->d_name.name,
+				nd.dentry->d_name.name);
 			err = nfserr_io;
 			goto out;
 		}
@@ -1178,7 +1181,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
 	err = nfserr_exist;
 	if (dchild->d_inode) {
 		dprintk("nfsd_create: dentry %s/%s not negative!\n",
-			dentry->d_name.name, dchild->d_name.name);
+			nd.dentry->d_name.name, dchild->d_name.name);
 		goto out;
 	}
 
@@ -1192,7 +1195,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
 	err = 0;
 	switch (type) {
 	case S_IFREG:
-		host_err = vfs_create(dirp, dchild, iap->ia_mode, NULL);
+		host_err = vfs_create(nd.dentry->d_inode, dchild, iap->ia_mode, &nd);
 		break;
 	case S_IFDIR:
 		host_err = vfs_mkdir(dirp, dchild, exp->ex_mnt, iap->ia_mode);
@@ -1212,7 +1215,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
 		goto out_nfserr;
 
 	if (EX_ISSYNC(exp)) {
-		err = nfserrno(nfsd_sync_dir(dentry));
+		err = nfserrno(nfsd_sync_dir(nd.dentry));
 		write_inode_now(dchild->d_inode, 1);
 	}
 
@@ -1252,7 +1255,9 @@ nfsd_create_v3(struct svc_rqst *rqstp, s
 		struct svc_fh *resfhp, int createmode, u32 *verifier,
 		int *truncp, int *created)
 {
-	struct dentry	*dentry, *dchild = NULL;
+	struct nameidata2 nd;
+	struct dentry	*dchild = NULL;
+	struct svc_export *exp;
 	struct inode	*dirp;
 	__be32		err;
 	int		host_err;
@@ -1270,8 +1275,11 @@ nfsd_create_v3(struct svc_rqst *rqstp, s
 	if (err)
 		goto out;
 
-	dentry = fhp->fh_dentry;
-	dirp = dentry->d_inode;
+	nd.dentry = fhp->fh_dentry;
+	exp = fhp->fh_export;
+	nd.mnt = exp->ex_mnt;
+	nd.flags = 0;
+	dirp = nd.dentry->d_inode;
 
 	/* Get all the sanity checks out of the way before
 	 * we lock the parent. */
@@ -1283,12 +1291,12 @@ nfsd_create_v3(struct svc_rqst *rqstp, s
 	/*
 	 * Compose the response file handle.
 	 */
-	dchild = lookup_one_len(fname, dentry, flen);
+	dchild = lookup_one_len(fname, nd.dentry, flen);
 	host_err = PTR_ERR(dchild);
 	if (IS_ERR(dchild))
 		goto out_nfserr;
 
-	err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
+	err = fh_compose(resfhp, exp, dchild,
[AppArmor 34/45] Factor out sysctl pathname code
Convert the selinux sysctl pathname computation code into a standalone
function.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 include/linux/sysctl.h   |    2 ++
 kernel/sysctl.c          |   27 +++
 security/selinux/hooks.c |   35 +--
 3 files changed, 34 insertions(+), 30 deletions(-)

--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -963,6 +963,8 @@ extern int proc_doulongvec_minmax(ctl_ta
 extern int proc_doulongvec_ms_jiffies_minmax(ctl_table *table, int,
 				      struct file *, void __user *, size_t *, loff_t *);
 
+extern char *sysctl_pathname(ctl_table *, char *, int);
+
 extern int do_sysctl (int __user *name, int nlen,
 		      void __user *oldval, size_t __user *oldlenp,
 		      void __user *newval, size_t newlen);
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1110,6 +1110,33 @@ struct ctl_table_header *sysctl_head_nex
 	return NULL;
 }
 
+char *sysctl_pathname(ctl_table *table, char *buffer, int buflen)
+{
+	if (buflen < 1)
+		return NULL;
+	buffer += --buflen;
+	*buffer = '\0';
+
+	while (table) {
+		int namelen = strlen(table->procname);
+
+		if (buflen < namelen + 1)
+			return NULL;
+		buflen -= namelen + 1;
+		buffer -= namelen;
+		memcpy(buffer, table->procname, namelen);
+		*--buffer = '/';
+		table = table->parent;
+	}
+	if (buflen < 4)
+		return NULL;
+	buffer -= 4;
+	memcpy(buffer, "/sys", 4);
+
+	return buffer;
+}
+EXPORT_SYMBOL(sysctl_pathname);
+
 #ifdef CONFIG_SYSCTL_SYSCALL
 int do_sysctl(int __user *name, int nlen, void __user *oldval, size_t __user *oldlenp,
 	       void __user *newval, size_t newlen)
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1427,40 +1427,15 @@ static int selinux_capable(struct task_s
 
 static int selinux_sysctl_get_sid(ctl_table *table, u16 tclass, u32 *sid)
 {
-	int buflen, rc;
-	char *buffer, *path, *end;
+	char *buffer, *path;
+	int rc = -ENOMEM;
 
-	rc = -ENOMEM;
 	buffer = (char*)__get_free_page(GFP_KERNEL);
 	if (!buffer)
 		goto out;
-
-	buflen = PAGE_SIZE;
-	end = buffer+buflen;
-	*--end = '\0';
-	buflen--;
-	path = end-1;
-	*path = '/';
-	while (table) {
-		const char *name = table->procname;
-		size_t namelen = strlen(name);
-		buflen -= namelen + 1;
-		if (buflen < 0)
-			goto out_free;
-		end -= namelen;
-		memcpy(end, name, namelen);
-		*--end = '/';
-		path = end;
-		table = table->parent;
-	}
-	buflen -= 4;
-	if (buflen < 0)
-		goto out_free;
-	end -= 4;
-	memcpy(end, "/sys", 4);
-	path = end;
-	rc = security_genfs_sid("proc", path, tclass, sid);
-out_free:
+	path = sysctl_pathname(table, buffer, PAGE_SIZE);
+	if (path)
+		rc = security_genfs_sid("proc", path, tclass, sid);
 	free_page((unsigned long)buffer);
 out:
 	return rc;
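For anyone who wants to check the buffer arithmetic outside the kernel, here is a small userspace sketch of the same walk. It is an illustration under assumptions, not the kernel function: struct demo_ctl_table and demo_sysctl_pathname() are made-up names, and the struct carries only the two fields the walk needs (procname and parent).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's ctl_table. */
struct demo_ctl_table {
	const char *procname;
	struct demo_ctl_table *parent;
};

/*
 * Build "/sys/<parent>/.../<leaf>" right to left at the end of @buffer.
 * Returns a pointer into @buffer, or NULL if @buflen is too small.
 */
static char *demo_sysctl_pathname(struct demo_ctl_table *table,
				  char *buffer, int buflen)
{
	assert(buffer != NULL);
	if (buflen < 1)
		return NULL;
	buffer += --buflen;
	*buffer = '\0';

	while (table) {
		int namelen = (int)strlen(table->procname);

		/* Check before moving the pointer, so it never underruns. */
		if (buflen < namelen + 1)
			return NULL;
		buflen -= namelen + 1;
		buffer -= namelen;
		memcpy(buffer, table->procname, namelen);
		*--buffer = '/';
		table = table->parent;
	}
	if (buflen < 4)
		return NULL;
	buffer -= 4;
	memcpy(buffer, "/sys", 4);

	return buffer;
}
```

For a two-level table (a leaf "hostname" whose parent is "kernel") the 20-character path plus its terminating NUL needs exactly 21 bytes; one byte less and the function returns NULL instead of writing in front of the buffer.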
[AppArmor 23/45] Add a struct vfsmount parameter to vfs_getxattr()
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/nfsd/nfs4xdr.c         |    2 +-
 fs/nfsd/vfs.c             |   21 -
 fs/xattr.c                |   14 --
 include/linux/nfsd/nfsd.h |    3 ++-
 include/linux/xattr.h     |    3 ++-
 5 files changed, 25 insertions(+), 18 deletions(-)

--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1469,7 +1469,7 @@ nfsd4_encode_fattr(struct svc_fh *fhp, s
 	}
 	if (bmval0 & (FATTR4_WORD0_ACL | FATTR4_WORD0_ACLSUPPORT
 			| FATTR4_WORD0_SUPPORTED_ATTRS)) {
-		err = nfsd4_get_nfs4_acl(rqstp, dentry, &acl);
+		err = nfsd4_get_nfs4_acl(rqstp, dentry, exp->ex_mnt, &acl);
 		aclsupport = (err == 0);
 		if (bmval0 & FATTR4_WORD0_ACL) {
 			if (err == -EOPNOTSUPP)
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -378,11 +378,12 @@ out_nfserr:
 #if defined(CONFIG_NFSD_V2_ACL) || \
     defined(CONFIG_NFSD_V3_ACL) || \
     defined(CONFIG_NFSD_V4)
-static ssize_t nfsd_getxattr(struct dentry *dentry, char *key, void **buf)
+static ssize_t nfsd_getxattr(struct dentry *dentry, struct vfsmount *mnt,
+			     char *key, void **buf)
 {
 	ssize_t buflen;
 
-	buflen = vfs_getxattr(dentry, key, NULL, 0);
+	buflen = vfs_getxattr(dentry, mnt, key, NULL, 0);
 	if (buflen <= 0)
 		return buflen;
 
@@ -390,7 +391,7 @@ static ssize_t nfsd_getxattr(struct dent
 	if (!*buf)
 		return -ENOMEM;
 
-	return vfs_getxattr(dentry, key, *buf, buflen);
+	return vfs_getxattr(dentry, mnt, key, *buf, buflen);
 }
 #endif
 
@@ -479,13 +480,13 @@ out_nfserr:
 }
 
 static struct posix_acl *
-_get_posix_acl(struct dentry *dentry, char *key)
+_get_posix_acl(struct dentry *dentry, struct vfsmount *mnt, char *key)
 {
 	void *buf = NULL;
 	struct posix_acl *pacl = NULL;
 	int buflen;
 
-	buflen = nfsd_getxattr(dentry, key, &buf);
+	buflen = nfsd_getxattr(dentry, mnt, key, &buf);
 	if (!buflen)
 		buflen = -ENODATA;
 	if (buflen <= 0)
@@ -497,14 +498,15 @@ _get_posix_acl(struct dentry *dentry, ch
 }
 
 int
-nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry, struct nfs4_acl **acl)
+nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
+		   struct vfsmount *mnt, struct nfs4_acl **acl)
 {
 	struct inode *inode = dentry->d_inode;
 	int error = 0;
 	struct posix_acl *pacl = NULL, *dpacl = NULL;
 	unsigned int flags = 0;
 
-	pacl = _get_posix_acl(dentry, POSIX_ACL_XATTR_ACCESS);
+	pacl = _get_posix_acl(dentry, mnt, POSIX_ACL_XATTR_ACCESS);
 	if (IS_ERR(pacl) && PTR_ERR(pacl) == -ENODATA)
 		pacl = posix_acl_from_mode(inode->i_mode, GFP_KERNEL);
 	if (IS_ERR(pacl)) {
@@ -514,7 +516,7 @@ nfsd4_get_nfs4_acl(struct svc_rqst *rqst
 	}
 
 	if (S_ISDIR(inode->i_mode)) {
-		dpacl = _get_posix_acl(dentry, POSIX_ACL_XATTR_DEFAULT);
+		dpacl = _get_posix_acl(dentry, mnt, POSIX_ACL_XATTR_DEFAULT);
 		if (IS_ERR(dpacl) && PTR_ERR(dpacl) == -ENODATA)
 			dpacl = NULL;
 		else if (IS_ERR(dpacl)) {
@@ -1942,7 +1944,8 @@ nfsd_get_posix_acl(struct svc_fh *fhp, i
 		return ERR_PTR(-EOPNOTSUPP);
 	}
 
-	size = nfsd_getxattr(fhp->fh_dentry, name, &value);
+	size = nfsd_getxattr(fhp->fh_dentry, fhp->fh_export->ex_mnt, name,
+			     &value);
 	if (size < 0)
 		return ERR_PTR(size);
 
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -106,7 +106,8 @@ out:
 EXPORT_SYMBOL_GPL(vfs_setxattr);
 
 ssize_t
-vfs_getxattr(struct dentry *dentry, char *name, void *value, size_t size)
+vfs_getxattr(struct dentry *dentry, struct vfsmount *mnt, char *name,
+	     void *value, size_t size)
 {
 	struct inode *inode = dentry->d_inode;
 	int error;
@@ -278,7 +279,8 @@ sys_fsetxattr(int fd, char __user *name,
  * Extended attribute GET operations
  */
 static ssize_t
-getxattr(struct dentry *d, char __user *name, void __user *value, size_t size)
+getxattr(struct dentry *dentry, struct vfsmount *mnt, char __user *name,
+	 void __user *value, size_t size)
 {
 	ssize_t error;
 	void *kvalue = NULL;
@@ -298,7 +300,7 @@ getxattr(struct dentry *d, char __user *
 			return -ENOMEM;
 	}
 
-	error = vfs_getxattr(d, kname, kvalue, size);
+	error = vfs_getxattr(dentry, mnt, kname, kvalue, size);
 	if (error > 0) {
 		if (size && copy_to_user(value, kvalue, error))
 			error = -EFAULT;
@@ -321,7 +323,7 @@ sys_getxattr(char __user *path, char __u
 	error = user_path_walk(path, &nd);
 	if
[AppArmor 28/45] Pass struct vfsmount to the inode_removexattr LSM hook
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/xattr.c               |    2 +-
 include/linux/security.h |   15 +--
 security/commoncap.c     |    3 ++-
 security/dummy.c         |    3 ++-
 security/selinux/hooks.c |    3 ++-
 5 files changed, 16 insertions(+), 10 deletions(-)

--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -177,7 +177,7 @@ vfs_removexattr(struct dentry *dentry, s
 	if (error)
 		return error;
 
-	error = security_inode_removexattr(dentry, name);
+	error = security_inode_removexattr(dentry, mnt, name);
 	if (error)
 		return error;
 
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -50,7 +50,7 @@ extern int cap_bprm_set_security (struct
 extern void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe);
 extern int cap_bprm_secureexec(struct linux_binprm *bprm);
 extern int cap_inode_setxattr(struct dentry *dentry, struct vfsmount *mnt, char *name, void *value, size_t size, int flags);
-extern int cap_inode_removexattr(struct dentry *dentry, char *name);
+extern int cap_inode_removexattr(struct dentry *dentry, struct vfsmount *mnt, char *name);
 extern int cap_task_post_setuid (uid_t old_ruid, uid_t old_euid, uid_t old_suid, int flags);
 extern void cap_task_reparent_to_init (struct task_struct *p);
 extern int cap_syslog (int type);
@@ -1251,7 +1251,8 @@ struct security_operations {
 	int (*inode_getxattr) (struct dentry *dentry, struct vfsmount *mnt,
 			       char *name);
 	int (*inode_listxattr) (struct dentry *dentry, struct vfsmount *mnt);
-	int (*inode_removexattr) (struct dentry *dentry, char *name);
+	int (*inode_removexattr) (struct dentry *dentry, struct vfsmount *mnt,
+				  char *name);
 	const char *(*inode_xattr_getsuffix) (void);
 	int (*inode_getsecurity)(const struct inode *inode, const char *name, void *buffer, size_t size, int err);
 	int (*inode_setsecurity)(struct inode *inode, const char *name, const void *value, size_t size, int flags);
@@ -1799,11 +1800,12 @@ static inline int security_inode_listxat
 	return security_ops->inode_listxattr (dentry, mnt);
 }
 
-static inline int security_inode_removexattr (struct dentry *dentry, char *name)
+static inline int security_inode_removexattr (struct dentry *dentry,
+					      struct vfsmount *mnt, char *name)
 {
 	if (unlikely (IS_PRIVATE (dentry->d_inode)))
 		return 0;
-	return security_ops->inode_removexattr (dentry, name);
+	return security_ops->inode_removexattr (dentry, mnt, name);
 }
 
 static inline const char *security_inode_xattr_getsuffix(void)
@@ -2502,9 +2504,10 @@ static inline int security_inode_listxat
 	return 0;
 }
 
-static inline int security_inode_removexattr (struct dentry *dentry, char *name)
+static inline int security_inode_removexattr (struct dentry *dentry,
+					      struct vfsmount *mnt, char *name)
 {
-	return cap_inode_removexattr(dentry, name);
+	return cap_inode_removexattr(dentry, mnt, name);
 }
 
 static inline const char *security_inode_xattr_getsuffix (void)
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -200,7 +200,8 @@ int cap_inode_setxattr(struct dentry *de
 	return 0;
 }
 
-int cap_inode_removexattr(struct dentry *dentry, char *name)
+int cap_inode_removexattr(struct dentry *dentry, struct vfsmount *mnt,
+			  char *name)
 {
 	if (!strncmp(name, XATTR_SECURITY_PREFIX,
 		     sizeof(XATTR_SECURITY_PREFIX) - 1)  &&
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -379,7 +379,8 @@ static int dummy_inode_listxattr (struct
 	return 0;
 }
 
-static int dummy_inode_removexattr (struct dentry *dentry, char *name)
+static int dummy_inode_removexattr (struct dentry *dentry, struct vfsmount *mnt,
+				    char *name)
 {
 	if (!strncmp(name, XATTR_SECURITY_PREFIX,
 		     sizeof(XATTR_SECURITY_PREFIX) - 1) &&
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2404,7 +2404,8 @@ static int selinux_inode_listxattr (stru
 	return dentry_has_perm(current, NULL, dentry, FILE__GETATTR);
 }
 
-static int selinux_inode_removexattr (struct dentry *dentry, char *name)
+static int selinux_inode_removexattr (struct dentry *dentry,
+				      struct vfsmount *mnt, char *name)
 {
 	if (strcmp(name, XATTR_NAME_SELINUX)) {
 		if (!strncmp(name, XATTR_SECURITY_PREFIX,
[AppArmor 25/45] Add a struct vfsmount parameter to vfs_listxattr()
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/xattr.c            |   25 ++---
 include/linux/xattr.h |    3 ++-
 2 files changed, 16 insertions(+), 12 deletions(-)

--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -143,18 +143,20 @@ vfs_getxattr(struct dentry *dentry, stru
 EXPORT_SYMBOL_GPL(vfs_getxattr);
 
 ssize_t
-vfs_listxattr(struct dentry *d, char *list, size_t size)
+vfs_listxattr(struct dentry *dentry, struct vfsmount *mnt, char *list,
+	      size_t size)
 {
+	struct inode *inode = dentry->d_inode;
 	ssize_t error;
 
-	error = security_inode_listxattr(d);
+	error = security_inode_listxattr(dentry);
 	if (error)
 		return error;
 	error = -EOPNOTSUPP;
-	if (d->d_inode->i_op && d->d_inode->i_op->listxattr) {
-		error = d->d_inode->i_op->listxattr(d, list, size);
-	} else {
-		error = security_inode_listsecurity(d->d_inode, list, size);
+	if (inode->i_op && inode->i_op->listxattr)
+		error = inode->i_op->listxattr(dentry, list, size);
+	else {
+		error = security_inode_listsecurity(inode, list, size);
 		if (size && error > size)
 			error = -ERANGE;
 	}
@@ -362,7 +364,8 @@ sys_fgetxattr(int fd, char __user *name,
  * Extended attribute LIST operations
  */
 static ssize_t
-listxattr(struct dentry *d, char __user *list, size_t size)
+listxattr(struct dentry *dentry, struct vfsmount *mnt, char __user *list,
+	  size_t size)
 {
 	ssize_t error;
 	char *klist = NULL;
@@ -375,7 +378,7 @@ listxattr(struct dentry *d, char __user 
 			return -ENOMEM;
 	}
 
-	error = vfs_listxattr(d, klist, size);
+	error = vfs_listxattr(dentry, mnt, klist, size);
 	if (error > 0) {
 		if (size && copy_to_user(list, klist, error))
 			error = -EFAULT;
@@ -397,7 +400,7 @@ sys_listxattr(char __user *path, char __
 	error = user_path_walk(path, &nd);
 	if (error)
 		return error;
-	error = listxattr(nd.dentry, list, size);
+	error = listxattr(nd.dentry, nd.mnt, list, size);
 	path_release(&nd);
 	return error;
 }
@@ -411,7 +414,7 @@ sys_llistxattr(char __user *path, char _
 	error = user_path_walk_link(path, &nd);
 	if (error)
 		return error;
-	error = listxattr(nd.dentry, list, size);
+	error = listxattr(nd.dentry, nd.mnt, list, size);
 	path_release(&nd);
 	return error;
 }
@@ -426,7 +429,7 @@ sys_flistxattr(int fd, char __user *list
 	if (!f)
 		return error;
 	audit_inode(NULL, f->f_path.dentry->d_inode);
-	error = listxattr(f->f_path.dentry, list, size);
+	error = listxattr(f->f_path.dentry, f->f_path.mnt, list, size);
 	fput(f);
 	return error;
 }
--- a/include/linux/xattr.h
+++ b/include/linux/xattr.h
@@ -48,7 +48,8 @@ struct xattr_handler {
 
 ssize_t vfs_getxattr(struct dentry *, struct vfsmount *, char *, void *,
 		     size_t);
-ssize_t vfs_listxattr(struct dentry *d, char *list, size_t size);
+ssize_t vfs_listxattr(struct dentry *d, struct vfsmount *, char *list,
+		      size_t size);
 int vfs_setxattr(struct dentry *, struct vfsmount *, char *, void *, size_t,
 		 int);
 int vfs_removexattr(struct dentry *, char *);
[AppArmor 38/45] AppArmor: Module and LSM hooks
Module parameters, LSM hooks, initialization and teardown.

Signed-off-by: John Johansen [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

Index: b/security/apparmor/lsm.c
===================================================================
--- /dev/null
+++ b/security/apparmor/lsm.c
@@ -0,0 +1,790 @@
+/*
+ *	Copyright (C) 1998-2007 Novell/SUSE
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License as
+ *	published by the Free Software Foundation, version 2 of the
+ *	License.
+ *
+ *	AppArmor LSM interface
+ */
+
+#include <linux/security.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/ctype.h>
+#include <linux/sysctl.h>
+
+#include "apparmor.h"
+#include "inline.h"
+
+static int param_set_aabool(const char *val, struct kernel_param *kp);
+static int param_get_aabool(char *buffer, struct kernel_param *kp);
+#define param_check_aabool(name, p) __param_check(name, p, int)
+
+static int param_set_aauint(const char *val, struct kernel_param *kp);
+static int param_get_aauint(char *buffer, struct kernel_param *kp);
+#define param_check_aauint(name, p) __param_check(name, p, int)
+
+/* Flag values, also controllable via /sys/module/apparmor/parameters
+ * We define special types as we want to do additional mediation.
+ *
+ * Complain mode -- in complain mode access failures result in auditing only
+ * and task is allowed access.  audit events are processed by userspace to
+ * generate policy.  Default is 'enforce' (0).
+ * Value is also togglable per profile and referenced when global value is
+ * enforce.
+ */
+int apparmor_complain = 0;
+module_param_named(complain, apparmor_complain, aabool, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(apparmor_complain, "Toggle AppArmor complain mode");
+
+/* Debug mode */
+int apparmor_debug = 0;
+module_param_named(debug, apparmor_debug, aabool, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(apparmor_debug, "Toggle AppArmor debug mode");
+
+/* Audit mode */
+int apparmor_audit = 0;
+module_param_named(audit, apparmor_audit, aabool, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(apparmor_audit, "Toggle AppArmor audit mode");
+
+/* Syscall logging mode */
+int apparmor_logsyscall = 0;
+module_param_named(logsyscall, apparmor_logsyscall, aabool, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(apparmor_logsyscall, "Toggle AppArmor logsyscall mode");
+
+/* Maximum pathname length before accesses will start getting rejected */
+unsigned int apparmor_path_max = 2 * PATH_MAX;
+module_param_named(path_max, apparmor_path_max, aauint, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(apparmor_path_max, "Maximum pathname length allowed");
+
+static int param_set_aabool(const char *val, struct kernel_param *kp)
+{
+	if (aa_task_context(current))
+		return -EPERM;
+	return param_set_bool(val, kp);
+}
+
+static int param_get_aabool(char *buffer, struct kernel_param *kp)
+{
+	if (aa_task_context(current))
+		return -EPERM;
+	return param_get_bool(buffer, kp);
+}
+
+static int param_set_aauint(const char *val, struct kernel_param *kp)
+{
+	if (aa_task_context(current))
+		return -EPERM;
+	return param_set_uint(val, kp);
+}
+
+static int param_get_aauint(char *buffer, struct kernel_param *kp)
+{
+	if (aa_task_context(current))
+		return -EPERM;
+	return param_get_uint(buffer, kp);
+}
+
+static int aa_reject_syscall(struct task_struct *task, gfp_t flags,
+			     const char *name)
+{
+	struct aa_profile *profile = aa_get_profile(task);
+	int error = 0;
+
+	if (profile) {
+		error = aa_audit_syscallreject(profile, flags, name);
+		aa_put_profile(profile);
+	}
+
+	return error;
+}
+
+static int apparmor_ptrace(struct task_struct *parent,
+			   struct task_struct *child)
+{
+	struct aa_task_context *cxt;
+	struct aa_task_context *child_cxt;
+	struct aa_profile *child_profile;
+	int error = 0;
+
+	/*
+	 * parent can ptrace child when
+	 * - parent is unconfined
+	 * - parent & child are in the same namespace
+	 * - parent is in complain mode
+	 * - parent and child are confined by the same profile
+	 * - parent profile has CAP_SYS_PTRACE
+	 */
+
+	rcu_read_lock();
+	cxt = aa_task_context(parent);
+	child_cxt = aa_task_context(child);
+	child_profile = child_cxt ? child_cxt->profile : NULL;
+	if (cxt && (parent->nsproxy != child->nsproxy)) {
+		aa_audit_message(NULL, GFP_ATOMIC, "REJECTING ptrace across "
+				 "namespace of %d by %d",
+				 parent->pid, child->pid);
+		error = -EPERM;
+	} else {
+		error = aa_may_ptrace(cxt, child_profile);
+		if (cxt
[AppArmor 40/45] AppArmor: all the rest
All the things that didn't nicely fit in a category on their own: kbuild
code, declarations and inline functions, /sys/kernel/security/apparmor
filesystem for controlling apparmor from user space, profile list
functions, locking documentation, /proc/$pid/task/$tid/attr/current access.

Signed-off-by: John Johansen [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 security/apparmor/Kconfig      |    9 +
 security/apparmor/Makefile     |   13 ++
 security/apparmor/apparmor.h   |  259 +
 security/apparmor/apparmorfs.c |  250 +++
 security/apparmor/inline.h     |  219 ++
 security/apparmor/list.c       |   94 ++
 security/apparmor/locking.txt  |   59 +
 security/apparmor/procattr.c   |  138 +
 8 files changed, 1041 insertions(+)

--- /dev/null
+++ b/security/apparmor/Kconfig
@@ -0,0 +1,9 @@
+config SECURITY_APPARMOR
+	tristate "AppArmor support"
+	depends on SECURITY!=n
+	help
+	  This enables the AppArmor security module.
+	  Required userspace tools (if they are not included in your
+	  distribution) and further information may be found at
+	  http://forge.novell.com/modules/xfmod/project/?apparmor
+	  If you are unsure how to answer this question, answer N.
--- /dev/null
+++ b/security/apparmor/Makefile
@@ -0,0 +1,13 @@
+# Makefile for AppArmor Linux Security Module
+#
+obj-$(CONFIG_SECURITY_APPARMOR) += apparmor.o
+
+apparmor-y := main.o list.o procattr.o lsm.o apparmorfs.o \
+	      module_interface.o match.o
+
+quiet_cmd_make-caps = GEN     $@
+cmd_make-caps = sed -n -e "/CAP_FS_MASK/d" -e "s/^\#define[ \\t]\\+CAP_\\([A-Z0-9_]\\+\\)[ \\t]\\+\\([0-9]\\+\\)\$$/[\\2]  = \"\\1\",/p" $< | tr A-Z a-z > $@
+
+$(obj)/main.o : $(obj)/capability_names.h
+$(obj)/capability_names.h : $(srctree)/include/linux/capability.h
+	$(call cmd,make-caps)
--- /dev/null
+++ b/security/apparmor/apparmor.h
@@ -0,0 +1,259 @@
+/*
+ *	Copyright (C) 1998-2007 Novell/SUSE
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License as
+ *	published by the Free Software Foundation, version 2 of the
+ *	License.
+ *
+ *	AppArmor internal prototypes
+ */
+
+#ifndef __APPARMOR_H
+#define __APPARMOR_H
+
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/binfmts.h>
+#include <linux/rcupdate.h>
+
+/*
+ * We use MAY_READ, MAY_WRITE, MAY_EXEC, and the following flags for
+ * profile permissions (we don't use MAY_APPEND):
+ */
+#define AA_MAY_LINK			0x0010
+#define AA_EXEC_INHERIT			0x0020
+#define AA_EXEC_UNCONFINED		0x0040
+#define AA_EXEC_PROFILE			0x0080
+#define AA_EXEC_MMAP			0x0100
+#define AA_EXEC_UNSAFE			0x0200
+
+#define AA_EXEC_MODIFIERS		(AA_EXEC_INHERIT | \
+					 AA_EXEC_UNCONFINED | \
+					 AA_EXEC_PROFILE)
+
+#define AA_SECURE_EXEC_NEEDED		1
+
+/* Control parameters (0 or 1), settable thru module/boot flags or
+ * via /sys/kernel/security/apparmor/control */
+extern int apparmor_complain;
+extern int apparmor_debug;
+extern int apparmor_audit;
+extern int apparmor_logsyscall;
+extern unsigned int apparmor_path_max;
+
+#define PROFILE_COMPLAIN(_profile) \
+	(apparmor_complain == 1 || ((_profile) && (_profile)->flags.complain))
+
+#define APPARMOR_COMPLAIN(_cxt) \
+	(apparmor_complain == 1 || \
+	 ((_cxt) && (_cxt)->profile && (_cxt)->profile->flags.complain))
+
+#define PROFILE_AUDIT(_profile) \
+	(apparmor_audit == 1 || ((_profile) && (_profile)->flags.audit))
+
+#define APPARMOR_AUDIT(_cxt) \
+	(apparmor_audit == 1 || \
+	 ((_cxt) && (_cxt)->profile && (_cxt)->profile->flags.audit))
+
+/*
+ * DEBUG remains global (no per profile flag) since it is mostly used in sysctl
+ * which is not related to profile accesses.
+ */
+
+#define AA_DEBUG(fmt, args...)						\
+	do {								\
+		if (apparmor_debug)					\
+			printk(KERN_DEBUG "AppArmor: " fmt, ##args);	\
+	} while (0)
+
+#define AA_ERROR(fmt, args...)	printk(KERN_ERR "AppArmor: " fmt, ##args)
+
+/* struct aa_profile - basic confinement data
+ * @parent: non refcounted pointer to parent profile
+ * @name: the profiles name
+ * @file_rules: dfa containing the profiles file rules
+ * @list: list this profile is on
+ * @sub: profiles list of subprofiles (HATS)
+ * @flags: flags controlling profile behavior
+ * @null_profile: if needed per profile learning and null confinement profile
+ * @isstale: flag indicating if profile is stale
+ * @capabilities: capabilities granted by the process
+ *
[AppArmor 03/45] Add a vfsmount parameter to notify_change()
The vfsmount parameter must be set appropriately for files visibile outside the kernel. Files that are only used in a filesystem (e.g., reiserfs xattr files) will have a NULL vfsmount. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/attr.c |3 ++- fs/ecryptfs/inode.c |4 +++- fs/exec.c |3 ++- fs/fat/file.c |2 +- fs/hpfs/namei.c |2 +- fs/namei.c |3 ++- fs/nfsd/vfs.c |8 fs/open.c | 28 +++- fs/reiserfs/xattr.c |6 +++--- fs/sysfs/file.c |2 +- fs/utimes.c | 11 ++- include/linux/fs.h |6 +++--- mm/filemap.c|2 +- mm/tiny-shmem.c |2 +- 14 files changed, 45 insertions(+), 37 deletions(-) --- a/fs/attr.c +++ b/fs/attr.c @@ -100,7 +100,8 @@ int inode_setattr(struct inode * inode, } EXPORT_SYMBOL(inode_setattr); -int notify_change(struct dentry * dentry, struct iattr * attr) +int notify_change(struct dentry *dentry, struct vfsmount *mnt, + struct iattr *attr) { struct inode *inode = dentry-d_inode; mode_t mode; --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -870,12 +870,14 @@ static int ecryptfs_setattr(struct dentr { int rc = 0; struct dentry *lower_dentry; + struct vfsmount *lower_mnt; struct inode *inode; struct inode *lower_inode; struct ecryptfs_crypt_stat *crypt_stat; crypt_stat = ecryptfs_inode_to_private(dentry-d_inode)-crypt_stat; lower_dentry = ecryptfs_dentry_to_lower(dentry); + lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry); inode = dentry-d_inode; lower_inode = ecryptfs_inode_to_lower(inode); if (ia-ia_valid ATTR_SIZE) { @@ -890,7 +892,7 @@ static int ecryptfs_setattr(struct dentr if (rc 0) goto out; } - rc = notify_change(lower_dentry, ia); + rc = notify_change(lower_dentry, lower_mnt, ia); out: fsstack_copy_attr_all(inode, lower_inode, NULL); return rc; --- a/fs/exec.c +++ b/fs/exec.c @@ -1564,7 +1564,8 @@ int do_coredump(long signr, int exit_cod goto close_fail; if (!file-f_op-write) goto close_fail; - if (!ispipe do_truncate(file-f_path.dentry, 0, 0, 
file) != 0) + if (!ispipe + do_truncate(file-f_path.dentry, file-f_path.mnt, 0, 0, file) != 0) goto close_fail; retval = binfmt-core_dump(signr, regs, file); --- a/fs/fat/file.c +++ b/fs/fat/file.c @@ -92,7 +92,7 @@ int fat_generic_ioctl(struct inode *inod } /* This MUST be done before doing anything irreversible... */ - err = notify_change(filp-f_path.dentry, ia); + err = notify_change(filp-f_path.dentry, filp-f_path.mnt, ia); if (err) goto up; --- a/fs/hpfs/namei.c +++ b/fs/hpfs/namei.c @@ -426,7 +426,7 @@ again: /*printk(HPFS: truncating file before delete.\n);*/ newattrs.ia_size = 0; newattrs.ia_valid = ATTR_SIZE | ATTR_CTIME; - err = notify_change(dentry, newattrs); + err = notify_change(dentry, NULL, newattrs); put_write_access(inode); if (!err) goto again; --- a/fs/namei.c +++ b/fs/namei.c @@ -1598,7 +1598,8 @@ int may_open(struct nameidata *nd, int a if (!error) { DQUOT_INIT(inode); - error = do_truncate(dentry, 0, ATTR_MTIME|ATTR_CTIME, NULL); + error = do_truncate(dentry, nd-mnt, 0, + ATTR_MTIME|ATTR_CTIME, NULL); } put_write_access(inode); if (error) --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -358,7 +358,7 @@ nfsd_setattr(struct svc_rqst *rqstp, str err = nfserr_notsync; if (!check_guard || guardtime == inode-i_ctime.tv_sec) { fh_lock(fhp); - host_err = notify_change(dentry, iap); + host_err = notify_change(dentry, fhp-fh_export-ex_mnt, iap); err = nfserrno(host_err); fh_unlock(fhp); } @@ -893,13 +893,13 @@ out: return err; } -static void kill_suid(struct dentry *dentry) +static void kill_suid(struct dentry *dentry, struct vfsmount *mnt) { struct iattria; ia.ia_valid = ATTR_KILL_SUID | ATTR_KILL_SGID; mutex_lock(dentry-d_inode-i_mutex); - notify_change(dentry, ia); + notify_change(dentry, mnt, ia); mutex_unlock(dentry-d_inode-i_mutex); } @@ -958,7 +958,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, s
[AppArmor 44/45] Switch to vfs_permission() in sys_fchdir()
Switch from file_permission() to vfs_permission() in sys_fchdir(): this
avoids calling permission() with a NULL nameidata here.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 fs/open.c |   16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

--- a/fs/open.c
+++ b/fs/open.c
@@ -440,10 +440,8 @@ out:
 
 asmlinkage long sys_fchdir(unsigned int fd)
 {
+	struct nameidata nd;
 	struct file *file;
-	struct dentry *dentry;
-	struct inode *inode;
-	struct vfsmount *mnt;
 	int error;
 
 	error = -EBADF;
@@ -451,17 +449,17 @@ asmlinkage long sys_fchdir(unsigned int
 	if (!file)
 		goto out;
 
-	dentry = file->f_path.dentry;
-	mnt = file->f_path.mnt;
-	inode = dentry->d_inode;
+	nd.dentry = file->f_path.dentry;
+	nd.mnt = file->f_path.mnt;
+	nd.flags = 0;
 
 	error = -ENOTDIR;
-	if (!S_ISDIR(inode->i_mode))
+	if (!S_ISDIR(nd.dentry->d_inode->i_mode))
 		goto out_putf;
 
-	error = file_permission(file, MAY_EXEC);
+	error = vfs_permission(&nd, MAY_EXEC);
 	if (!error)
-		set_fs_pwd(current->fs, mnt, dentry);
+		set_fs_pwd(current->fs, nd.mnt, nd.dentry);
 out_putf:
 	fput(file);
 out:
[AppArmor 37/45] AppArmor: Main Part
The underlying functions by which the AppArmor LSM hooks are implemented. Signed-off-by: John Johansen [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Index: b/security/apparmor/main.c === --- /dev/null +++ b/security/apparmor/main.c @@ -0,0 +1,1399 @@ +/* + * Copyright (C) 2002-2007 Novell/SUSE + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation, version 2 of the + * License. + * + * AppArmor Core + */ + +#include linux/security.h +#include linux/namei.h +#include linux/audit.h +#include linux/mount.h +#include linux/ptrace.h + +#include apparmor.h + +#include inline.h + +/* + * Table of capability names: we generate it from capabilities.h. + */ +static const char *capability_names[] = { +#include capability_names.h +}; + +/* NULL complain profile + * + * Used when in complain mode, to emit Permitting messages for non-existant + * profiles and hats. This is necessary because of selective mode, in which + * case we need a complain null_profile and enforce null_profile + * + * The null_complain_profile cannot be statically allocated, because it + * can be associated to files which keep their reference even if apparmor is + * unloaded + */ +struct aa_profile *null_complain_profile; + +static inline void aa_permerror2result(int perm_result, struct aa_audit *sa) +{ + if (perm_result == 0) { /* success */ + sa-result = 1; + sa-error_code = 0; + } else { /* -ve internal error code or +ve mask of denied perms */ + sa-result = 0; + sa-error_code = perm_result; + } +} + +/** + * aa_file_denied - check for @mask access on a file + * @profile: profile to check against + * @name: pathname of file + * @mask: permission mask requested for file + * + * Return %0 on success, or else the permissions in @mask that the + * profile denies. 
+ */ +static int aa_file_denied(struct aa_profile *profile, const char *name, + int mask) +{ + return (mask ~aa_match(profile-file_rules, name)); +} + +/** + * aa_link_denied - check for permission to link a file + * @profile: profile to check against + * @link: pathname of link being created + * @target: pathname of target to be linked to + * + * Return %0 on success, or else the permissions that the profile denies. + */ +static int aa_link_denied(struct aa_profile *profile, const char *link, + const char *target) +{ + int l_mode, t_mode; + + l_mode = aa_match(profile-file_rules, link); + t_mode = aa_match(profile-file_rules, target); + + /* Link always requires 'l' on the link, a subset of the +* target's 'r', 'w', 'x', and 'm' permissions on the link, and +* if the link has 'x', an exact match of all the execute flags +* ('i', 'u', 'U', 'p', 'P'). +*/ +#define RWXM (MAY_READ | MAY_WRITE | MAY_EXEC | AA_EXEC_MMAP) + if ((l_mode AA_MAY_LINK) + (l_mode RWXM) !(l_mode ~t_mode RWXM) + (!(l_mode MAY_EXEC) || +((l_mode AA_EXEC_MODIFIERS) == (t_mode AA_EXEC_MODIFIERS) + (l_mode AA_EXEC_UNSAFE) == (t_mode AA_EXEC_UNSAFE + return 0; +#undef RWXM + /* FIXME: There currenly is no way to report which permissions +* we expect in t_mode, so linking could fail even after learning +* the required l_mode. +*/ + return AA_MAY_LINK; +} + +/** + * mangle -- escape special characters in str + * @str: string to escape + * @buffer: buffer containing str + * + * Escape special characters in @str, which is contained in @buffer. @str must + * be aligned to the end of the buffer, and the space between @buffer and @str + * may be used for escaping. + * + * Returns @str if no escaping was necessary, a pointer to the beginning of the + * escaped string, or NULL if there was not enough space in @buffer. When + * called with a NULL buffer, the return value tells whether any escaping is + * necessary. 
+ */ +static const char *mangle(const char *str, char *buffer) +{ + static const char c_escape[] = { + ['\a'] = 'a', ['\b'] = 'b', + ['\f'] = 'f', ['\n'] = 'n', + ['\r'] = 'r', ['\t'] = 't', + ['\v'] = 'v', + [' '] = ' ',['\\'] = '\\', + }; + const char *s; + char *t, c; + +#define mangle_escape(c) \ + unlikely((unsigned char)(c) ARRAY_SIZE(c_escape)\ +c_escape[(unsigned char)c]) + + for (s = (char *)str; (c = *s) != '\0'; s++) + if (mangle_escape(c)) + goto escape; + return str; + +escape: + if (!buffer) + return NULL; + for (s =
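The denied-permission check in aa_file_denied() above reduces to one bit-mask operation: the requested bits that are not in the allowed set. A minimal userspace sketch of that idea follows. It assumes the conventional kernel values for MAY_EXEC, MAY_WRITE and MAY_READ; file_denied() is a hypothetical name, and its `allowed` argument stands in for whatever aa_match() would return for the pathname.

```c
/* Conventional kernel permission bits (values assumed for illustration). */
#define MAY_EXEC  0x1
#define MAY_WRITE 0x2
#define MAY_READ  0x4

/*
 * Hypothetical stand-in for aa_file_denied(): `allowed` plays the role of
 * the permission mask that aa_match() yields for a pathname.
 * Returns 0 if every requested bit is allowed, otherwise the denied bits.
 */
static int file_denied(int allowed, int mask)
{
	return mask & ~allowed;
}
```

With a profile that allows only reads, a read request yields 0, while a read-write request yields MAY_WRITE, i.e. exactly the bits the caller would have to report as denied.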
[AppArmor 36/45] Export audit subsystem for use by modules
Adds necessary export symbols for audit subsystem routines.
Changes audit_log_vformat to be externally visible (analogous to vprintf).
Patch is not in mainline -- pending AppArmor code submission to lkml.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 include/linux/audit.h |    5 +++++
 kernel/audit.c        |    6 ++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -114,6 +114,8 @@
 #define AUDIT_ANOM_PROMISCUOUS	1700 /* Device changed promiscuous mode */
 #define AUDIT_ANOM_ABEND	1701 /* Process ended abnormally */
 
+#define AUDIT_APPARMOR		1500	/* AppArmor audit */
+
 #define AUDIT_KERNEL		2000	/* Asynchronous audit record. NOT A REQUEST. */
 
 /* Rule flags */
@@ -499,6 +501,9 @@ extern void		    audit_log(struct audit_
 			      __attribute__((format(printf,4,5)));
 
 extern struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, int type);
+extern void		    audit_log_vformat(struct audit_buffer *ab,
+					      const char *fmt, va_list args)
+			    __attribute__((format(printf,2,0)));
 extern void		    audit_log_format(struct audit_buffer *ab,
 					     const char *fmt, ...)
 			    __attribute__((format(printf,2,3)));
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1054,8 +1054,7 @@ static inline int audit_expand(struct au
  * will be called a second time.  Currently, we assume that a printk
  * can't format message larger than 1024 bytes, so we don't either.
  */
-static void audit_log_vformat(struct audit_buffer *ab, const char *fmt,
-			      va_list args)
+void audit_log_vformat(struct audit_buffer *ab, const char *fmt, va_list args)
 {
 	int len, avail;
 	struct sk_buff *skb;
@@ -1311,3 +1310,6 @@ EXPORT_SYMBOL(audit_log_start);
 EXPORT_SYMBOL(audit_log_end);
 EXPORT_SYMBOL(audit_log_format);
 EXPORT_SYMBOL(audit_log);
+EXPORT_SYMBOL_GPL(audit_log_vformat);
+EXPORT_SYMBOL_GPL(audit_log_untrustedstring);
+EXPORT_SYMBOL_GPL(audit_log_d_path);
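Why a module would want the va_list variant exported can be shown with a small userspace analogy. This is only a sketch, not the audit API: struct log_buffer, log_vformat(), log_format() and module_log() are hypothetical names, and vsnprintf() stands in for the kernel's formatting. The point is that a wrapper which is itself variadic cannot forward its arguments to another variadic function; it needs the va_list entry point.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical buffer type standing in for struct audit_buffer. */
struct log_buffer {
	char text[256];
	size_t len;	/* bytes used, not counting the trailing NUL */
};

/* va_list core: append formatted text, truncating when the buffer is full. */
static void log_vformat(struct log_buffer *lb, const char *fmt, va_list args)
{
	size_t avail = sizeof(lb->text) - lb->len;	/* always >= 1 */
	int n = vsnprintf(lb->text + lb->len, avail, fmt, args);

	if (n < 0)
		return;
	lb->len += ((size_t)n < avail) ? (size_t)n : avail - 1;
}

/* Variadic front end, in the style of audit_log_format(). */
static void log_format(struct log_buffer *lb, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	log_vformat(lb, fmt, args);
	va_end(args);
}

/*
 * Module-side wrapper that adds its own prefix. It can pass the caller's
 * arguments on only because the va_list variant is reachable.
 */
static void module_log(struct log_buffer *lb, const char *fmt, ...)
{
	va_list args;

	log_format(lb, "AppArmor: ");
	va_start(args, fmt);
	log_vformat(lb, fmt, args);
	va_end(args);
}
```

If only log_format() were visible, module_log() would have to format into a temporary buffer first and then copy, which is the extra step the export avoids.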
[AppArmor 17/45] Add a struct vfsmount parameter to vfs_unlink()
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/ecryptfs/inode.c   |    3 ++-
 fs/namei.c            |    4 ++--
 fs/nfsd/nfs4recover.c |    2 +-
 fs/nfsd/vfs.c         |    2 +-
 include/linux/fs.h    |    2 +-
 ipc/mqueue.c          |    2 +-
 6 files changed, 8 insertions(+), 7 deletions(-)

--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -453,10 +453,11 @@ static int ecryptfs_unlink(struct inode
 {
 	int rc = 0;
 	struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
+	struct vfsmount *lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry);
 	struct inode *lower_dir_inode = ecryptfs_inode_to_lower(dir);
 
 	lock_parent(lower_dentry);
-	rc = vfs_unlink(lower_dir_inode, lower_dentry);
+	rc = vfs_unlink(lower_dir_inode, lower_dentry, lower_mnt);
 	if (rc) {
 		printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
 		goto out_unlock;
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2105,7 +2105,7 @@ asmlinkage long sys_rmdir(const char __u
 	return do_rmdir(AT_FDCWD, pathname);
 }
 
-int vfs_unlink(struct inode *dir, struct dentry *dentry)
+int vfs_unlink(struct inode *dir, struct dentry *dentry, struct vfsmount *mnt)
 {
 	int error = may_delete(dir, dentry, 0);
 
@@ -2169,7 +2169,7 @@ static long do_unlinkat(int dfd, const c
 		inode = dentry->d_inode;
 		if (inode)
 			atomic_inc(&inode->i_count);
-		error = vfs_unlink(nd.dentry->d_inode, dentry);
+		error = vfs_unlink(nd.dentry->d_inode, dentry, nd.mnt);
 	exit2:
 		dput(dentry);
 	}
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -261,7 +261,7 @@ nfsd4_remove_clid_file(struct dentry *di
 		return -EINVAL;
 	}
 	mutex_lock_nested(&dir->d_inode->i_mutex, I_MUTEX_PARENT);
-	status = vfs_unlink(dir->d_inode, dentry);
+	status = vfs_unlink(dir->d_inode, dentry, rec_dir.mnt);
 	mutex_unlock(&dir->d_inode->i_mutex);
 	return status;
 }
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1704,7 +1704,7 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
 			host_err = -EPERM;
 		} else
 #endif
-		host_err = vfs_unlink(dirp, rdentry);
+		host_err = vfs_unlink(dirp, rdentry, exp->ex_mnt);
 	} else { /* It's RMDIR */
 		host_err = vfs_rmdir(dirp, rdentry, exp->ex_mnt);
 	}
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -997,7 +997,7 @@ extern int vfs_mknod(struct inode *, str
 extern int vfs_symlink(struct inode *, struct dentry *, struct vfsmount *, const char *, int);
 extern int vfs_link(struct dentry *, struct vfsmount *, struct inode *, struct dentry *, struct vfsmount *);
 extern int vfs_rmdir(struct inode *, struct dentry *, struct vfsmount *);
-extern int vfs_unlink(struct inode *, struct dentry *);
+extern int vfs_unlink(struct inode *, struct dentry *, struct vfsmount *);
 extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
 
 /*
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -749,7 +749,7 @@ asmlinkage long sys_mq_unlink(const char
 	if (inode)
 		atomic_inc(&inode->i_count);
 
-	err = vfs_unlink(dentry->d_parent->d_inode, dentry);
+	err = vfs_unlink(dentry->d_parent->d_inode, dentry, mqueue_mnt);
 out_err:
 	dput(dentry);
 
[AppArmor 13/45] Pass the struct vfsmounts to the inode_link LSM hook
This is needed for computing pathnames in the AppArmor LSM. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/namei.c |3 ++- include/linux/security.h | 18 +- security/dummy.c |6 -- security/selinux/hooks.c |9 +++-- 4 files changed, 26 insertions(+), 10 deletions(-) --- a/fs/namei.c +++ b/fs/namei.c @@ -2293,7 +2293,8 @@ int vfs_link(struct dentry *old_dentry, if (S_ISDIR(old_dentry-d_inode-i_mode)) return -EPERM; - error = security_inode_link(old_dentry, dir, new_dentry); + error = security_inode_link(old_dentry, old_mnt, dir, new_dentry, + new_mnt); if (error) return error; --- a/include/linux/security.h +++ b/include/linux/security.h @@ -289,8 +289,10 @@ struct request_sock; * @inode_link: * Check permission before creating a new hard link to a file. * @old_dentry contains the dentry structure for an existing link to the file. + * @old_mnt is the vfsmount corresponding to @old_dentry (may be NULL). * @dir contains the inode structure of the parent directory of the new link. * @new_dentry contains the dentry structure for the new link. + * @new_mnt is the vfsmount corresponding to @new_dentry (may be NULL). * Return 0 if permission is granted. * @inode_unlink: * Check the permission to remove a hard link to a file. 
@@ -1212,8 +1214,9 @@ struct security_operations { char **name, void **value, size_t *len); int (*inode_create) (struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, int mode); - int (*inode_link) (struct dentry *old_dentry, - struct inode *dir, struct dentry *new_dentry); + int (*inode_link) (struct dentry *old_dentry, struct vfsmount *old_mnt, + struct inode *dir, struct dentry *new_dentry, + struct vfsmount *new_mnt); int (*inode_unlink) (struct inode *dir, struct dentry *dentry); int (*inode_symlink) (struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, const char *old_name); @@ -1628,12 +1631,15 @@ static inline int security_inode_create } static inline int security_inode_link (struct dentry *old_dentry, + struct vfsmount *old_mnt, struct inode *dir, - struct dentry *new_dentry) + struct dentry *new_dentry, + struct vfsmount *new_mnt) { if (unlikely (IS_PRIVATE (old_dentry-d_inode))) return 0; - return security_ops-inode_link (old_dentry, dir, new_dentry); + return security_ops-inode_link (old_dentry, old_mnt, dir, +new_dentry, new_mnt); } static inline int security_inode_unlink (struct inode *dir, @@ -2359,8 +2365,10 @@ static inline int security_inode_create } static inline int security_inode_link (struct dentry *old_dentry, + struct vfsmount *old_mnt, struct inode *dir, - struct dentry *new_dentry) + struct dentry *new_dentry, + struct vfsmount *new_mnt) { return 0; } --- a/security/dummy.c +++ b/security/dummy.c @@ -270,8 +270,10 @@ static int dummy_inode_create (struct in return 0; } -static int dummy_inode_link (struct dentry *old_dentry, struct inode *inode, -struct dentry *new_dentry) +static int dummy_inode_link (struct dentry *old_dentry, +struct vfsmount *old_mnt, struct inode *inode, +struct dentry *new_dentry, +struct vfsmount *new_mnt) { return 0; } --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2182,11 +2182,16 @@ static int selinux_inode_create(struct i return may_create(dir, dentry, SECCLASS_FILE); } 
-static int selinux_inode_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry) +static int selinux_inode_link(struct dentry *old_dentry, + struct vfsmount *old_mnt, + struct inode *dir, + struct dentry *new_dentry, + struct vfsmount *new_mnt) { int rc; - rc = secondary_ops-inode_link(old_dentry,dir,new_dentry); + rc = secondary_ops-inode_link(old_dentry, old_mnt, dir, new_dentry, + new_mnt); if (rc)
[AppArmor 43/45] Switch to vfs_permission() in do_path_lookup()
Switch from file_permission() to vfs_permission() in do_path_lookup():
this avoids calling permission() with a NULL nameidata here.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 fs/namei.c |   13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1130,25 +1130,24 @@ static int fastcall do_path_lookup(int d
 		nd->dentry = dget(fs->pwd);
 		read_unlock(&fs->lock);
 	} else {
-		struct dentry *dentry;
-
 		file = fget_light(dfd, &fput_needed);
 		retval = -EBADF;
 		if (!file)
 			goto out_fail;
 
-		dentry = file->f_path.dentry;
+		nd->dentry = file->f_path.dentry;
+		nd->mnt = file->f_path.mnt;
 
 		retval = -ENOTDIR;
-		if (!S_ISDIR(dentry->d_inode->i_mode))
+		if (!S_ISDIR(nd->dentry->d_inode->i_mode))
 			goto fput_fail;
 
-		retval = file_permission(file, MAY_EXEC);
+		retval = vfs_permission(nd, MAY_EXEC);
 		if (retval)
 			goto fput_fail;
 
-		nd->mnt = mntget(file->f_path.mnt);
-		nd->dentry = dget(dentry);
+		mntget(nd->mnt);
+		dget(nd->dentry);
 
 		fput_light(file, fput_needed);
 	}
[AppArmor 08/45] Pass struct vfsmount to the inode_mknod LSM hook
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c               |    2 +-
 include/linux/security.h |    7 +++++--
 security/dummy.c         |    2 +-
 security/selinux/hooks.c |    5 +++--
 4 files changed, 10 insertions(+), 6 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1869,7 +1869,7 @@ int vfs_mknod(struct inode *dir, struct
 	if (!dir->i_op || !dir->i_op->mknod)
 		return -EPERM;
 
-	error = security_inode_mknod(dir, dentry, mode, dev);
+	error = security_inode_mknod(dir, dentry, mnt, mode, dev);
 	if (error)
 		return error;
 
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -323,6 +323,7 @@ struct request_sock;
  *	and not this hook.
  *	@dir contains the inode structure of parent of the new file.
  *	@dentry contains the dentry structure of the new file.
+ *	@mnt is the vfsmount corresponding to @dentry (may be NULL).
  *	@mode contains the mode of the new file.
  *	@dev contains the device number.
  *	Return 0 if permission is granted.
@@ -1218,7 +1219,7 @@ struct security_operations {
 			    struct vfsmount *mnt, int mode);
 	int (*inode_rmdir) (struct inode *dir, struct dentry *dentry);
 	int (*inode_mknod) (struct inode *dir, struct dentry *dentry,
-			    int mode, dev_t dev);
+			    struct vfsmount *mnt, int mode, dev_t dev);
 	int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry,
 			     struct inode *new_dir, struct dentry *new_dentry);
 	int (*inode_readlink) (struct dentry *dentry);
@@ -1670,11 +1671,12 @@ static inline int security_inode_rmdir (
 
 static inline int security_inode_mknod (struct inode *dir,
 					struct dentry *dentry,
+					struct vfsmount *mnt,
 					int mode, dev_t dev)
 {
 	if (unlikely (IS_PRIVATE (dir)))
 		return 0;
-	return security_ops->inode_mknod (dir, dentry, mode, dev);
+	return security_ops->inode_mknod (dir, dentry, mnt, mode, dev);
 }
 
 static inline int security_inode_rename (struct inode *old_dir,
@@ -2388,6 +2390,7 @@ static inline int security_inode_rmdir (
 
 static inline int security_inode_mknod (struct inode *dir,
 					struct dentry *dentry,
+					struct vfsmount *mnt,
 					int mode, dev_t dev)
 {
 	return 0;
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -299,7 +299,7 @@ static int dummy_inode_rmdir (struct ino
 }
 
 static int dummy_inode_mknod (struct inode *inode, struct dentry *dentry,
-			      int mode, dev_t dev)
+			      struct vfsmount *mnt, int mode, dev_t dev)
 {
 	return 0;
 }
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2218,11 +2218,12 @@ static int selinux_inode_rmdir(struct in
 	return may_link(dir, dentry, MAY_RMDIR);
 }
 
-static int selinux_inode_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
+static int selinux_inode_mknod(struct inode *dir, struct dentry *dentry,
+			       struct vfsmount *mnt, int mode, dev_t dev)
 {
 	int rc;
 
-	rc = secondary_ops->inode_mknod(dir, dentry, mode, dev);
+	rc = secondary_ops->inode_mknod(dir, dentry, mnt, mode, dev);
 	if (rc)
 		return rc;
 
[PATCH] AF_RXRPC: AF_RXRPC depends on IPv4
Add a dependency for CONFIG_AF_RXRPC on CONFIG_INET. This fixes this error:

	net/built-in.o: In function `rxrpc_get_peer':
	(.text+0x42824): undefined reference to `ip_route_output_key'

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 net/rxrpc/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/rxrpc/Kconfig b/net/rxrpc/Kconfig
index 91b3d52..e662f1d 100644
--- a/net/rxrpc/Kconfig
+++ b/net/rxrpc/Kconfig
@@ -4,7 +4,7 @@
 
 config AF_RXRPC
 	tristate "RxRPC session sockets"
-	depends on EXPERIMENTAL
+	depends on INET && EXPERIMENTAL
 	select KEYS
 	help
 	  Say Y or M here to include support for RxRPC session sockets (just
[PATCH] AF_RXRPC: Make call state names available if CONFIG_PROC_FS=n
Make the call state names array available even if CONFIG_PROC_FS is
disabled as it's used in other places (such as debugging statements) too.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 net/rxrpc/ar-call.c |   19 +++++++++++++++++++
 net/rxrpc/ar-proc.c |   19 -------------------
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/net/rxrpc/ar-call.c b/net/rxrpc/ar-call.c
index 4d92d88..3c04b00 100644
--- a/net/rxrpc/ar-call.c
+++ b/net/rxrpc/ar-call.c
@@ -15,6 +15,25 @@
 #include <net/af_rxrpc.h>
 #include "ar-internal.h"
 
+const char *rxrpc_call_states[] = {
+	[RXRPC_CALL_CLIENT_SEND_REQUEST]	= "ClSndReq",
+	[RXRPC_CALL_CLIENT_AWAIT_REPLY]		= "ClAwtRpl",
+	[RXRPC_CALL_CLIENT_RECV_REPLY]		= "ClRcvRpl",
+	[RXRPC_CALL_CLIENT_FINAL_ACK]		= "ClFnlACK",
+	[RXRPC_CALL_SERVER_SECURING]		= "SvSecure",
+	[RXRPC_CALL_SERVER_ACCEPTING]		= "SvAccept",
+	[RXRPC_CALL_SERVER_RECV_REQUEST]	= "SvRcvReq",
+	[RXRPC_CALL_SERVER_ACK_REQUEST]		= "SvAckReq",
+	[RXRPC_CALL_SERVER_SEND_REPLY]		= "SvSndRpl",
+	[RXRPC_CALL_SERVER_AWAIT_ACK]		= "SvAwtACK",
+	[RXRPC_CALL_COMPLETE]			= "Complete",
+	[RXRPC_CALL_SERVER_BUSY]		= "SvBusy  ",
+	[RXRPC_CALL_REMOTELY_ABORTED]		= "RmtAbort",
+	[RXRPC_CALL_LOCALLY_ABORTED]		= "LocAbort",
+	[RXRPC_CALL_NETWORK_ERROR]		= "NetError",
+	[RXRPC_CALL_DEAD]			= "Dead",
+};
+
 struct kmem_cache *rxrpc_call_jar;
 LIST_HEAD(rxrpc_calls);
 DEFINE_RWLOCK(rxrpc_call_lock);
diff --git a/net/rxrpc/ar-proc.c b/net/rxrpc/ar-proc.c
index 58f4b4e..1c0be0e 100644
--- a/net/rxrpc/ar-proc.c
+++ b/net/rxrpc/ar-proc.c
@@ -25,25 +25,6 @@ static const char *rxrpc_conn_states[] = {
 	[RXRPC_CONN_NETWORK_ERROR]		= "NetError",
 };
 
-const char *rxrpc_call_states[] = {
-	[RXRPC_CALL_CLIENT_SEND_REQUEST]	= "ClSndReq",
-	[RXRPC_CALL_CLIENT_AWAIT_REPLY]		= "ClAwtRpl",
-	[RXRPC_CALL_CLIENT_RECV_REPLY]		= "ClRcvRpl",
-	[RXRPC_CALL_CLIENT_FINAL_ACK]		= "ClFnlACK",
-	[RXRPC_CALL_SERVER_SECURING]		= "SvSecure",
-	[RXRPC_CALL_SERVER_ACCEPTING]		= "SvAccept",
-	[RXRPC_CALL_SERVER_RECV_REQUEST]	= "SvRcvReq",
-	[RXRPC_CALL_SERVER_ACK_REQUEST]		= "SvAckReq",
-	[RXRPC_CALL_SERVER_SEND_REPLY]		= "SvSndRpl",
-	[RXRPC_CALL_SERVER_AWAIT_ACK]		= "SvAwtACK",
-	[RXRPC_CALL_COMPLETE]			= "Complete",
-	[RXRPC_CALL_SERVER_BUSY]		= "SvBusy  ",
-	[RXRPC_CALL_REMOTELY_ABORTED]		= "RmtAbort",
-	[RXRPC_CALL_LOCALLY_ABORTED]		= "LocAbort",
-	[RXRPC_CALL_NETWORK_ERROR]		= "NetError",
-	[RXRPC_CALL_DEAD]			= "Dead",
-};
-
 /*
  * generate a list of extant and dead calls in /proc/net/rxrpc_calls
  */
[PATCH 0/5][TAKE2] fallocate system call
This is the new set of patches, which takes care of the review comments
received from the community (mainly from Andrew).

Description:
-----------
fallocate() is a new system call being proposed here which will allow
applications to preallocate space to any file(s) in a file system.
Each file system implementation that wants to use this feature will need
to support an inode operation called fallocate.

Applications can use this feature to avoid fragmentation to a certain
level and thus get faster access speed. With preallocation, applications
also get a guarantee of space for particular file(s) - even if later the
system becomes full.

Currently, glibc provides an interface called posix_fallocate() which
can be used for a similar purpose. Though this has the advantage of
working on all file systems, it is quite slow (since it writes zeroes
to each block that has to be preallocated). Without a doubt, file systems
can do this more efficiently within the kernel, by implementing
the proposed fallocate() system call. It is expected that
posix_fallocate() will be modified to call this new system call first
and, in case the kernel/filesystem does not implement it, fall back
to the current implementation of writing zeroes to the new blocks.

Interface:
---------
The proposed system call's layout is:

 asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len)

fd: The descriptor of the open file.

mode*: This specifies the behavior of the system call. Currently the
  system call supports two modes - FA_ALLOCATE and FA_DEALLOCATE.
  FA_ALLOCATE: Applications can use this mode to preallocate blocks to
    a given file (specified by fd). This mode changes the file size if
    the preallocation is done beyond the EOF. It also updates the
    ctime/mtime in the inode of the corresponding file, marking a
    successful allocation.
  FA_DEALLOCATE: This mode can be used by applications to deallocate the
    previously preallocated blocks. This also may change the file size
    and the ctime/mtime.
* New modes might get added in future. One such new mode which is
  already under discussion is FA_PREALLOCATE, which when used will
  preallocate space but will not change the filesize and [cm]time.
  Since the semantics of this new mode are not clear and agreed upon yet,
  this patchset does not implement it currently.

offset: This is the offset in bytes, from where the preallocation should
  start.

len: This is the number of bytes requested for preallocation (from
  offset).

sys_fallocate() on s390:
-----------------------
There is a problem with the s390 ABI in implementing sys_fallocate() with
the proposed order of arguments. Martin Schwidefsky has suggested a patch
to solve this problem which makes use of a wrapper in the kernel. This
will require special handling of this system call on s390 in glibc as
well. But this seems to be the best solution so far.

Known Problem:
-------------
mmapped writes into uninitialized extents is a known problem with the
current ext4 patches. Like XFS, ext4 may need to implement
->page_mkwrite() to solve this. See:
http://lkml.org/lkml/2007/5/8/583

Since there is talk of ->fault() replacing ->page_mkwrite() and also
with a generic block_page_mkwrite() implementation already posted, we
can implement this some time later. See:
http://lkml.org/lkml/2007/3/7/161
http://lkml.org/lkml/2007/3/18/198

ToDos:
-----
1. Implementation on other architectures (other than i386, x86_64,
   ppc64 and s390(x)). David Chinner has already posted a patch for ia64.
2. A generic file system operation to handle fallocate
   (generic_fallocate), for filesystems that do _not_ have the fallocate
   inode operation implemented.
3. Changes to glibc,
   a) to support fallocate() system call
   b) to make posix_fallocate() and posix_fallocate64() call fallocate()

Changelog:
---------
Each post will have an individual changelog for the particular patch.

Following posts with patches follow:
Patch 1/5 : fallocate() implementation on i386, x86_64 and powerpc
Patch 2/5 : fallocate() on s390
Patch 3/5 : ext4: Extent overlap bugfix
Patch 4/5 : ext4: fallocate support in ext4
Patch 5/5 : ext4: write support for preallocated blocks

--
Regards,
Amit Arora
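The "try the system call first, then fall back to writing zeroes" behavior expected of posix_fallocate() can be sketched in userspace as follows. This is an illustration under stated assumptions, not glibc's implementation: prealloc_with_fallback(), zero_fill_fallback(), native_fallocate_fn and no_native_fallocate() are hypothetical names, the native hook stands in for the proposed system call, and the slow path only extends the file past its current EOF (it does not fill holes inside existing data).

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hook standing in for the proposed system call; returns 0
 * or an errno value such as ENOSYS or EOPNOTSUPP. */
typedef int (*native_fallocate_fn)(int fd, off_t offset, off_t len);

/* Stand-in for a kernel that does not provide the system call. */
static int no_native_fallocate(int fd, off_t offset, off_t len)
{
	(void)fd; (void)offset; (void)len;
	return ENOSYS;
}

/* Slow path: extend the file by writing zeroes past the current EOF.
 * Simplification: bytes below EOF are never touched. */
static int zero_fill_fallback(int fd, off_t offset, off_t len)
{
	static const char zeroes[4096];
	struct stat st;
	off_t pos, end = offset + len;

	if (fstat(fd, &st) != 0)
		return errno;
	pos = st.st_size > offset ? st.st_size : offset;
	while (pos < end) {
		size_t chunk = sizeof(zeroes);
		ssize_t n;

		if ((off_t)chunk > end - pos)
			chunk = (size_t)(end - pos);
		n = pwrite(fd, zeroes, chunk, pos);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return errno;
		}
		pos += n;
	}
	return 0;
}

/* Fast path first; fall back only when it is reported as unsupported. */
static int prealloc_with_fallback(int fd, off_t offset, off_t len,
				  native_fallocate_fn native)
{
	if (offset < 0 || len <= 0)
		return EINVAL;
	if (native) {
		int rc = native(fd, offset, len);

		if (rc != ENOSYS && rc != EOPNOTSUPP)
			return rc;
	}
	return zero_fill_fallback(fd, offset, len);
}
```

Only "not implemented" style errors trigger the fallback in this sketch; any other error from the fast path, such as running out of space, is returned to the caller unchanged.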
Re: [PATCH 4/5] ext4: fallocate support in ext4
On Mon, 7 May 2007 05:37:54 -0600
> Does the proposed implementation handle quotas correctly, btw? Has that
> been tested?

It seems to handle quotas fine - the block allocation itself does not differ from the usual case, just the extents in the tree are marked as uninitialized...
The only question is whether DQUOT_PREALLOC_BLOCK() shouldn't be called instead of DQUOT_ALLOC_BLOCK(). Then fallocate() won't be able to allocate anything after the softlimit has been reached, which makes some sense, but the current behavior is probably less surprising.

								Honza
--
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
Re: [AppArmor 00/45] AppArmor security module overview
and with the actual introductory text this time

This post contains patches to include the AppArmor application security framework, with request for inclusion. It contains fixes for almost all of the feedback received from the previous post. A second follow-up posting will address passing NULL nameidata.

Changes since previous post:
- Refactor d_path() patches: Separate changes to d_path(), getcwd(), and /proc/mounts from __d_path() cleanups.
- Switch from file_permission() to vfs_permission() in do_path_lookup() and sys_fchdir(): this avoids calling permission() with a NULL nameidata there.
- Fix file_permission() to not use NULL nameidata for its remaining users: it makes little sense to replace file_permission() with vfs_permission() everywhere.
- Remove special casing for access to /proc/self/attr/current by adding rules to policy user side.
- Remove redundant fn's in lsm.c by calling cap functions directly from the security operations vector.
- Disallow ptracing process with different namespace.
- Use beX_to_cpu instead of ntoX in dfa unpack code.
- Fix potential overflow in unpack bounds checking.
- Limit profile recursion depth to 1 level.
- Factor out sysctl pathname code from selinux to add generic sysctl_pathname() function in kernel/sysctl.c. Replace special casing of sysctl write with finer grained mediation using sysctl_pathname() function to provide pathname for sysctl mediation.
- Escape special characters in pathnames when used in audit messages.
- Remove use of task->comm from audit messages. The use of task->comm was incorrect and only used as a human readable hint.
- Some structural cleanups on AppArmor's audit code paths.
- Set LOOKUP_CONTINUE flag when checking parent permissions. This allows permission functions to tell between parent and leaf checks. Check for (LOOKUP_PARENT | LOOKUP_CONTINUE) in the inode_permission apparmor hook.
- Drop rejection of CLONE_NEWNS since the kernel already requires CAP_SYS_ADMIN.
- Add a missing dput() in apparmorfs_detry_refcount().
- Remove kernel-doc style comment headers from comments that are not in kernel-doc format.
- Use lock subtyping to address lockdep reporting a possible irq lock inversion.

The patch series consists of five areas:

(1) Pass struct vfsmount through to LSM hooks.

(2) Fixes and improvements to __d_path():
    (a) make it unambiguous and exclude unreachable paths from /proc/mounts,
    (b) make its result consistent in the face of remounts,
    (c) introduce d_namespace_path(), a variant of d_path that goes up to the namespace root instead of the chroot.
    (d) the behavior of d_path() and getcwd() remains unchanged, and there is no hiding of unreachable paths in /proc/mounts. The patches addressing these have been separated from the AppArmor submission and will be introduced at a later date.

    Part (a) has been in the -mm tree for a while; this series includes an updated copy of the -mm patch. Parts (b) and (c) shouldn't be too controversial.

(3) Be able to distinguish file descriptor access from access by name in LSM hooks.

    Applications expect different behavior from file descriptor accesses and accesses by name in some cases. We need to pass this information down the LSM hooks to allow AppArmor to tell which is which.

(4) Convert the selinux sysctl pathname computation code into a standalone function.

(5) The AppArmor LSM itself.

    (See below.)

A tarball of the kernel patches, base user-space utilities, example profiles, and technical documentation (including a walk-through) are available at:
http://forgeftp.novell.com//apparmor/LKML_Submission-May_07/

Explaining the AppArmor design in detail would take far too much space here, so let me refer you to the technical documentation for that. Included is a low-level walk-through of the system and basic tools, and some examples. The manual pages included in the apparmor-parser package are worth a read as well.
Re: [RFD Patch 0/4] AppArmor - Don't pass NULL nameidata to vfs_create/lookup/permission IOPs
sigh, and with the introductory text attached

This post is a request for discussion on creating a second, minimal nameidata struct to eliminate the conditional passing of vfsmounts to the LSM. It contains a series of patches that apply on top of the AppArmor patch series. A previous version of these patches was posted by Andreas Gruenbacher on April 16, and the issues raised then have been addressed.

To remove the conditional passing of vfsmounts to the LSM, a nameidata struct can be instantiated in the nfsd and mqueue filesystems. This, however, results in useless information being passed down, as not all fields in the nameidata struct will be meaningful. The nameidata struct is therefore split, creating struct nameidata2, which contains only the fields that will carry meaningful information.

The creation of the nameidata2 struct raises the possibility of replacing the current dentry, vfsmount argument pairs in the vfs and lsm patches with a single nameidata2 argument, although these patches do not currently do this.

A tarball of these patches and the AppArmor kernel patches is available at:
http://forgeftp.novell.com//apparmor/LKML_Submission-May_07/
[PATCH 1/5][TAKE2] fallocate() implementation on i86, x86_64 and powerpc
This patch implements sys_fallocate() and adds support on i386, x86_64 and powerpc platforms.

Changelog:
----------
Following changes were made to the previous version:
 1) Added description before sys_fallocate() definition.
 2) Return EINVAL for len <= 0 (with the new draft that Ulrich pointed to, posix_fallocate should return EINVAL for len <= 0).
 3) Return EOPNOTSUPP if mode is not one of FA_ALLOCATE or FA_DEALLOCATE.
 4) Do not return ENODEV for dirs (let individual file systems decide if they want to support preallocation to directories or not).
 5) Check for wrap through zero.
 6) Update c/mtime if fallocate() succeeds.
 7) Added mode descriptions in fs.h.
 8) Added variable names to function definition (fallocate inode op).

Here is the new patch:

Signed-off-by: Amit Arora [EMAIL PROTECTED]
---
 arch/i386/kernel/syscall_table.S |    1
 arch/powerpc/kernel/sys_ppc32.c  |    7 +++
 arch/x86_64/kernel/functionlist  |    1
 fs/open.c                        |   89 +++
 include/asm-i386/unistd.h        |    3 -
 include/asm-powerpc/systbl.h     |    1
 include/asm-powerpc/unistd.h     |    3 -
 include/asm-x86_64/unistd.h      |    4 +
 include/linux/fs.h               |   13 +
 include/linux/syscalls.h         |    1
 10 files changed, 120 insertions(+), 3 deletions(-)

Index: linux-2.6.21/arch/i386/kernel/syscall_table.S
===================================================================
--- linux-2.6.21.orig/arch/i386/kernel/syscall_table.S
+++ linux-2.6.21/arch/i386/kernel/syscall_table.S
@@ -319,3 +319,4 @@ ENTRY(sys_call_table)
 	.long sys_move_pages
 	.long sys_getcpu
 	.long sys_epoll_pwait
+	.long sys_fallocate		/* 320 */
Index: linux-2.6.21/arch/x86_64/kernel/functionlist
===================================================================
--- linux-2.6.21.orig/arch/x86_64/kernel/functionlist
+++ linux-2.6.21/arch/x86_64/kernel/functionlist
@@ -931,6 +931,7 @@
 *(.text.sys_getitimer)
 *(.text.sys_getgroups)
 *(.text.sys_ftruncate)
+*(.text.sys_fallocate)
 *(.text.sysfs_lookup)
 *(.text.sys_exit_group)
 *(.text.stub_fork)
Index: linux-2.6.21/fs/open.c
===================================================================
--- linux-2.6.21.orig/fs/open.c
+++ linux-2.6.21/fs/open.c
@@ -351,6 +351,95 @@ asmlinkage long sys_ftruncate64(unsigned
 #endif
 
 /*
+ * sys_fallocate - preallocate blocks or free preallocated blocks
+ * @fd: the file descriptor
+ * @mode: mode specifies if fallocate should preallocate blocks OR free
+ *	  (unallocate) preallocated blocks. Currently only FA_ALLOCATE and
+ *	  FA_DEALLOCATE modes are supported.
+ * @offset: The offset within file, from where (un)allocation is being
+ *	    requested. It should not have a negative value.
+ * @len: The amount (in bytes) of space to be (un)allocated, from the offset.
+ *
+ * This system call, depending on the mode, preallocates or unallocates blocks
+ * for a file. The range of blocks depends on the value of offset and len
+ * arguments provided by the user/application. For FA_ALLOCATE mode, if this
+ * system call succeeds, subsequent writes to the file in the given range
+ * (specified by offset & len) should not fail - even if the file system
+ * later becomes full. Hence the preallocation done is persistent (valid
+ * even after reopen of the file and remount/reboot).
+ *
+ * Note: Incase the file system does not support preallocation,
+ * posix_fallocate() should fall back to the library implementation (i.e.
+ * allocating zero-filled new blocks to the file).
+ *
+ * Return Values
+ *	0	: On SUCCESS a value of zero is returned.
+ *	error	: On Failure, an error code will be returned.
+ * An error code of -ENOSYS or -EOPNOTSUPP should make posix_fallocate()
+ * fall back on library implementation of fallocate.
+ *
+ * TBD Generic fallocate to be added for file systems that do not
+ *	 support fallocate it.
+ */
+asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len)
+{
+	struct file *file;
+	struct inode *inode;
+	long ret = -EINVAL;
+
+	if (offset < 0 || len <= 0)
+		goto out;
+
+	/* Return error if mode is not supported */
+	ret = -EOPNOTSUPP;
+	if (mode != FA_ALLOCATE && mode !=FA_DEALLOCATE)
+		goto out;
+
+	ret = -EBADF;
+	file = fget(fd);
+	if (!file)
+		goto out;
+	if (!(file->f_mode & FMODE_WRITE))
+		goto out_fput;
+
+	inode = file->f_path.dentry->d_inode;
+
+	ret = -ESPIPE;
+	if (S_ISFIFO(inode->i_mode))
+		goto out_fput;
+
+	ret = -ENODEV;
+	/*
+	 * Let individual file system decide if it supports preallocation
+	 * for directories or not.
+	 */
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+		goto out_fput;
+
+	ret = -EFBIG;
+	/* Check for wrap through zero too */
+	if (((offset +
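The archived patch is cut off at the wrap check, so the following is only a user-space sketch of the argument validation order the changelog describes (EINVAL, then EOPNOTSUPP, then a range check that cannot overflow). The function name check_fallocate_args(), the max_bytes parameter and the numeric values of FA_ALLOCATE and FA_DEALLOCATE are assumptions made for illustration; they are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Placeholder values; the real definitions live in the patched fs.h. */
#define FA_ALLOCATE	0x1
#define FA_DEALLOCATE	0x2

/*
 * Hypothetical helper mirroring the validation order of sys_fallocate().
 * max_bytes stands in for the largest file size the file system allows.
 * Returns 0 if the arguments are acceptable, else a negative errno.
 */
long check_fallocate_args(int mode, int64_t offset, int64_t len,
			  int64_t max_bytes)
{
	assert(max_bytes >= 0);

	if (offset < 0 || len <= 0)
		return -EINVAL;

	/* Return error if mode is not supported */
	if (mode != FA_ALLOCATE && mode != FA_DEALLOCATE)
		return -EOPNOTSUPP;

	/*
	 * Range check written so that nothing wraps: offset >= 0 here,
	 * so max_bytes - offset cannot overflow, whereas offset + len
	 * could wrap through zero for large inputs.
	 */
	if (len > max_bytes - offset)
		return -EFBIG;

	return 0;
}
```

Comparing len against max_bytes - offset is one way to express "check for wrap through zero" without relying on signed overflow, which is undefined behavior in C.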
[PATCH 2/5][TAKE2] fallocate() on s390
This is the patch suggested by Martin Schwidefsky. Here are the comments and patch from him.

-------------------------------------
From: Martin Schwidefsky [EMAIL PROTECTED]

This patch implements support of fallocate system call on s390(x) platform. A wrapper is added to address the issue which s390 ABI has with the arguments of this system call.

Signed-off-by: Martin Schwidefsky [EMAIL PROTECTED]
---
 arch/s390/kernel/compat_wrapper.S |   10 ++
 arch/s390/kernel/sys_s390.c       |   29 +
 arch/s390/kernel/syscalls.S       |    1 +
 include/asm-s390/unistd.h         |    3 ++-
 4 files changed, 42 insertions(+), 1 deletion(-)

Index: linux-2.6.21/arch/s390/kernel/compat_wrapper.S
===================================================================
--- linux-2.6.21.orig/arch/s390/kernel/compat_wrapper.S
+++ linux-2.6.21/arch/s390/kernel/compat_wrapper.S
@@ -1682,3 +1682,13 @@ compat_sys_utimes_wrapper:
 	llgtr	%r2,%r2			# char *
 	llgtr	%r3,%r3			# struct compat_timeval *
 	jg	compat_sys_utimes
+
+	.globl	sys_fallocate_wrapper
+sys_fallocate_wrapper:
+	lgfr	%r2,%r2			# int
+	lgfr	%r3,%r3			# int
+	sllg	%r4,%r4,32		# get high word of 64bit loff_t
+	lr	%r4,%r5			# get low word of 64bit loff_t
+	sllg	%r5,%r6,32		# get high word of 64bit loff_t
+	l	%r5,164(%r15)		# get low word of 64bit loff_t
+	jg	sys_fallocate
Index: linux-2.6.21/arch/s390/kernel/syscalls.S
===================================================================
--- linux-2.6.21.orig/arch/s390/kernel/syscalls.S
+++ linux-2.6.21/arch/s390/kernel/syscalls.S
@@ -322,3 +322,4 @@ NI_SYSCALL /* 310 sys_move_pages *
 SYSCALL(sys_getcpu,sys_getcpu,sys_getcpu_wrapper)
 SYSCALL(sys_epoll_pwait,sys_epoll_pwait,compat_sys_epoll_pwait_wrapper)
 SYSCALL(sys_utimes,sys_utimes,compat_sys_utimes_wrapper)
+SYSCALL(s390_fallocate,sys_fallocate,sys_fallocate_wrapper)
Index: linux-2.6.21/arch/s390/kernel/sys_s390.c
===================================================================
--- linux-2.6.21.orig/arch/s390/kernel/sys_s390.c
+++ linux-2.6.21/arch/s390/kernel/sys_s390.c
@@ -286,3 +286,32 @@ int kernel_execve(const char *filename,
 		  "d" (__arg3) : "memory");
 	return __svcres;
 }
+
+#ifndef CONFIG_64BIT
+/*
+ * This is a wrapper to call sys_fallocate(). For 31 bit s390 the last
+ * 64 bit argument len is split into the upper and lower 32 bits. The
+ * system call wrapper in the user space loads the value to %r6/%r7.
+ * The code in entry.S keeps the values in %r2 - %r6 where they are and
+ * stores %r7 to 96(%r15). But the standard C linkage requires that
+ * the whole 64 bit value for len is stored on the stack and doesn't
+ * use %r6 at all. So s390_fallocate has to convert the arguments from
+ *   %r2: fd, %r3: mode, %r4/%r5: offset, %r6/96(%r15)-99(%r15): len
+ * to
+ *   %r2: fd, %r3: mode, %r4/%r5: offset, 96(%r15)-103(%r15): len
+ */
+asmlinkage long s390_fallocate(int fd, int mode, loff_t offset,
+			       u32 len_high, u32 len_low)
+{
+	union {
+		u64 len;
+		struct {
+			u32 high;
+			u32 low;
+		};
+	} cv;
+	cv.high = len_high;
+	cv.low = len_low;
+	return sys_fallocate(fd, mode, offset, cv.len);
+}
+#endif
Index: linux-2.6.21/include/asm-s390/unistd.h
===================================================================
--- linux-2.6.21.orig/include/asm-s390/unistd.h
+++ linux-2.6.21/include/asm-s390/unistd.h
@@ -251,8 +251,9 @@
 #define __NR_getcpu		311
 #define __NR_epoll_pwait	312
 #define __NR_utimes		313
+#define __NR_fallocate	314
 
-#define NR_syscalls 314
+#define NR_syscalls 315
 
 /*
  * There are some system calls that are not present on 64 bit, some
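The wrapper above reassembles the 64-bit len from two 32-bit halves through a union, which yields the intended value because s390 is big-endian (the first struct member overlays the most significant word). As a small portable sketch of the same reassembly, assuming nothing beyond standard C, the halves can be joined with a shift instead; join_halves() is a made-up name, not part of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: combine the upper and lower 32 bits of a 64-bit
 * value. Unlike the union in the patch, the result does not depend on
 * the byte order of the machine.
 */
static inline uint64_t join_halves(uint32_t high, uint32_t low)
{
	return ((uint64_t)high << 32) | low;
}
```

The cast before the shift matters: shifting a 32-bit operand by 32 would be undefined, so high is widened to 64 bits first.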
[PATCH 3/5][TAKE2] ext4: Extent overlap bugfix
This patch adds a check for overlap of extents and cuts short the new extent to be inserted, if there is a chance of overlap.

Changelog:
----------
As suggested by Andrew, a check for wrap through zero has been added.

Here is the new patch:

Signed-off-by: Amit Arora [EMAIL PROTECTED]
---
 fs/ext4/extents.c               |   60 ++--
 include/linux/ext4_fs_extents.h |    1
 2 files changed, 59 insertions(+), 2 deletions(-)

Index: linux-2.6.21/fs/ext4/extents.c
===================================================================
--- linux-2.6.21.orig/fs/ext4/extents.c
+++ linux-2.6.21/fs/ext4/extents.c
@@ -1129,6 +1129,55 @@ ext4_can_extents_be_merged(struct inode
 }
 
 /*
+ * check if a portion of the newext extent overlaps with an
+ * existing extent.
+ *
+ * If there is an overlap discovered, it updates the length of the newext
+ * such that there will be no overlap, and then returns 1.
+ * If there is no overlap found, it returns 0.
+ */
+unsigned int ext4_ext_check_overlap(struct inode *inode,
+				    struct ext4_extent *newext,
+				    struct ext4_ext_path *path)
+{
+	unsigned long b1, b2;
+	unsigned int depth, len1;
+	unsigned int ret = 0;
+
+	b1 = le32_to_cpu(newext->ee_block);
+	len1 = le16_to_cpu(newext->ee_len);
+	depth = ext_depth(inode);
+	if (!path[depth].p_ext)
+		goto out;
+	b2 = le32_to_cpu(path[depth].p_ext->ee_block);
+
+	/*
+	 * get the next allocated block if the extent in the path
+	 * is before the requested block(s)
+	 */
+	if (b2 < b1) {
+		b2 = ext4_ext_next_allocated_block(path);
+		if (b2 == EXT_MAX_BLOCK)
+			goto out;
+	}
+
+	/* check for wrap through zero */
+	if (b1 + len1 < b1) {
+		len1 = EXT_MAX_BLOCK - b1;
+		newext->ee_len = cpu_to_le16(len1);
+		ret = 1;
+	}
+
+	/* check for overlap */
+	if (b1 + len1 > b2) {
+		newext->ee_len = cpu_to_le16(b2 - b1);
+		ret = 1;
+	}
+out:
+	return ret;
+}
+
+/*
  * ext4_ext_insert_extent:
  * tries to merge requsted extent into the existing extent or
  * inserts requested extent as new one into the tree,
@@ -2032,7 +2081,15 @@ int ext4_ext_get_blocks(handle_t *handle
 	/* allocate new block */
 	goal = ext4_ext_find_goal(inode, path, iblock);
-	allocated = max_blocks;
+
+	/* Check if we can really insert (iblock)::(iblock+max_blocks) extent */
+	newex.ee_block = cpu_to_le32(iblock);
+	newex.ee_len = cpu_to_le16(max_blocks);
+	err = ext4_ext_check_overlap(inode, &newex, path);
+	if (err)
+		allocated = le16_to_cpu(newex.ee_len);
+	else
+		allocated = max_blocks;
 	newblock = ext4_new_blocks(handle, inode, goal, &allocated, &err);
 	if (!newblock)
 		goto out2;
@@ -2040,7 +2097,6 @@ int ext4_ext_get_blocks(handle_t *handle
 			goal, newblock, allocated);
 	/* try to insert new extent into found leaf and return */
-	newex.ee_block = cpu_to_le32(iblock);
 	ext4_ext_store_pblock(&newex, newblock);
 	newex.ee_len = cpu_to_le16(allocated);
 	err = ext4_ext_insert_extent(handle, inode, path, &newex);
Index: linux-2.6.21/include/linux/ext4_fs_extents.h
===================================================================
--- linux-2.6.21.orig/include/linux/ext4_fs_extents.h
+++ linux-2.6.21/include/linux/ext4_fs_extents.h
@@ -190,6 +190,7 @@ ext4_ext_invalidate_cache(struct inode *
 extern int ext4_extent_tree_init(handle_t *, struct inode *);
 extern int ext4_ext_calc_credits_for_insert(struct inode *, struct ext4_ext_path *);
+extern unsigned int ext4_ext_check_overlap(struct inode *, struct ext4_extent *, struct ext4_ext_path *);
 extern int ext4_ext_insert_extent(handle_t *, struct inode *, struct ext4_ext_path *, struct ext4_extent *);
 extern int ext4_ext_walk_space(struct inode *, unsigned long, unsigned long, ext_prepare_callback, void *);
 extern struct ext4_ext_path * ext4_ext_find_extent(struct inode *, int, struct ext4_ext_path *);
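The two trimming steps in ext4_ext_check_overlap() can be tried outside the kernel. What follows is a simplified stand-alone sketch, not the ext4 code: it works on plain 32-bit block numbers, takes the next allocated block as a parameter instead of walking an extent path, and uses the made-up names trim_to_avoid_overlap() and MAX_BLOCK (a stand-in for EXT_MAX_BLOCK).

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for EXT_MAX_BLOCK; the value is an assumption. */
#define MAX_BLOCK 0xffffffffu

/*
 * Hypothetical helper: a new extent starts at block b1 and covers *len
 * blocks; next_allocated is the first already-allocated block at or
 * after b1. Shortens *len so the new extent neither wraps through
 * zero nor runs into next_allocated. Returns 1 if *len was changed,
 * 0 otherwise.
 */
int trim_to_avoid_overlap(uint32_t b1, uint32_t *len, uint32_t next_allocated)
{
	int trimmed = 0;

	assert(next_allocated >= b1);

	/* check for wrap through zero (unsigned arithmetic wraps) */
	if ((uint32_t)(b1 + *len) < b1) {
		*len = MAX_BLOCK - b1;
		trimmed = 1;
	}

	/* check for overlap with the next allocated block */
	if ((uint32_t)(b1 + *len) > next_allocated) {
		*len = next_allocated - b1;
		trimmed = 1;
	}

	return trimmed;
}
```

For example, a request for 50 blocks starting at block 100, with block 120 already allocated, is shortened to 20 blocks; the caller then allocates only that many, as the get_blocks hunk above does with the trimmed ee_len.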
[PATCH 4/5][TAKE2] ext4: fallocate support in ext4
This patch implements -fallocate() inode operation in ext4. With this patch users of ext4 file systems will be able to use fallocate() system call for persistent preallocation. Current implementation only supports preallocation for regular files (directories not supported as of date) with extent maps. This patch does not support block-mapped files currently. Only FA_ALLOCATE mode is being supported as of now. Supporting FA_DEALLOCATE mode is a To Do item. Changelog: - Here are the changes from the previous post: 1) Added more description for ext4_fallocate(). 2) Now returning EOPNOTSUPP when files are block-mapped (non-extent). 3) Moved journal_start journal_stop inside the while loop. 4) Replaced BUG_ON with WARN_ON ext4_error. 5) Make EXT4_BLOCK_ALIGN use ALIGN macro internally. 6) Added variable names in the function declaration of ext4_fallocate() 7) Converted macros that handle uninitialized extents into inline functions. Here is the updated patch: Signed-off-by: Amit Arora [EMAIL PROTECTED] --- fs/ext4/extents.c | 241 +--- fs/ext4/file.c |1 include/linux/ext4_fs.h |8 + include/linux/ext4_fs_extents.h | 12 + 4 files changed, 221 insertions(+), 41 deletions(-) Index: linux-2.6.21/fs/ext4/extents.c === --- linux-2.6.21.orig/fs/ext4/extents.c +++ linux-2.6.21/fs/ext4/extents.c @@ -283,7 +283,7 @@ static void ext4_ext_show_path(struct in } else if (path-p_ext) { ext_debug( %d:%d:%llu , le32_to_cpu(path-p_ext-ee_block), - le16_to_cpu(path-p_ext-ee_len), + ext4_ext_get_actual_len(path-p_ext), ext_pblock(path-p_ext)); } else ext_debug( []); @@ -306,7 +306,7 @@ static void ext4_ext_show_leaf(struct in for (i = 0; i le16_to_cpu(eh-eh_entries); i++, ex++) { ext_debug(%d:%d:%llu , le32_to_cpu(ex-ee_block), - le16_to_cpu(ex-ee_len), ext_pblock(ex)); + ext4_ext_get_actual_len(ex), ext_pblock(ex)); } ext_debug(\n); } @@ -426,7 +426,7 @@ ext4_ext_binsearch(struct inode *inode, ext_debug( - %d:%llu:%d , le32_to_cpu(path-p_ext-ee_block), ext_pblock(path-p_ext), - 
le16_to_cpu(path-p_ext-ee_len)); + ext4_ext_get_actual_len(path-p_ext)); #ifdef CHECK_BINSEARCH { @@ -687,7 +687,7 @@ static int ext4_ext_split(handle_t *hand ext_debug(move %d:%llu:%d in new leaf %llu\n, le32_to_cpu(path[depth].p_ext-ee_block), ext_pblock(path[depth].p_ext), - le16_to_cpu(path[depth].p_ext-ee_len), + ext4_ext_get_actual_len(path[depth].p_ext), newblock); /*memmove(ex++, path[depth].p_ext++, sizeof(struct ext4_extent)); @@ -1107,7 +1107,19 @@ static int ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1, struct ext4_extent *ex2) { - if (le32_to_cpu(ex1-ee_block) + le16_to_cpu(ex1-ee_len) != + unsigned short ext1_ee_len, ext2_ee_len; + + /* +* Make sure that either both extents are uninitialized, or +* both are _not_. +*/ + if (ext4_ext_is_uninitialized(ex1) ^ ext4_ext_is_uninitialized(ex2)) + return 0; + + ext1_ee_len = ext4_ext_get_actual_len(ex1); + ext2_ee_len = ext4_ext_get_actual_len(ex2); + + if (le32_to_cpu(ex1-ee_block) + ext1_ee_len != le32_to_cpu(ex2-ee_block)) return 0; @@ -1116,14 +1128,14 @@ ext4_can_extents_be_merged(struct inode * as an RO_COMPAT feature, refuse to merge to extents if * this can result in the top bit of ee_len being set. */ - if (le16_to_cpu(ex1-ee_len) + le16_to_cpu(ex2-ee_len) EXT_MAX_LEN) + if (ext1_ee_len + ext2_ee_len EXT_MAX_LEN) return 0; #ifdef AGGRESSIVE_TEST if (le16_to_cpu(ex1-ee_len) = 4) return 0; #endif - if (ext_pblock(ex1) + le16_to_cpu(ex1-ee_len) == ext_pblock(ex2)) + if (ext_pblock(ex1) + ext1_ee_len == ext_pblock(ex2)) return 1; return 0; } @@ -1145,7 +1157,7 @@ unsigned int ext4_ext_check_overlap(stru unsigned int ret = 0; b1 = le32_to_cpu(newext-ee_block); - len1 = le16_to_cpu(newext-ee_len); + len1 = ext4_ext_get_actual_len(newext); depth = ext_depth(inode); if (!path[depth].p_ext) goto out; @@ -1192,8 +1204,9 @@ int
[PATCH 5/5][TAKE2] ext4: write support for preallocated blocks
This patch adds write support to the uninitialized extents that get created when a preallocation is done using fallocate(). It takes care of splitting the extents into multiple (upto three) extents and merging the new split extents with neighbouring ones, if possible. Changelog: - 1) Replaced BUG_ON with WARN_ON ext4_error. 2) Added variable names to the function declaration of ext4_ext_try_to_merge(). 3) Updated variable declarations to use multiple-definitions-per-line. 4) if((a=foo())).. was broken into a=foo(); if(a).. 5) Removed extra spaces. Here is the updated patch: Signed-off-by: Amit Arora [EMAIL PROTECTED] --- fs/ext4/extents.c | 234 +++- include/linux/ext4_fs_extents.h |3 2 files changed, 210 insertions(+), 27 deletions(-) Index: linux-2.6.21/fs/ext4/extents.c === --- linux-2.6.21.orig/fs/ext4/extents.c +++ linux-2.6.21/fs/ext4/extents.c @@ -1141,6 +1141,54 @@ ext4_can_extents_be_merged(struct inode } /* + * This function tries to merge the ex extent to the next extent in the tree. + * It always tries to merge towards right. If you want to merge towards + * left, pass ex - 1 as argument instead of ex. + * Returns 0 if the extents (ex and ex+1) were _not_ merged and returns + * 1 if they got merged. + */ +int ext4_ext_try_to_merge(struct inode *inode, + struct ext4_ext_path *path, + struct ext4_extent *ex) +{ + struct ext4_extent_header *eh; + unsigned int depth, len; + int merge_done = 0; + int uninitialized = 0; + + depth = ext_depth(inode); + BUG_ON(path[depth].p_hdr == NULL); + eh = path[depth].p_hdr; + + while (ex EXT_LAST_EXTENT(eh)) + { + if (!ext4_can_extents_be_merged(inode, ex, ex + 1)) + break; + /* merge with next extent! 
*/ + if (ext4_ext_is_uninitialized(ex)) + uninitialized = 1; + ex-ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex) + + ext4_ext_get_actual_len(ex + 1)); + if (uninitialized) + ext4_ext_mark_uninitialized(ex); + + if (ex + 1 EXT_LAST_EXTENT(eh)) { + len = (EXT_LAST_EXTENT(eh) - ex - 1) + * sizeof(struct ext4_extent); + memmove(ex + 1, ex + 2, len); + } + eh-eh_entries = cpu_to_le16(le16_to_cpu(eh-eh_entries) - 1); + merge_done = 1; + WARN_ON(eh-eh_entries == 0); + if (!eh-eh_entries) + ext4_error(inode-i_sb, ext4_ext_try_to_merge, + inode#%lu, eh-eh_entries = 0!, inode-i_ino); + } + + return merge_done; +} + +/* * check if a portion of the newext extent overlaps with an * existing extent. * @@ -1328,25 +1376,7 @@ has_space: merge: /* try to merge extents to the right */ - while (nearex EXT_LAST_EXTENT(eh)) { - if (!ext4_can_extents_be_merged(inode, nearex, nearex + 1)) - break; - /* merge with next extent! */ - if (ext4_ext_is_uninitialized(nearex)) - uninitialized = 1; - nearex-ee_len = cpu_to_le16(ext4_ext_get_actual_len(nearex) - + ext4_ext_get_actual_len(nearex + 1)); - if (uninitialized) - ext4_ext_mark_uninitialized(nearex); - - if (nearex + 1 EXT_LAST_EXTENT(eh)) { - len = (EXT_LAST_EXTENT(eh) - nearex - 1) - * sizeof(struct ext4_extent); - memmove(nearex + 1, nearex + 2, len); - } - eh-eh_entries = cpu_to_le16(le16_to_cpu(eh-eh_entries)-1); - BUG_ON(eh-eh_entries == 0); - } + ext4_ext_try_to_merge(inode, path, nearex); /* try to merge extents to the left */ @@ -2012,15 +2042,152 @@ void ext4_ext_release(struct super_block #endif } +/* + * This function is called by ext4_ext_get_blocks() if someone tries to write + * to an uninitialized extent. It may result in splitting the uninitialized + * extent into multiple extents (upto three - one initialized and two + * uninitialized). 
+ * There are three possibilities: + * a There is no split required: Entire extent should be initialized + * b Splits in two extents: Write is happening at either end of the extent + * c Splits in three extents: Somone is writing in middle of the extent + */ +int ext4_ext_convert_to_initialized(handle_t *handle, struct inode *inode, + struct ext4_ext_path *path, + ext4_fsblk_t iblock, +
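The three cases listed in the comment above (no split, split in two, split in three) can be illustrated with a small stand-alone sketch. It is not the ext4 implementation, which is cut off in this archive; it only computes which pieces result when a write lands inside an uninitialized extent. struct piece and split_for_write() are made-up names, and the write is assumed to lie entirely within the extent.

```c
#include <assert.h>
#include <stdint.h>

/* One resulting extent: start block, length, initialized or not. */
struct piece {
	uint32_t start;
	uint32_t len;
	int initialized;
};

/*
 * Hypothetical helper: split the uninitialized extent
 * [ext_start, ext_start + ext_len) for a write covering
 * [w_start, w_start + w_len). Fills out[] in block order and returns
 * the number of pieces (1, 2 or 3).
 */
int split_for_write(uint32_t ext_start, uint32_t ext_len,
		    uint32_t w_start, uint32_t w_len, struct piece out[3])
{
	uint32_t ext_end = ext_start + ext_len;
	uint32_t w_end = w_start + w_len;
	int n = 0;

	assert(w_len > 0 && w_start >= ext_start && w_end <= ext_end);

	/* uninitialized remainder to the left of the write, if any */
	if (w_start > ext_start)
		out[n++] = (struct piece){ ext_start, w_start - ext_start, 0 };

	/* the written range becomes initialized */
	out[n++] = (struct piece){ w_start, w_len, 1 };

	/* uninitialized remainder to the right of the write, if any */
	if (w_end < ext_end)
		out[n++] = (struct piece){ w_end, ext_end - w_end, 0 };

	return n;
}
```

A write covering the whole extent gives one piece (case a), a write at either end gives two (case b), and a write in the middle gives three (case c).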
Re: [RFC][PATCH 14/14] tmpfs whiteout support
On Mon, 14 May 2007, Bharata B Rao wrote:
> From: Jan Blunck [EMAIL PROTECTED]
> Subject: tmpfs whiteout support
>
> Introduce whiteout support to tmpfs.
>
> Signed-off-by: Jan Blunck [EMAIL PROTECTED]
> Signed-off-by: Bharata B Rao [EMAIL PROTECTED]
> ---
>  mm/shmem.c |    9 -
>  1 files changed, 8 insertions(+), 1 deletion(-)
>
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -74,7 +74,7 @@
>  #define LATENCY_LIMIT 64
>
>  /* Pretend that each entry is of this size in directory's i_size */
> -#define BOGO_DIRENT_SIZE 20
> +#define BOGO_DIRENT_SIZE 1

Why would that change be needed for whiteout support?

Hugh
Re: [PATCH 2/2] file capabilities: accomodate 32 bit capabilities
Quoting Suparna Bhattacharya ([EMAIL PROTECTED]): On Thu, May 10, 2007 at 01:01:27PM -0700, Andreas Dilger wrote: On May 08, 2007 16:49 -0500, Serge E. Hallyn wrote: Quoting Andreas Dilger ([EMAIL PROTECTED]): One of the important use cases I can see today is the ability to split the heavily-overloaded e.g. CAP_SYS_ADMIN into much more fine grained attributes. Sounds plausible, though it suffers from both making capabilities far more cumbersome (i.e. finding the right capability for what you wanted to do) and backward compatibility. Perhaps at that point we should introduce security.capabilityv2 xattrs. A binary can then carry security.capability=CAP_SYS_ADMIN=p, and security.capabilityv2=cap_may_clone_mntns=p. Well, the overhead of each EA is non-trivial (16 bytes/EA) for storing 12 bytes worth of data, so it is probably just better to keep extending the original capability fields as was in the proposal. What we definitely do NOT want to happen is an application that needs priviledged access (e.g. e2fsck, mount) to stop running because the new capabilities _would_ have been granted by the new kernel and are not by the old kernel and STRICTXATTR is used. To me it would seem that having extra capabilities on an old kernel is relatively harmless if the old kernel doesn't know what they are. It's like having a key to a door that you don't know where it is. If we ditch the STRICTXATTR option do the semantics seem sane to you? Seems reasonable. It would simplify the code as well, which is good. This does mean no sanity checking of fcaps, am not sure if that matters, I'm guessing it should be similar to the case for other security attributes. which is to trust the xattr... So here is a new consolidated patch without the STRICTXATTR config option. -serge From: Serge E. Hallyn [EMAIL PROTECTED] Subject: [PATCH] Implement file posix capabilities Implement file posix capabilities. 
This allows programs to be given a subset of root's powers regardless of who runs them, without having to use setuid and giving the binary all of root's powers. This version works with Kaigai Kohei's userspace tools, found at http://www.kaigai.gr.jp/index.php. For more information on how to use this patch, Chris Friedhoff has posted a nice page at http://www.friedhoff.org/fscaps.html. Changelog: May 14: Remove STRICTXATTR support which could make newer binaries unusable on older kernels, and combine the two patches into one. [recent]: 1. Enable the CONFIG_SECURITY_FS_CAPABILITIES option when CONFIG_SECURITY=n. 2. Rename CONFIG_SECURITY_FS_CAPABILITIES to CONFIG_SECURITY_FILE_CAPABILITIES 3. To accomodate 64-bit caps, specify that capabilities are stored as u32 version; u32 eff0; u32 perm0; u32 inh0; u32 eff1; u32 perm1; u32 inh1; (etc) Nov 27: Incorporate fixes from Andrew Morton (security-introduce-file-caps-tweaks and security-introduce-file-caps-warning-fix) Fix Kconfig dependency. Fix change signaling behavior when file caps are not compiled in. Nov 13: Integrate comments from Alexey: Remove CONFIG_ ifdef from capability.h, and use %zd for printing a size_t. Nov 13: Fix endianness warnings by sparse as suggested by Alexey Dobriyan. Nov 09: Address warnings of unused variables at cap_bprm_set_security when file capabilities are disabled, and simultaneously clean up the code a little, by pulling the new code into a helper function. Nov 08: For pointers to required userspace tools and how to use them, see http://www.friedhoff.org/fscaps.html. Nov 07: Fix the calculation of the highest bit checked in check_cap_sanity(). Nov 07: Allow file caps to be enabled without CONFIG_SECURITY, since capabilities are the default. Hook cap_task_setscheduler when !CONFIG_SECURITY. Move capable(TASK_KILL) to end of cap_task_kill to reduce audit messages. 
Nov 05: Add secondary calls in selinux/hooks.c to task_setioprio and task_setscheduler so that selinux and capabilities with file cap support can be stacked. Sep 05: As Seth Arnold points out, uid checks are out of place for capability code. Sep 01: Define task_setscheduler, task_setioprio, cap_task_kill, and task_setnice to make sure a user cannot affect a process in which they called a program with some fscaps. One remaining question is the note under task_setscheduler: are we ok with CAP_SYS_NICE being sufficient to confine a process to a cpuset? It is a semantic change, as without fsccaps, attach_task doesn't
Re: [2.6.21] circular locking dependency found in QUOTA OFF
[adding Jan and fsdevel to CC]

Hi Folkert,

On 14/05/07, Folkert van Heusden [EMAIL PROTECTED] wrote:
> Hi,
>
> When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of
> ram and system on an 1-filesystem IDE disk, I get the following circular
> locking dependency error:
>
> [330961.226405] ===
> [330961.226489] [ INFO: possible circular locking dependency detected ]
> [330961.226531] 2.6.21 #5
> [330961.226569] ---
> [330961.226611] quotaoff/12249 is trying to acquire lock:
> [330961.226652]  (sb->s_type->i_mutex_key#4){--..}, at: [c120e2a1] mutex_lock+0x8/0xa
> [330961.226861]
> [330961.226862] but task is already holding lock:
> [330961.226938]  (s->s_dquot.dqonoff_mutex){--..}, at: [c120e2a1] mutex_lock+0x8/0xa
> [330961.227111]
> [330961.227111] which lock already depends on the new lock.
> [330961.227112]
> [330961.227225]
> [330961.227225] the existing dependency chain (in reverse order) is:
> [330961.227303]
> [330961.227303] -> #1 (s->s_dquot.dqonoff_mutex){--..}:
> [330961.227473]        [c1039b02] check_prev_add+0x15b/0x281
> [330961.227766]        [c1039cb3] check_prevs_add+0x8b/0xe8
> [330961.228056]        [c103b683] __lock_acquire+0x692/0xb81
> [330961.228353]        [c103bfda] lock_acquire+0x62/0x81
> [330961.228643]        [c120e322] __mutex_lock_slowpath+0x75/0x28c
> [330961.228934]        [c120e2a1] mutex_lock+0x8/0xa
> [330961.229221]        [c109fbbe] vfs_quota_on_inode+0xc1/0x25f
> [330961.229513]        [c109fdd1] vfs_quota_on+0x75/0x79
> [330961.229803]        [c10bc92d] ext3_quota_on+0x95/0xb0
> [330961.230093]        [c10a1eb2] do_quotactl+0xc9/0x2dd
> [330961.230384]        [c10a214a] sys_quotactl+0x84/0xd6
> [330961.230673]        [c1003f74] syscall_call+0x7/0xb
> [330961.230963]        [] 0x
> [330961.231268]
> [330961.231268] -> #0 (sb->s_type->i_mutex_key#4){--..}:
> [330961.231469]        [c10399db] check_prev_add+0x34/0x281
> [330961.231759]        [c1039cb3] check_prevs_add+0x8b/0xe8
> [330961.232049]        [c103b683] __lock_acquire+0x692/0xb81
> [330961.232344]        [c103bfda] lock_acquire+0x62/0x81
> [330961.232632]        [c120e322] __mutex_lock_slowpath+0x75/0x28c
> [330961.232923]        [c120e2a1] mutex_lock+0x8/0xa
> [330961.233211]        [c109fa6c] vfs_quota_off+0x1cf/0x260
> [330961.233500]        [c10a2088] do_quotactl+0x29f/0x2dd
> [330961.233792]        [c10a214a] sys_quotactl+0x84/0xd6
> [330961.234081]        [c1003f74] syscall_call+0x7/0xb
> [330961.234503]        [] 0x
> [330961.234795]
> [330961.234795] other info that might help us debug this:
> [330961.234796]
> [330961.234908] 2 locks held by quotaoff/12249:
> [330961.234947]  #0:  (type->s_umount_key#15){}, at: [c1070b5d] get_super+0x53/0x94
> [330961.235183]  #1:  (s->s_dquot.dqonoff_mutex){--..}, at: [c120e2a1] mutex_lock+0x8/0xa
> [330961.235386]
> [330961.235387] stack backtrace:
> [330961.235462]  [c1004d53] show_trace_log_lvl+0x1a/0x30
> [330961.235535]  [c1004d7b] show_trace+0x12/0x14
> [330961.235606]  [c1004e75] dump_stack+0x16/0x18
> [330961.235679]  [c1039352] print_circular_bug_tail+0x6f/0x71
> [330961.235753]  [c10399db] check_prev_add+0x34/0x281
> [330961.235825]  [c1039cb3] check_prevs_add+0x8b/0xe8
> [330961.235897]  [c103b683] __lock_acquire+0x692/0xb81
> [330961.235969]  [c103bfda] lock_acquire+0x62/0x81
> [330961.236041]  [c120e322] __mutex_lock_slowpath+0x75/0x28c
> [330961.236113]  [c120e2a1] mutex_lock+0x8/0xa
> [330961.236185]  [c109fa6c] vfs_quota_off+0x1cf/0x260
> [330961.236257]  [c10a2088] do_quotactl+0x29f/0x2dd
> [330961.236330]  [c10a214a] sys_quotactl+0x84/0xd6
> [330961.236402]  [c1003f74] syscall_call+0x7/0xb
> [330961.236473] ===

Is this a 2.6.21 regression?

Regards,
Michal

--
Michal K. K. Piotrowski
Kernel Monkeys (http://kernel.wikidot.com/start)
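As lockdep reports it, the trace above comes down to two paths taking the same pair of locks in opposite orders: the quota-on path recorded dqonoff_mutex being acquired with i_mutex held, and quotaoff now wants i_mutex while holding dqonoff_mutex. The userspace C sketch below shows one way such an inversion can be caught from recorded acquisition orders. It is only an illustration and not the kernel's lockdep code; struct lock_graph, graph_init(), record_order() and the two lock IDs are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8

/* Hypothetical lock IDs standing in for the two mutexes in the report. */
enum { I_MUTEX = 0, DQONOFF_MUTEX = 1 };

/* dep[a][b] is true once lock b has been acquired while lock a was held. */
struct lock_graph {
    bool dep[MAX_LOCKS][MAX_LOCKS];
};

void graph_init(struct lock_graph *g)
{
    memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_LOCKS; next++)
        if (g->dep[from][next] && !seen[next] &&
            reachable(g, next, to, seen))
            return true;
    return false;
}

/*
 * Record "acquired 'taken' while holding 'held'".
 * Returns false, and records nothing, if the new edge would close a
 * cycle, that is, if 'held' is already reachable from 'taken'.
 */
bool record_order(struct lock_graph *g, int held, int taken)
{
    bool seen[MAX_LOCKS] = { false };

    assert(held >= 0 && held < MAX_LOCKS);
    assert(taken >= 0 && taken < MAX_LOCKS);
    if (reachable(g, taken, held, seen))
        return false;
    g->dep[held][taken] = true;
    return true;
}
```

Recording the order (I_MUTEX, DQONOFF_MUTEX) first succeeds; a later attempt to record (DQONOFF_MUTEX, I_MUTEX) is rejected, which mirrors the "which lock already depends on the new lock" line in the report.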
Re: [2.6.21] circular locking dependency found in QUOTA OFF
> [adding Jan and fsdevel to CC]
>
> Hi Folkert,
>
> > When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of
> > ram and system on an 1-filesystem IDE disk, I get the following circular
> > locking dependency error:
> >
> > [330961.226405] ===
> > [330961.226489] [ INFO: possible circular locking dependency detected ]
> > [330961.226531] 2.6.21 #5
> ...
> > [330961.236402] [c1003f74] syscall_call+0x7/0xb
> > [330961.236473] ===
>
> Is this a 2.6.21 regression?

This is new for 2.6.21, yes.

Folkert van Heusden

--
MultiTail is a flexible tool for following logfiles and the execution of
commands. Filtering, coloring, merging, 'diff-view', etc.
http://www.vanheusden.com/multitail/
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
Re: [patch 02/41] Revert 81b0c8713385ce1b1b9058e916edcf9561ad76d6
On Mon, May 14, 2007 at 04:06:21PM +1000, [EMAIL PROTECTED] wrote:
> This was a bugfix against 6527c2bdf1f833cc18e8f42bd97973d583e4aa83, which we
> also revert.

Changes like this play havoc with git-bisect. If you must revert stuff before patching new code in, revert it all in a single diff.

Dave

--
http://www.codemonkey.org.uk
Re: [RFC][PATCH 14/14] tmpfs whiteout support
On 5/14/07, Hugh Dickins [EMAIL PROTECTED] wrote:
> On Mon, 14 May 2007, Bharata B Rao wrote:
> > From: Jan Blunck [EMAIL PROTECTED]
> > Subject: tmpfs whiteout support
> >
> > Introduce whiteout support to tmpfs.
> >
> > Signed-off-by: Jan Blunck [EMAIL PROTECTED]
> > Signed-off-by: Bharata B Rao [EMAIL PROTECTED]
> > ---
> >  mm/shmem.c |9 -
> >  1 files changed, 8 insertions(+), 1 deletion(-)
> >
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -74,7 +74,7 @@
> >  #define LATENCY_LIMIT 64
> >
> >  /* Pretend that each entry is of this size in directory's i_size */
> > -#define BOGO_DIRENT_SIZE 20
> > +#define BOGO_DIRENT_SIZE 1
>
> Why would that change be needed for whiteout support?

Good question. It seems that this is a survivor of the changes necessary for union readdir. This isn't necessary for white-outs.

BTW: Why do we claim this to be 20??? Is there any meaning behind this?

Cheers,
Jan
Re: [RFC][PATCH 14/14] tmpfs whiteout support
On Mon, 14 May 2007, Jan Blunck wrote:
> On 5/14/07, Hugh Dickins [EMAIL PROTECTED] wrote:
> > >  /* Pretend that each entry is of this size in directory's i_size */
> > > -#define BOGO_DIRENT_SIZE 20
> > > +#define BOGO_DIRENT_SIZE 1
> >
> > Why would that change be needed for whiteout support?
>
> Good question. It seems that this is a survivor of the changes necessary
> for union readdir.

(I'd be asking the same question in that case, but don't worry about it!)

> This isn't necessary for white-outs.

Phew, thanks, please drop that hunk.

> BTW: Why do we claim this to be 20??? Is there any meaning behind this?

No great meaning, hence BOGO. I put that in when hpa (IIRC) found tmpfs directory size 0 didn't suit some apps. I thought it would be nice to have a size which indicates the current number of entries (which your 1 would do), looks plausible (for short filenames), and is easy to make sense of in an ls -l. Bogus, yes; but I'd resist changing it after all this time, without very good reason.

Hugh
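To make the point about the directory size concrete, here is a minimal userspace model of a directory whose reported size grows by a fixed bogus amount per entry, so that the size shown by ls -l tracks the entry count. Only BOGO_DIRENT_SIZE and its value 20 come from the quoted hunk; struct model_dir and the model_*() helpers are hypothetical names for this sketch and are not taken from mm/shmem.c.

```c
#include <assert.h>
#include <stddef.h>

/* Value from the quoted mm/shmem.c hunk (the original, pre-patch value). */
#define BOGO_DIRENT_SIZE 20

/* Hypothetical stand-in for a directory inode; only the size matters here. */
struct model_dir {
    size_t i_size;
};

/* Creating an entry bumps the reported directory size by one bogus dirent. */
void model_add_entry(struct model_dir *dir)
{
    dir->i_size += BOGO_DIRENT_SIZE;
}

/* Removing an entry shrinks the reported size by the same amount. */
void model_remove_entry(struct model_dir *dir)
{
    assert(dir->i_size >= BOGO_DIRENT_SIZE);
    dir->i_size -= BOGO_DIRENT_SIZE;
}

/* The entry count can be read back from the size alone. */
size_t model_entry_count(const struct model_dir *dir)
{
    return dir->i_size / BOGO_DIRENT_SIZE;
}
```

With 20 bytes per entry, a directory holding three files reports a size of 60, which is plausible for short names and easy to divide in one's head. With a value of 1 the size would equal the entry count directly, which is the alternative Hugh mentions.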
Re: [PATCH 0/2] file capabilities: Introduction
Hi!

> Serge E. Hallyn [EMAIL PROTECTED] wrote:
> Following are two patches which have been sitting for some time in -mm.
>
> Where some time == nearly six months.
>
> We need help considering, reviewing and testing this code, please.

I did a quick scan, and it looks ok. Plus, it means we can finally start using that old capabilities subsystem... so I think we should do it.

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html