Re: [RFC] truncate() (generic stuff)
Hi, On Mon, 11 Oct 1999 11:12:01 -0400 (EDT), Alexander Viro [EMAIL PROTECTED] said: I began screwing around the truncate() stuff and the following is a status report/request for comments: a) call of -truncate() method (and vmtruncate()) had been moved into the notify_change(). It is triggered if ATTR_SIZE is set. Modified places: do_truncate(), nfsd_truncate(), nfsd_setattr() and hpfs_unlink(). Semantics had been changed (fixed, AFAICS) for nfsd_* - truncation happens with capabilities dropped. BTW, nfsd_setattr() didn't call vmtruncate(). Fixed. The page cache has very strict ordering requirements on the inode-i_size update, the vmtruncate call and the actual file mapping truncate operation. I'd be nervous about letting the fs do too much of that itself: what exactly is the rationale behind the change? coda,nfs,ncpfs,smbfs - I'm bringing the call of vmtruncate() into the end of -notify_change() (if ATTR_SIZE is set, indeed). It preserves the current behaviour, but I'm not sure that it's OK. Those filesystems don't have -truncate() and do the equivalent theselves (from - notify_change()). Maybe vmtruncate() should be called earlier. Given that we are already looking at adding back in the inode write lock for truncate, that shouldn't be a problem. --Stephen
Re: [RFC] truncate() (generic stuff)
On Tue, 12 Oct 1999, Stephen C. Tweedie wrote: Hi, On Tue, 12 Oct 1999 09:37:28 -0400 (EDT), Alexander Viro [EMAIL PROTECTED] said: Rationale was: a) get rid of code duplication and get all calls of -truncate() into the same place. b) make it in the same place that sets i_size. c) on many filesystems extending -truncate() may fail. Right now we have no way to report said fact (or any other errors, for that matter). Change will be easier if we will have _one_ place to look after. Yes, and I had to make similar changes in 2.2 to report such changes. Observing RLIMIT_FSIZE in the filesystem requires that, for example. However, 2.3 is a lot more sensitive to the ordering. IMHO it became cleaner - I can post the detailed proof that it preserves the ordering. It was a requirement from the very beginning. Fine, as long as we document somewhere exactly what the filesystem requirements are. Filesystem can either: a) set -notify_change to NULL or b) call in -notify_change() inode_setattr() (which will do the right thing wrt truncation - set i_size, call vmtruncate() and -truncate()) and keep in mind that upon the return from inode_setattr() file is already truncated or c) take care to set i_size by hands and at the end of -notify_change() (if it didn't fail so far and had been asked to change size) do vmtruncate() followed by fs-specific metadata modifications needed for truncation (if they are needed at all - right now none of the filesystems in that category (nfs,coda,ncpfs and smbfs) has such need) In other words, impact on existing filesystems was minimal. I've added the call of vmtruncate() into the end of -notify_change() for coda, nfs, ncpfs and smbfs + made a couple of cleanups in affs and hpfs (not strictly needed - see the final variant of patch for details). That was it.
[RFC] truncate() (generic stuff)
I began screwing around the truncate() stuff and the following is a status report/request for comments: a) call of -truncate() method (and vmtruncate()) had been moved into the notify_change(). It is triggered if ATTR_SIZE is set. Modified places: do_truncate(), nfsd_truncate(), nfsd_setattr() and hpfs_unlink(). Semantics had been changed (fixed, AFAICS) for nfsd_* - truncation happens with capabilities dropped. BTW, nfsd_setattr() didn't call vmtruncate(). Fixed. b) I'ld rather pushed the call of -truncate() deeper - into the inode_setattr() if possible. Situation being: adfs, efs, isofs, qnx4 and romfs - readonly, no problems at all. devpts, autofs - no regular files, no problems. ext2, ufs, minix, sysv, msdos, vfat, udf, umsdos - no problems. ntfs - very odd. No -truncate(), but there is -write(). IOW, call of sys_truncate() will happily pass and just change -i_size. Steve, could you comment on that? No problems wrt this patch, but something is very wrong. procfs - there may be drago^Wbugs. That is, sys_truncate() silently sets i_size to zero. sys_chmod() is also, erm, interesting. Unrelated to the patch, though. hpfs - no problems, but there is something rather odd. Why on the Earth does it forcibly write inode to disk in the end of notify_change()? Immediately before that inode is dirtified by inode_setattr(). Mikulas, is there something special that makes you write it ASAP? affs - no problems, except the call of make_inode_dirty() immediately after inode_setattr() (which dirtifies inode immediately before exit). I'm removing the call. hfs - probably OK, but I'ld really like to hear the confirmation - is there any chance of troubles (from the HFS side, VFS is OK) if the call of -truncate() will happen before the hfs_cat_mark_dirty()? (in hfs_notify_change(), that is). coda,nfs,ncpfs,smbfs - I'm bringing the call of vmtruncate() into the end of -notify_change() (if ATTR_SIZE is set, indeed). It preserves the current behaviour, but I'm not sure that it's OK. Those filesystems don't have -truncate() and do the equivalent theselves (from -notify_change()). Maybe vmtruncate() should be called earlier. c) ext2-related parts (and general messing with pagecache for directories) - still too unstable. IOW, crashes the kernel _fast_. The following is the relatively stable part of the patch (no ext2 stuff). Please, look if you have any problems with it. Cheers, Al diff -urN linux-2.3.20/fs/affs/inode.c linux-bird.truncate/fs/affs/inode.c --- linux-2.3.20/fs/affs/inode.cMon Jun 21 12:36:43 1999 +++ linux-bird.truncate/fs/affs/inode.c Mon Oct 11 10:55:11 1999 @@ -246,7 +246,6 @@ inode-u.affs_i.i_protect = mode_to_prot(attr-ia_mode); inode_setattr(inode, attr); - mark_inode_dirty(inode); error = 0; out: return error; diff -urN linux-2.3.20/fs/attr.c linux-bird.truncate/fs/attr.c --- linux-2.3.20/fs/attr.c Mon Jun 21 12:36:42 1999 +++ linux-bird.truncate/fs/attr.c Mon Oct 11 10:54:44 1999 @@ -62,8 +62,6 @@ inode-i_uid = attr-ia_uid; if (ia_valid ATTR_GID) inode-i_gid = attr-ia_gid; - if (ia_valid ATTR_SIZE) - inode-i_size = attr-ia_size; if (ia_valid ATTR_ATIME) inode-i_atime = attr-ia_atime; if (ia_valid ATTR_MTIME) @@ -75,6 +73,12 @@ if (!in_group_p(inode-i_gid) !capable(CAP_FSETID)) inode-i_mode = ~S_ISGID; } + if (ia_valid ATTR_SIZE) { + inode-i_size = attr-ia_size; + vmtruncate(inode, attr-ia_size); + if (inode-i_op inode-i_op-truncate) + inode-i_op-truncate(inode); + } mark_inode_dirty(inode); } diff -urN linux-2.3.20/fs/coda/inode.c linux-bird.truncate/fs/coda/inode.c --- linux-2.3.20/fs/coda/inode.cSun Sep 12 12:45:49 1999 +++ linux-bird.truncate/fs/coda/inode.c Mon Oct 11 10:59:23 1999 @@ -229,6 +229,8 @@ if ( !error ) { coda_vattr_to_iattr(inode, vattr); coda_cache_clear_inode(inode); + if (iattr-ia_valid ATTR_SIZE) + vmtruncate(inode, iattr-ia_size); } CDEBUG(D_SUPER, "inode.i_mode %o, error %d\n", inode-i_mode, error); diff -urN linux-2.3.20/fs/hpfs/inode.c linux-bird.truncate/fs/hpfs/inode.c --- linux-2.3.20/fs/hpfs/inode.cSun Sep 12 06:00:35 1999 +++ linux-bird.truncate/fs/hpfs/inode.c Mon Oct 11 10:55:25 1999 @@ -364,7 +364,6 @@ if (inode-i_sb-s_hpfs_root == inode-i_ino) return -EINVAL; if ((error = inode_change_ok(inode, attr))) return error; inode_setattr(inode, attr); - hpfs_write_inode(inode); return 0; }
Re: [RFC] truncate() (generic stuff)
hpfs - no problems, but there is something rather odd. Why on the Earth does it forcibly write inode to disk in the end of notify_change()? Immediately before that inode is dirtified by inode_setattr(). Mikulas, is there something special that makes you write it ASAP? HPFS doesn't use write_inode. On HPFS, write_inode needs to lock parent directory to update dirent. write_inode is called asynchronously and locking directory makes deadlocks. So, write_inode is NULL and inode is explicitly written on notify_change and file_release. [ Once you suggested a solution for this - keep in inode pointer to directory data, but I'm too lazy to implement it. HPFS is not performance critical ] Mikulas Patocka
Re: [RFC] truncate() (generic stuff)
On Mon, 11 Oct 1999, Mikulas Patocka wrote: I mean the following: assume that call of hpfs_truncate() (via -truncate()) had been moved into the inode_setattr() (i.e. happens immediately before the hpfs_write_inode() in hpfs_notify_change()). Will it have any bad consequences? Right now the sequence looks so: inode-i_size = newsize; hpfs_write_inode(inode); hpfs_truncate(inode); Could we swap the last steps? Probably yes, the first hpfs_write_inode is not needed at all. Aha. OK, since with this patch -truncate() is called only from inode_setattr() I'm leaving the call of hpfs_write_inode() in hpfs_notify_change() and removing it from hpfs_truncate(). You can do the same just by examining content of dnode (dirents have fnode numbers). They do, but lookup in icache is going to be more expensive. Besides, we will have to do it for every dirent when we are splitting dnode. And we do not care for inodes not in core. -clear_inode() would exclude the sucker from the lists, all inodes marked with I_FREEING (i.e. those that gave NULL upon igrab()) are skipped during the cyclic list searches. There is another clash if you use async write_inode. Somebody is modifying directory, btree operation is in progress and now - system decides to call write_inode. You can't wait in write_inode until directory is in consistent state, you just must patch dirent somwhere in memory and return. That means - inode must hold pointer to dirent on disk or in memory. No problems - in-core inode holds a location on disk, any dirent moving starts from setting it to 0 (with spinlock held - see above) and ends with setting it according to new location. write_inode(): write fnode part; retry: loc = location; if (!loc) return; get the block corresponding to loc; grab the spinlock; if (loc still points to the same block) { update dirent; mark it dirty; release block; release spinlock; return; } /* we had been moved */ release block; release spinlock; goto retry; That's it. This is what I've done in FAT and it doesn't take much code. It is possible to write it, but HPFS is currently very stable and race-free (the last race I had was bug in kernel) and such modifications would surely make a lot of problems. OK, I definitely agree with that. It's worth doing someday, but not now. Anyway, the new variant of truncate patch follows. Please, check whether you have problems with the hpfs part. Cheers, Al diff -urN linux-2.3.20/fs/affs/inode.c linux-bird.truncate/fs/affs/inode.c --- linux-2.3.20/fs/affs/inode.cMon Jun 21 12:36:43 1999 +++ linux-bird.truncate/fs/affs/inode.c Mon Oct 11 10:55:11 1999 @@ -246,7 +246,6 @@ inode-u.affs_i.i_protect = mode_to_prot(attr-ia_mode); inode_setattr(inode, attr); - mark_inode_dirty(inode); error = 0; out: return error; diff -urN linux-2.3.20/fs/attr.c linux-bird.truncate/fs/attr.c --- linux-2.3.20/fs/attr.c Mon Jun 21 12:36:42 1999 +++ linux-bird.truncate/fs/attr.c Mon Oct 11 10:54:44 1999 @@ -62,8 +62,6 @@ inode-i_uid = attr-ia_uid; if (ia_valid ATTR_GID) inode-i_gid = attr-ia_gid; - if (ia_valid ATTR_SIZE) - inode-i_size = attr-ia_size; if (ia_valid ATTR_ATIME) inode-i_atime = attr-ia_atime; if (ia_valid ATTR_MTIME) @@ -75,6 +73,12 @@ if (!in_group_p(inode-i_gid) !capable(CAP_FSETID)) inode-i_mode = ~S_ISGID; } + if (ia_valid ATTR_SIZE) { + inode-i_size = attr-ia_size; + vmtruncate(inode, attr-ia_size); + if (inode-i_op inode-i_op-truncate) + inode-i_op-truncate(inode); + } mark_inode_dirty(inode); } diff -urN linux-2.3.20/fs/coda/inode.c linux-bird.truncate/fs/coda/inode.c --- linux-2.3.20/fs/coda/inode.cSun Sep 12 12:45:49 1999 +++ linux-bird.truncate/fs/coda/inode.c Mon Oct 11 10:59:23 1999 @@ -229,6 +229,8 @@ if ( !error ) { coda_vattr_to_iattr(inode, vattr); coda_cache_clear_inode(inode); + if (iattr-ia_valid ATTR_SIZE) + vmtruncate(inode, iattr-ia_size); } CDEBUG(D_SUPER, "inode.i_mode %o, error %d\n", inode-i_mode, error); diff -urN linux-2.3.20/fs/hpfs/file.c linux-bird.truncate/fs/hpfs/file.c --- linux-2.3.20/fs/hpfs/file.c Sun Sep 12 06:00:35 1999 +++ linux-bird.truncate/fs/hpfs/file.c Mon Oct 11 17:44:44 1999 @@ -55,7 +55,6 @@ i-i_hpfs_n_secs = 0; i-i_blocks = 1 + ((i-i_size + 511) 9); hpfs_truncate_btree(i-i_sb,