Re: [RFC] truncate() (generic stuff)

1999-10-12 Thread Stephen C. Tweedie

Hi,

On Mon, 11 Oct 1999 11:12:01 -0400 (EDT), Alexander Viro
[EMAIL PROTECTED] said:

   I began screwing around the truncate() stuff and the following is
 a status report/request for comments:
   a) call of -truncate() method (and vmtruncate()) had been moved
 into the notify_change(). It is triggered if ATTR_SIZE is set. Modified
 places: do_truncate(), nfsd_truncate(), nfsd_setattr() and hpfs_unlink().
 Semantics had been changed (fixed, AFAICS) for nfsd_* - truncation happens
 with capabilities dropped. BTW, nfsd_setattr() didn't call vmtruncate().
 Fixed.

The page cache has very strict ordering requirements on the
inode-i_size update, the vmtruncate call and the actual file mapping
truncate operation.  I'd be nervous about letting the fs do too much of
that itself: what exactly is the rationale behind the change?

   coda,nfs,ncpfs,smbfs - I'm bringing the call of vmtruncate() into
 the end of -notify_change() (if ATTR_SIZE is set, indeed). It preserves
 the current behaviour, but I'm not sure that it's OK. Those filesystems
 don't have -truncate() and do the equivalent theselves (from
 - notify_change()). Maybe vmtruncate() should be called earlier.

Given that we are already looking at adding back in the inode write lock
for truncate, that shouldn't be a problem.

--Stephen



Re: [RFC] truncate() (generic stuff)

1999-10-12 Thread Alexander Viro



On Tue, 12 Oct 1999, Stephen C. Tweedie wrote:

 Hi,
 
 On Tue, 12 Oct 1999 09:37:28 -0400 (EDT), Alexander Viro
 [EMAIL PROTECTED] said:
 
  Rationale was:
  a) get rid of code duplication and get all calls of -truncate()
  into the same place.
  b) make it in the same place that sets i_size.
  c) on many filesystems extending -truncate() may fail. Right now
  we have no way to report said fact (or any other errors, for that matter).
  Change will be easier if we will have _one_ place to look after.
 
 Yes, and I had to make similar changes in 2.2 to report such changes.
 Observing RLIMIT_FSIZE in the filesystem requires that, for example.
 However, 2.3 is a lot more sensitive to the ordering.
 
  IMHO it became cleaner - I can post the detailed proof that it
  preserves the ordering. It was a requirement from the very beginning.
 
 Fine, as long as we document somewhere exactly what the filesystem
 requirements are.

Filesystem can either:
a) set -notify_change to NULL or
b) call in -notify_change() inode_setattr() (which will do the
right thing wrt truncation - set i_size, call vmtruncate() and
-truncate()) and keep in mind that upon the return from inode_setattr()
file is already truncated or
c) take care to set i_size by hands and at the end of
-notify_change() (if it didn't fail so far and had been asked to change
size) do vmtruncate() followed by fs-specific metadata modifications
needed for truncation (if they are needed at all - right now none of the
filesystems in that category (nfs,coda,ncpfs and smbfs) has such need)

In other words, impact on existing filesystems was minimal. I've
added the call of vmtruncate() into the end of -notify_change() for coda,
nfs, ncpfs and smbfs + made a couple of cleanups in affs and hpfs (not
strictly needed - see the final variant of patch for details). That was
it.



[RFC] truncate() (generic stuff)

1999-10-11 Thread Alexander Viro

I began screwing around the truncate() stuff and the following is
a status report/request for comments:
a) call of -truncate() method (and vmtruncate()) had been moved
into the notify_change(). It is triggered if ATTR_SIZE is set. Modified
places: do_truncate(), nfsd_truncate(), nfsd_setattr() and hpfs_unlink().
Semantics had been changed (fixed, AFAICS) for nfsd_* - truncation happens
with capabilities dropped. BTW, nfsd_setattr() didn't call vmtruncate().
Fixed.
b) I'ld rather pushed the call of -truncate() deeper - into the
inode_setattr() if possible. Situation being:
adfs, efs, isofs, qnx4 and romfs - readonly, no problems at all.
devpts, autofs - no regular files, no problems.
ext2, ufs, minix, sysv, msdos, vfat, udf, umsdos - no problems.
ntfs - very odd. No -truncate(), but there is -write(). IOW,
call of sys_truncate() will happily pass and just change -i_size. Steve,
could you comment on that? No problems wrt this patch, but something is
very wrong.
procfs - there may be drago^Wbugs. That is, sys_truncate()
silently sets i_size to zero. sys_chmod() is also, erm, interesting.
Unrelated to the patch, though.
hpfs - no problems, but there is something rather odd. Why on the
Earth does it forcibly write inode to disk in the end of notify_change()?
Immediately before that inode is dirtified by inode_setattr(). Mikulas, is
there something special that makes you write it ASAP?
affs - no problems, except the call of make_inode_dirty()
immediately after inode_setattr() (which dirtifies inode immediately
before exit). I'm removing the call.
hfs - probably OK, but I'ld really like to hear the confirmation -
is there any chance of troubles (from the HFS side, VFS is OK) if the call
of -truncate() will happen before the hfs_cat_mark_dirty()? (in
hfs_notify_change(), that is).
coda,nfs,ncpfs,smbfs - I'm bringing the call of vmtruncate() into
the end of -notify_change() (if ATTR_SIZE is set, indeed). It preserves
the current behaviour, but I'm not sure that it's OK. Those filesystems
don't have -truncate() and do the equivalent theselves (from
-notify_change()). Maybe vmtruncate() should be called earlier.

c) ext2-related parts (and general messing with pagecache for
directories) - still too unstable. IOW, crashes the kernel _fast_.

The following is the relatively stable part of the patch (no ext2
stuff). Please, look if you have any problems with it.
Cheers,
Al

diff -urN linux-2.3.20/fs/affs/inode.c linux-bird.truncate/fs/affs/inode.c
--- linux-2.3.20/fs/affs/inode.cMon Jun 21 12:36:43 1999
+++ linux-bird.truncate/fs/affs/inode.c Mon Oct 11 10:55:11 1999
@@ -246,7 +246,6 @@
inode-u.affs_i.i_protect = mode_to_prot(attr-ia_mode);
 
inode_setattr(inode, attr);
-   mark_inode_dirty(inode);
error = 0;
 out:
return error;
diff -urN linux-2.3.20/fs/attr.c linux-bird.truncate/fs/attr.c
--- linux-2.3.20/fs/attr.c  Mon Jun 21 12:36:42 1999
+++ linux-bird.truncate/fs/attr.c   Mon Oct 11 10:54:44 1999
@@ -62,8 +62,6 @@
inode-i_uid = attr-ia_uid;
if (ia_valid  ATTR_GID)
inode-i_gid = attr-ia_gid;
-   if (ia_valid  ATTR_SIZE)
-   inode-i_size = attr-ia_size;
if (ia_valid  ATTR_ATIME)
inode-i_atime = attr-ia_atime;
if (ia_valid  ATTR_MTIME)
@@ -75,6 +73,12 @@
if (!in_group_p(inode-i_gid)  !capable(CAP_FSETID))
inode-i_mode = ~S_ISGID;
}
+   if (ia_valid  ATTR_SIZE) {
+   inode-i_size = attr-ia_size;
+   vmtruncate(inode, attr-ia_size);
+   if (inode-i_op  inode-i_op-truncate)
+   inode-i_op-truncate(inode);
+   }
mark_inode_dirty(inode);
 }
 
diff -urN linux-2.3.20/fs/coda/inode.c linux-bird.truncate/fs/coda/inode.c
--- linux-2.3.20/fs/coda/inode.cSun Sep 12 12:45:49 1999
+++ linux-bird.truncate/fs/coda/inode.c Mon Oct 11 10:59:23 1999
@@ -229,6 +229,8 @@
 if ( !error ) {
coda_vattr_to_iattr(inode, vattr); 
coda_cache_clear_inode(inode);
+   if (iattr-ia_valid  ATTR_SIZE)
+   vmtruncate(inode, iattr-ia_size);
 }
CDEBUG(D_SUPER, "inode.i_mode %o, error %d\n", 
   inode-i_mode, error);
diff -urN linux-2.3.20/fs/hpfs/inode.c linux-bird.truncate/fs/hpfs/inode.c
--- linux-2.3.20/fs/hpfs/inode.cSun Sep 12 06:00:35 1999
+++ linux-bird.truncate/fs/hpfs/inode.c Mon Oct 11 10:55:25 1999
@@ -364,7 +364,6 @@
if (inode-i_sb-s_hpfs_root == inode-i_ino) return -EINVAL;
if ((error = inode_change_ok(inode, attr))) return error;
inode_setattr(inode, attr);
-   hpfs_write_inode(inode);
return 0;
 }
 

Re: [RFC] truncate() (generic stuff)

1999-10-11 Thread Mikulas Patocka

   hpfs - no problems, but there is something rather odd. Why on the
 Earth does it forcibly write inode to disk in the end of notify_change()?
 Immediately before that inode is dirtified by inode_setattr(). Mikulas, is
 there something special that makes you write it ASAP?

HPFS doesn't use write_inode. On HPFS, write_inode needs to lock parent
directory to update dirent. write_inode is called asynchronously and
locking directory makes deadlocks. So, write_inode is NULL and inode is
explicitly written on notify_change and file_release. 

[ Once you suggested a solution for this - keep in inode pointer to
directory data, but I'm too lazy to implement it. HPFS is not performance
critical ]

Mikulas Patocka



Re: [RFC] truncate() (generic stuff)

1999-10-11 Thread Alexander Viro



On Mon, 11 Oct 1999, Mikulas Patocka wrote:

  I mean the following: assume that call of hpfs_truncate() (via
  -truncate()) had been moved into the inode_setattr() (i.e. happens
  immediately before the hpfs_write_inode() in hpfs_notify_change()). Will
  it have any bad consequences? Right now the sequence looks so:
  inode-i_size = newsize;
  hpfs_write_inode(inode);
  hpfs_truncate(inode);
  Could we swap the last steps?
 
 Probably yes, the first hpfs_write_inode is not needed at all.

Aha. OK, since with this patch -truncate() is called only from
inode_setattr() I'm leaving the call of hpfs_write_inode() in
hpfs_notify_change() and removing it from hpfs_truncate().

 You can do the same just by examining content of dnode (dirents have fnode
 numbers). 

They do, but lookup in icache is going to be more expensive. Besides, we
will have to do it for every dirent when we are splitting dnode. And we
do not care for inodes not in core.

  -clear_inode() would exclude the sucker from the lists, all inodes marked
  with I_FREEING (i.e. those that gave NULL upon igrab()) are skipped during
  the cyclic list searches.
 
 There is another clash if you use async write_inode. Somebody is modifying
 directory, btree operation is in progress and now - system decides to call
 write_inode. You can't wait in write_inode until directory is in
 consistent state, you just must patch dirent somwhere in memory and
 return. That means - inode must hold pointer to dirent on disk or in
 memory.
No problems - in-core inode holds a location on disk, any dirent
moving starts from setting it to 0 (with spinlock held - see above) and
ends with setting it according to new location.
write_inode():
write fnode part;
retry:
loc = location;
if (!loc)
return;
get the block corresponding to loc;
grab the spinlock;
if (loc still points to the same block) {
update dirent;
mark it dirty;
release block;
release spinlock;
return;
}
/* we had been moved */
release block;
release spinlock;
goto retry;

That's it. This is what I've done in FAT and it doesn't take much code.

 It is possible to write it, but HPFS is currently very stable
and
 race-free (the last race I had was bug in kernel) and such modifications
 would surely make a lot of problems.

OK, I definitely agree with that. It's worth doing someday, but not now.

Anyway, the new variant of truncate patch follows. Please, check whether
you have problems with the hpfs part.
Cheers,
Al
diff -urN linux-2.3.20/fs/affs/inode.c linux-bird.truncate/fs/affs/inode.c
--- linux-2.3.20/fs/affs/inode.cMon Jun 21 12:36:43 1999
+++ linux-bird.truncate/fs/affs/inode.c Mon Oct 11 10:55:11 1999
@@ -246,7 +246,6 @@
inode-u.affs_i.i_protect = mode_to_prot(attr-ia_mode);
 
inode_setattr(inode, attr);
-   mark_inode_dirty(inode);
error = 0;
 out:
return error;
diff -urN linux-2.3.20/fs/attr.c linux-bird.truncate/fs/attr.c
--- linux-2.3.20/fs/attr.c  Mon Jun 21 12:36:42 1999
+++ linux-bird.truncate/fs/attr.c   Mon Oct 11 10:54:44 1999
@@ -62,8 +62,6 @@
inode-i_uid = attr-ia_uid;
if (ia_valid  ATTR_GID)
inode-i_gid = attr-ia_gid;
-   if (ia_valid  ATTR_SIZE)
-   inode-i_size = attr-ia_size;
if (ia_valid  ATTR_ATIME)
inode-i_atime = attr-ia_atime;
if (ia_valid  ATTR_MTIME)
@@ -75,6 +73,12 @@
if (!in_group_p(inode-i_gid)  !capable(CAP_FSETID))
inode-i_mode = ~S_ISGID;
}
+   if (ia_valid  ATTR_SIZE) {
+   inode-i_size = attr-ia_size;
+   vmtruncate(inode, attr-ia_size);
+   if (inode-i_op  inode-i_op-truncate)
+   inode-i_op-truncate(inode);
+   }
mark_inode_dirty(inode);
 }
 
diff -urN linux-2.3.20/fs/coda/inode.c linux-bird.truncate/fs/coda/inode.c
--- linux-2.3.20/fs/coda/inode.cSun Sep 12 12:45:49 1999
+++ linux-bird.truncate/fs/coda/inode.c Mon Oct 11 10:59:23 1999
@@ -229,6 +229,8 @@
 if ( !error ) {
coda_vattr_to_iattr(inode, vattr); 
coda_cache_clear_inode(inode);
+   if (iattr-ia_valid  ATTR_SIZE)
+   vmtruncate(inode, iattr-ia_size);
 }
CDEBUG(D_SUPER, "inode.i_mode %o, error %d\n", 
   inode-i_mode, error);
diff -urN linux-2.3.20/fs/hpfs/file.c linux-bird.truncate/fs/hpfs/file.c
--- linux-2.3.20/fs/hpfs/file.c Sun Sep 12 06:00:35 1999
+++ linux-bird.truncate/fs/hpfs/file.c  Mon Oct 11 17:44:44 1999
@@ -55,7 +55,6 @@
i-i_hpfs_n_secs = 0;
i-i_blocks = 1 + ((i-i_size + 511)  9);
hpfs_truncate_btree(i-i_sb,