Re: [PATCH 05/14] fs: don't allow kernel reads and writes without iter ops

2020-10-09 Thread Alexander Viro
On Fri, Oct 09, 2020 at 06:29:13PM -0700, Linus Torvalds wrote:
> On Fri, Oct 9, 2020 at 6:19 PM Eric Biggers wrote:
> >
> > Okay, that makes more sense.  So the patchset from Matthew
> > https://lkml.kernel.org/linux-fsdevel/20201003025534.21045-1-wi...@infradead.org/T/#u
> > isn't what you had in mind.
> 
> No.
> 
> That first patch makes sense - it's just the "ppos can be NULL" patch.
> 
> But as mentioned, NULL isn't "shorthand for zero". It's just "pipes
> don't _have_ a pos, trying to pass in some explicit position is
> crazy".
> 
> So no, the other patches in that set are a bit odd, I think.
> 
> SOME of them look potentially fine - the bpfilter one seems to be
> valid, for example, because it's literally about reading/writing a
> pipe. And maybe the sysctl one is similarly sensible - I didn't check
> the context of that one.

FWIW, I hadn't pushed that branch out (or merged it into #for-next yet);
for one thing, uml part (mconsole) is simply broken, for another...
IMO ##5--8 are asking for kernel_pread() and if you look at binfmt_elf.c,
you'll see elf_read() being pretty much that.  acct.c, keys and usermode
parts are asking for kernel_pwrite() as well.
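
A rough userspace sketch of the "pread-style" shape mentioned above, assuming POSIX pread(2). It is only an analogue for illustration, not the kernel helper: the name read_exact_at() is hypothetical, the position is passed by value so the file's own position is never touched, and a short read is reported as -EIO.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, userspace analogue only.
 * Read exactly len bytes at offset pos; the caller's file position is
 * left alone because the offset is passed explicitly.
 * Returns 0 on success, -errno on failure, -EIO on a short read.
 */
static int read_exact_at(int fd, void *buf, size_t len, off_t pos)
{
	ssize_t rv = pread(fd, buf, len, pos);

	if (rv < 0)
		return -errno;
	if ((size_t)rv != len)
		return -EIO;
	return 0;
}
```

Called on a pipe, this fails with -ESPIPE, which is the userspace face of "pipes don't _have_ a pos".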

I've got stuck looking through the drivers/target stuff - it would've
been another kernel_pwrite() candidate, but it smells like its use of
filp_open() is really asking for trouble, starting with symlink attacks.
Not sure - I'm not familiar with the area, but...



Re: [PATCH] NFS: Stop sillyname renames and unmounts from racing

2007-11-06 Thread Alexander Viro
On Tue, Nov 06, 2007 at 10:24:50AM +0200, Benny Halevy wrote:

> It'd be very nice if the silly renamed inodes (with silly_count > 1) were
> moved to a different list in the first pass, under the inode_lock, and then
> waited on until silly_count <= 1 in a second pass only on the filtered list.
> This will provide you with O(1).

It's absolutely pointless, starting with any kind of searching for inodes,
etc.  If you want fs shutdown _not_ to happen until async activity of
that kind is over, don't reinvent the sodding wheels, just tell VFS that
you are holding an active reference to superblock.  End of story.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] NFS: Stop sillyname renames and unmounts from racing

2007-11-05 Thread Alexander Viro
On Mon, Nov 05, 2007 at 09:06:36PM -0800, Andrew Morton wrote:
> > Any objections to exporting the inode_lock spin lock?
> > If so, how should modules _safely_ access the s_inode list?

> That's going to make hch unhappy.

That's going to make me just as unhappy, especially since it's pointless;
instead of the entire sorry mess we should just bump sb->s_active to pin
the superblock down (we know that it's active at that point, so it's just
an atomic_inc(); no games with locking, etc., are needed) and call
deactivate_super() on the way out.  And deactivate_super() is exported
already.
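
A toy userspace model of that pin/unpin pattern, assuming C11 atomics. The struct and the toy_* names are hypothetical stand-ins, not VFS code; only the counting behaviour of s_active and deactivate_super() is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct super_block: only the active count is modelled. */
struct toy_sb {
	atomic_int s_active;	/* number of active references */
	bool shut_down;		/* set when the last active reference is dropped */
};

/* The caller already holds an active reference, so a bare increment is safe. */
static void toy_pin_super(struct toy_sb *sb)
{
	assert(atomic_load(&sb->s_active) > 0);
	atomic_fetch_add(&sb->s_active, 1);
}

/* Models deactivate_super(): whoever drops the last reference shuts the fs down. */
static void toy_deactivate_super(struct toy_sb *sb)
{
	if (atomic_fetch_sub(&sb->s_active, 1) == 1)
		sb->shut_down = true;
}
```

With the async work holding its own reference, an unmount that drops the mount's reference does not shut the filesystem down; shutdown happens only when the async side calls toy_deactivate_super() on its way out.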


[PATCH] nfsroot uses bogus mountd version for NFSv2

2001-07-19 Thread Alexander Viro

nfsroot uses bogus protocol version when it asks portmapper on
server for mountd port. Fix is obvious:

--- linux/fs/nfs/nfsroot.c	Fri Feb 16 18:56:03 2001
+++ linux/fs/nfs/nfsroot.c.new	Thu Jul 19 23:55:09 2001
@@ -418,7 +418,7 @@
 			"as nfsd port\n", port);
 	}
 
-	if ((port = root_nfs_getport(NFS_MNT_PROGRAM, nfsd_ver, proto)) < 0) {
+	if ((port = root_nfs_getport(NFS_MNT_PROGRAM, mountd_ver, proto)) < 0) {
 		printk(KERN_ERR "Root-NFS: Unable to get mountd port "
 			"number from server, using default\n");
 		port = mountd_port;

Notice that for NFSv3 both nfsd and mountd are using version 3, so
nfsd_ver == mountd_ver. However, for NFSv2 we end up asking for mountd
version 2, which doesn't exist - mountd version for NFSv2 was 1.

Looks like this typo got into the tree in 2.3.99-4-pre3 when NFSv3 had
been merged into the tree - until then we had (correctly) asked for
version 1. Corresponding code in 2.2 is using mountd_ver, so it's also
OK.
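
The version pairing described above can be written down as a small mapping. This is a sketch only: mountd_version_for() is a hypothetical name, and the table just restates what the message says (NFSv2 goes with mountd version 1, NFSv3 with version 3).

```c
/*
 * Hypothetical helper: which mountd (MOUNT protocol) version to ask the
 * portmapper for, given the NFS protocol version in use.
 * Returns -1 for versions this sketch does not cover.
 */
static int mountd_version_for(int nfs_version)
{
	switch (nfs_version) {
	case 2:
		return 1;	/* NFSv2 pairs with mountd version 1, not 2 */
	case 3:
		return 3;	/* NFSv3: nfsd and mountd versions coincide */
	default:
		return -1;
	}
}
```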




Re: [Acpi] Re: ACPI fundamental locking problems

2001-07-08 Thread Alexander Viro



On Sat, 7 Jul 2001, Jamie Lokier wrote:

> Daniel Phillips wrote:
> > > Reading a tarball is the distillation of what you describe into
> > > efficient form :)
> > 
> > /me downloads tar file definition
> > 
> > Um, gnu tar or posix tar? or some new, improved tar?
> 
> I suggest cpio, which is more compact and in some ways more standard.
> (tar has a silly pad-to-multiple-of-512-byte per file rule, which is
> inappropriate for this).  GNU cpio creates cpio format just fine.

GNU cpio is a race-ridden unmaintained pile of junk. Look at the size
of, say, the Debian patch to upstream source. Then try to read the
patched code.  Quite a few of us simply don't have that FPOS on their
boxen.

Using cpio archive layout is OK, but _please_, don't make it dependent
on GNU cpio.




Re: [Acpi] Re: ACPI fundamental locking problems

2001-07-07 Thread Alexander Viro



On 7 Jul 2001, Eugene Crosser wrote:

> Doesn't the approach "treat a chunk of data built into bzImage as
> populated ramfs" look cleaner?  No need to fiddle with tar format,
> no copying data from place to place.

What the hell _is_ "populated ramfs"? The thing doesn't live in array
of blocks. Its directory structure consists of a bunch of dentries.
Permissions/ownership/timestamps are in a bunch of struct inode -
sitting in icache and allocated in normal way. Regular files are
entirely in pagecache, ditto for symlinks.

Ramfs has no backing store. At all. That's precisely what remains of
filesystem if you take backing store away - everything is in VFS/VM caches.




Re: [Acpi] Re: ACPI fundamental locking problems

2001-07-05 Thread Alexander Viro



On Thu, 5 Jul 2001, Helge Hafting wrote:

> Linus Torvalds wrote:
> [...]
> > We might want to just make initrd a built-in thing in the kernel,
> > something that you simply cannot avoid. A lot of these things (ie dhcp for
> > NFS root etc) are right now done in kernel space, simply because we don't
> > want to depend on initrd, and people want to use old loaders.
> > 
> > I don't like the current initrd very much myself, I have to admit. I'm not
> > going to accept a "you have to have a ramdisk" approach - I think the
> > ramdisks are really broken.
> > 
> > But I've seen a "populate ramfs from a tar-file built into 'bzImage'"
> > patch somewhere, and that would be a whole lot more palatable to me.
> > 
> > If anybody were to send me a patch that just unconditionally does this, I
> > would probably not be adverse to putting it into 2.5.x. We have all the
> > infrastructure to make all this a lot cleaner than it used to be (ie the
> > "pivot_root()" stuff etc means that we can _truly_ do things from user
> > mode, with no magic kernel flags).

Open 2.5 and I'm starting to feed that stuff in pieces...

> I am fine with "You have to use initrd (or similar) _if_ you want this
> feature."

"Similar" == ramfs.

> But please don't make initrd mandatory for those of us who don't
> need ACPI, don't need dhcp before mounting disks and so on.

How about "don't want to keep special-case code for mounting root in your
kernel"? It's more than ramfs, BTW, and rm(1) on ramfs frees memory just
fine.




Re: ufs on linux question/problem

2001-07-03 Thread Alexander Viro



On Tue, 3 Jul 2001, Admin Mailing Lists wrote:

> 
> Trying to mount a solaris x86 drive under linux.
> kernel 2.4.5, ufs support and x86 partition support compiled in (no
> module)
> On boot, linux recognizes the drive, but shows no solaris partitions on
> it.
> Below, linux drive is hda, solaris is hdb.

You need support of Solaris disklabels. And UFS patches that are in
-ac. Then you can get more or less safe r/o mounts. r/w is hopeless
at that stage.




Re: Recent change in directory g+s behavior (bug?)

2001-07-03 Thread Alexander Viro



On Tue, 3 Jul 2001, Ken Brownfield wrote:

> Somewhere between 2.4.5-pre1 and 2.4.6-pre3, the behavior of the setgid
> bit on directories has changed:

Fsck... Linus, please apply the patch below. That's a bug in
ext2_new_inode() that used to be hidden by redundant code in ext2_mkdir().

Notice that current code in ext2_new_inode() makes no sense at all -
the only reason why gcc doesn't scream bloody murder is that we have (unrelated)
S_ISLNK(mode) several lines below.

--- fs/ext2/ialloc.c	Tue Jun  5 09:24:52 2001
+++ fs/ext2/ialloc.c.fix	Tue Jul  3 05:38:37 2001
@@ -417,7 +417,6 @@
 		cpu_to_le32(le32_to_cpu(es->s_free_inodes_count) - 1);
 	mark_buffer_dirty(sb->u.ext2_sb.s_sbh);
 	sb->s_dirt = 1;
-	inode->i_mode = mode;
 	inode->i_uid = current->fsuid;
 	if (test_opt (sb, GRPID))
 		inode->i_gid = dir->i_gid;
@@ -427,6 +426,7 @@
 			mode |= S_ISGID;
 	} else
 		inode->i_gid = current->fsgid;
+	inode->i_mode = mode;
 
 	inode->i_ino = j;
 	inode->i_blksize = PAGE_SIZE;	/* This is the optimal IO size (for stat), not the fs block size */
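
A userspace sketch of why the assignment has to move. The toy_* names are hypothetical, and the middle branch (parent directory has S_ISGID set) is an assumption inferred from the "mode |= S_ISGID; } else" context visible in the second hunk, not a copy of the kernel source. The point is only the ordering: i_mode must be stored after mode may have gained S_ISGID.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>
#include <sys/types.h>

/* Toy stand-in for the two inode fields that matter here. */
struct toy_inode {
	mode_t i_mode;
	gid_t i_gid;
};

/*
 * Hypothetical model of the group/mode setup for a new inode.
 * grpid models the GRPID mount option, fsgid models current->fsgid.
 */
static void toy_init_owner(struct toy_inode *inode, const struct toy_inode *dir,
			   mode_t mode, gid_t fsgid, int grpid)
{
	if (grpid) {
		inode->i_gid = dir->i_gid;
	} else if (dir->i_mode & S_ISGID) {
		/* Assumed branch: inherit group, and setgid bit for subdirectories. */
		inode->i_gid = dir->i_gid;
		if (S_ISDIR(mode))
			mode |= S_ISGID;
	} else {
		inode->i_gid = fsgid;
	}
	/* Store the mode only now; storing it earlier loses the S_ISGID bit. */
	inode->i_mode = mode;
}
```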




Re: A Possible 2.5 Idea, maybe?

2001-06-30 Thread Alexander Viro



On Sat, 30 Jun 2001, Philips wrote:

>   If I could choose what filesystem to run on / - it impact performance greatly

No, it doesn't. Most of lookups go outside of root and within root you
mostly deal with cached lookups from dcache (which doesn't give a damn for
fs type) and with page cache lookups for data (mostly in libc) (ditto).

[snip]

>   This would be one little step toward the microkernel architecture (like Hurd).
> Good again :-)

Hurd and architecture in one sentence? Uh-oh...




Re: VFS locking & HFS problems (2.4.6pre6)

2001-06-29 Thread Alexander Viro



On Fri, 29 Jun 2001, Benjamin Herrenschmidt wrote:

> The deadlock happen in the HFS filesystem in hfs_cat_put(), apparently
> (quickly looking at addresses) in spin_lock().


Uh-oh. Looks like hfs_cat_put() grabs some internal spinlock and calls
write_entry(). If it really is what its name implies, you are calling
a blocking function under the spinlock.

> So my question: Is there any document explaining the various locking
> requirements & re-entrency possibilities in a filesystem.

There is, but this bug has nothing fs-specific in it. You should never
block while holding a spinlock.

BTW, looks like 2.2 has the same bug.




Re: directory order of files

2001-06-29 Thread Alexander Viro



On Fri, 29 Jun 2001, Alan Cox wrote:

> > With Linux ext2, and some other systems, when you create files in a
> > new directory, the file system remembers their order:
> 
> No - it merely seems to.
> 
> > $ touch one two three four
> > $ ls -U
> > one  two  three  four
> 
> Then try 'rm three; touch five'

Moreover, it isn't true even for the case when we create a list of files
in empty directory. Example: assuming that /tmp has 1Kb blocks,

mkdir /tmp/A
cd /tmp/A
touch `perl -e 'print "a"x255'`
touch `perl -e 'print "b"x255'`
touch `perl -e 'print "c"x255'`
touch `perl -e 'print "d"x255'`
touch A
ls -U

will give you (lots of a) (lots of b) (lots of c) A (lots of d).

With 4Kb blocks you'll need 16 long names instead of 4 - the effect
will be the same.

The reason is quite simple - at some point you get no space for long
name and it goes into the next directory block, but there's still enough
for a short name, so it gets created in the first block.

IOW, there's no warranties at all.
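
The arithmetic behind that example can be checked with a small simulation. This is a simplified model under stated assumptions, not ext2 code: each entry costs an 8-byte header plus the name, rounded up to a multiple of 4; "." and ".." occupy the start of the first block; a new entry goes into the first block with enough room. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_BLOCKS 8

/* Size of one directory entry: 8-byte header plus name, rounded up to 4. */
static size_t toy_dirent_size(size_t name_len)
{
	return (8 + name_len + 3) & ~(size_t)3;
}

struct toy_dir {
	size_t block_size;
	size_t used[TOY_MAX_BLOCKS];	/* bytes consumed in each block */
	size_t nblocks;
};

/* Start with one block that already holds "." and "..". */
static void toy_dir_init(struct toy_dir *d, size_t block_size)
{
	d->block_size = block_size;
	d->nblocks = 1;
	d->used[0] = toy_dirent_size(1) + toy_dirent_size(2);
}

/*
 * First-fit insert. Returns the index of the block the entry landed in,
 * or -1 if the toy directory has run out of blocks.
 */
static int toy_dir_add(struct toy_dir *d, size_t name_len)
{
	size_t need = toy_dirent_size(name_len);
	size_t i;

	assert(name_len >= 1 && name_len <= 255);
	for (i = 0; i < d->nblocks; i++) {
		if (d->used[i] + need <= d->block_size) {
			d->used[i] += need;
			return (int)i;
		}
	}
	if (d->nblocks == TOY_MAX_BLOCKS)
		return -1;
	i = d->nblocks++;
	d->used[i] = need;
	return (int)i;
}
```

With 1Kb blocks, three 255-character names land in block 0, the fourth spills into block 1, and a one-character name added afterwards still fits in block 0, so it is listed before the fourth long name. With 4Kb blocks the spill happens at the sixteenth long name.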




Re: [PATCH] Bug in 2.4.5 in proc_pid_make_inode ()

2001-06-28 Thread Alexander Viro



On Thu, 28 Jun 2001, Martin Wilck wrote:

> Hi,
> 
> I have recently experienced a number of kernel OOPSes
> in "top" under heavy load. Kernel is 2.4.5 (IA64, but
> this has nothing to do with the IA64 patch).
> 
> The OOPS happens in the call tree
> 
> open () system call
> [...]
> real_lookup ()
> proc_base_lookup ()
> proc_pid_make_inode ()
> iput ()
> proc_delete_inode () -> OOPS in __MOD_DEC_USE_COUNT

Known, had been already fixed in 2.4.6-pre3.




Re: Maximum mountpoints + chrooted login

2001-06-27 Thread Alexander Viro



On Wed, 27 Jun 2001, Magnus Naeslund(f) wrote:

> I'll wait for 2.5 then...
> Where's that namespace patch located?

The last one I've put on anonftp was against 2.4.6-pre2 (namespaces-a-S6-pre2,
on ftp.math.psu.edu/pub/viro). It still includes tons of fs/super.c cleanups
and fixes - they still need to be merged into the tree.

> Now in 2.4.5 it's darn slow to _unmount_, it's like 100 times faster to
> mount than unmount :)

Erm... The last umount should sync everything on given fs. You don't
read a hundred megabytes upon mount but you can easily get such amount
of dirty data after working for a while ;-)




Re: mounting a fs in two places at once?

2001-06-27 Thread Alexander Viro



On Thu, 28 Jun 2001, Chris Wedgwood wrote:

> On Mon, Jun 25, 2001 at 02:20:16AM -0700, Ben Ford wrote:
> 
> > Feature.  It actually makes it quite nice when you want to allow
> > chrooted user(s) access to a common directory, you just mount a
> > partition in all the users home dirs.
> 
> For security, this can be a bad idea.
> 
> Potentially, chrooted user can mess with another, by messing with
> libraries and such like. In most cases not terribly easy, but in some
> cases possible.

If chrooted user had gained root - he can do much more damage than that.
If your libraries are world-writable - you had asked for that, hadn't
you?

> No, if the fs was mounted RO, then I assume you would have less to
> worry about. It's a pity the VFS code doesn't allow you to mix RO & RW
> of the same fs.

2.5 stuff. Requires extra argument on getattr/setattr/permission -
prototype change on 3 methods for something that is a feature and not a
fix for any specific bug...

If you want root-proof analog of chroot - fine, but that will require
at least taking away the ability to mount/umount anything. Otherwise
attacker will simply be able to remount everything he wants r/w once he
had gained root. That can be done (e.g. by adding "can modify" flag
to namespace and doing something along the lines

	pid = clone(CLONE_NAMESPACE, NULL);
	if (!pid) {
		/* do all needed mount/umount work */
		pid = clone(CLONE_FREEZE_NAMESPACE, NULL);
		if (!pid) {
			/* we are set */
		}
		exit(0);
	}

which would give grandchild a namespace we want it to see and prohibit
any changes in said namespace, root or not)




Re: Maximum mountpoints + chrooted login

2001-06-27 Thread Alexander Viro



On Wed, 27 Jun 2001, Magnus Naeslund(f) wrote:

> I was thinking of doing a chrooted login for some ssh accounts.
> The plan is this:

[snip CLONE_NAMESPACE-by-hands]
 
> Does this seem like a bad idea?
> (then please tell me why :))

Mostly because there's a better way to do that. Yes, such scheme would
work (that + massive pending fs/super.c cleanups was the main reason why
I didn't go for proper solution in 2.4.0-test*). However, instead of
crufting up kinda-sorta namespaces one could use the real thing. Relevant
cleanups of superblock handling will go in in 2.5.very_early and the
rest of patch (namespace proper) takes about 10Kb.

You can simply say clone(CLONE_NAMESPACE,NULL) and you get an independent
set of mounts to play with. mount/umount whatever you want before dropping
the root privileges. All children of that process will share its namespace.
When the last one goes away everything will be garbage-collected - no
need to umount anything on logout.

> One problem could be the _massive_ mounts, 3*online_users.
> Are there any limits/drawbacks doing it like this?

With the mntcache in - not really. It fixes the main performance problem.
Memory cost is sizeof(struct vfsmount)*total amount of mountpoints. I.e.
about 100 bytes per mountpoint. That's it.




Re: [PATCH] User chroot

2001-06-27 Thread Alexander Viro



On Wed, 27 Jun 2001, Chris Wedgwood wrote:

> On Tue, Jun 26, 2001 at 09:40:36PM -0400, Alexander Viro wrote:
> 
> > You need /dev/zero to get anywhere near the normal behaviour of the
> > system.
> 
> Not commenting on the original patch, I think requiring /dev/zero for
> a 'usable' system should be considered a [g]libc bug. /dev/zero should
> be present, but if not, [g]libc should have fall-back mechanisms to
> deal with things.

Frankly, glibc already has too many fall-back mechanisms of various kinds.
Several things Should Be There(tm). /dev/zero, /dev/null and /dev/tty are
definitely among them.




Re: [PATCH] User chroot

2001-06-26 Thread Alexander Viro



On Tue, 26 Jun 2001, Paul Menage wrote:

> But only root can set this up, since you currently have to be root in
> order to chroot(). The (only) advantage of the user chroot() patch would
> be that users would be able to do the same thing without root
> intervention.

You need to be root to do mknod. You need to do mknod to create /dev/zero.
You need /dev/zero to get anywhere near the normal behaviour of the system.




[RFC] Checks in ext2_new_block()

2001-06-26 Thread Alexander Viro

Ted, could you comment on sanity checks in ext2_new_block()?
a)
	if (tmp == le32_to_cpu(gdp->bg_block_bitmap) ||
	    tmp == le32_to_cpu(gdp->bg_inode_bitmap) ||
	    in_range (tmp, le32_to_cpu(gdp->bg_inode_table),
		      sb->u.ext2_sb.s_itb_per_group))
		ext2_error (sb, "ext2_new_block",
			    "Allocating block in system zone - "
			    "block = %u", tmp);

will go ahead and return the block. Looks like we can do better than that
if we mark it in use (we do that anyway), decrement relevant free blocks
counters (global and cylinder group one) and goto repeat;

b) we don't do similar checks for blocks we grab in preallocation loop.
And ext2_alloc_block() doesn't do such checks either.

c)
	if (ext2_set_bit (j, bh->b_data)) {
		ext2_warning (sb, "ext2_new_block",
			      "bit already set for block %d", j);
		DQUOT_FREE_BLOCK(sb, inode, 1);
		goto repeat;
	}
is of the "if memory got corrupted during the last dozens of cycles" variety -
we had seen that bit as 0 several lines before and we couldn't even block during
that interval (not that it mattered much, since all modifications of these
bitmaps are under lock_super() anyway).

d)
	if (j >= le32_to_cpu(es->s_blocks_count)) {
		ext2_error (sb, "ext2_new_block",
			    "block(%d) >= blocks count(%d) - "
			    "block_group = %d, es == %p ",j,
			    le32_to_cpu(es->s_blocks_count), i, es);
		goto out;
	}
is a bit too late _and_ we don't do anything similar for preallocated blocks.

The question being: which of these checks deserve to stay ((c) doesn't, IMO)
and which deserve to be extended to preallocation? If we do them for
main path, we ought to be at least consistent...
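
For illustration, one way the checks from (a) and (d) could be applied uniformly to a preallocation window. This is a userspace sketch of one possible answer to the question, not what ext2 does: the toy_* names are hypothetical, values are assumed to be already converted to CPU byte order, and the range test follows the in_range() usage in (a), with the third argument read as a length in blocks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy group descriptor with the three fields used by check (a). */
struct toy_group_desc {
	uint32_t bg_block_bitmap;
	uint32_t bg_inode_bitmap;
	uint32_t bg_inode_table;
};

/* True if b lies in [first, first + len). */
static bool toy_in_range(uint32_t b, uint32_t first, uint32_t len)
{
	return b >= first && b - first < len;
}

/* Check (a): is the block one of the group's metadata blocks? */
static bool toy_in_system_zone(const struct toy_group_desc *gdp,
			       uint32_t itb_per_group, uint32_t block)
{
	return block == gdp->bg_block_bitmap ||
	       block == gdp->bg_inode_bitmap ||
	       toy_in_range(block, gdp->bg_inode_table, itb_per_group);
}

/*
 * How many leading blocks of [start, start + count) pass both check (a)
 * and check (d)? A preallocation loop could stop at this length.
 */
static uint32_t toy_usable_prealloc(const struct toy_group_desc *gdp,
				    uint32_t itb_per_group,
				    uint32_t blocks_count,
				    uint32_t start, uint32_t count)
{
	uint32_t n;

	for (n = 0; n < count; n++) {
		uint32_t b = start + n;

		if (b >= blocks_count ||
		    toy_in_system_zone(gdp, itb_per_group, b))
			break;
	}
	return n;
}
```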




Re: mounting a fs in two places at once?

2001-06-24 Thread Alexander Viro



On Sun, 24 Jun 2001, Marty Leisner wrote:

> I just installed redhat 7.1 on a system.
> 
> Cleaning up, a made a fs for home...(mounted on /mnt
> to write the stuff to it)
> 
> Then I accidently mounted it on /home.
> 
> So it was mounted on /home and /mnt at the same time.
> (I didn't bother going in to see what was there).

Same tree, obviously.

> Shouldn't this NOT happen?

Sigh... Guys, who maintains l-k FAQ?

Q: I've mounted filesystem in two different places and it worked. Why?
A: Because you've asked to do that. Yes, it works. No, it's not a bug.

Q: what should I do to unmount it?
A: umount 

Q: but that took care only of one of them. How can I deal with another?
A: umount 




RE: The Joy of Forking

2001-06-24 Thread Alexander Viro



On Sun, 24 Jun 2001, George Bonser wrote:

> > no SMP
> > x86 only (and similar, e.g. Crusoe)
> 
> Never 

YHBT. YHL.




Re: [OT]Re: One more ZDNet article with BillG hammering Linux and Open Source.

2001-06-23 Thread Alexander Viro



On 22 Jun 2001, Miles Lane wrote:

> It would be great to see the "Shared Source" licenses that Microsoft has 
> made people sign.  It would be especially interesting to compare the

It would be great to see you learning WTF "offtopic" means and taking the
advocacy crap to the places where it belongs.




Re: 2.4.5-ac16 -- "proc_get_inode" still unresolved in /net/wan/comx.o

2001-06-22 Thread Alexander Viro



On Tue, 19 Jun 2001, Miles Lane wrote:

> 
> depmod: *** Unresolved symbols in /lib/modules/2.4.5-ac16/kernel/drivers/net/wan/comx.o
> depmod:   proc_get_inode

And it won't be exported. Moreover, it has a very good chance to become
static.

If you have the hardware in question and are willing to help with
testing I would be rather grateful. I'm rewriting filesystem side of
the driver (along with fixing rmmod races, etc.) and testers will be
needed somewhere in the middle of next week.




Re: What happened to lookup_dentry?

2001-06-22 Thread Alexander Viro



On Tue, 19 Jun 2001, Timur Tabi wrote:

> Well, I didn't write the driver that I'm trying to port, so it's a little
> difficult.  The code in question is:
> 
> struct dentry *   de = lookup_dentry(zfcdb[i].fullname, NULL, LOOKUP_FOLLOW);
> if (IS_ERR(de))
>   continue;
> if (de != zfcdb[i].dentry) 
> {
>   print("zfc: dentry changed for %s\n", zfcdb[i].fullname);
>   zfc_file_init(&zfcdb[i], de);
> }
> 
> So it appears it's just checking to see if the dentry for a particular file has
> changed.

Apparently, more than that. You'll need at least vfsmount in addition to
dentry. Could you send me the source? In principle, situation looks like
you need path_init() and path_walk(), but you almost definitely will need
to make changes in more places than that.

It should be easy to fix, but it's easier to mark the places that need
fixing in the source than try to describe how to find them ;-)




Re: Alan Cox quote? (was: Re: accounting for threads)

2001-06-21 Thread Alexander Viro



On Thu, 21 Jun 2001, Alexander Viro wrote:

> 
> 
> On Thu, 21 Jun 2001, Rusty Russell wrote:
> 
> > Disagree.  A significant percentage of the netfilter bugs have been
> > SMP only (the whole thing is non-reentrant on UP).
> 
> I really doubt it.  
> Well, if you use GFP_ATOMIC for everything... grep...
> Erm... AFAICS, you call create_chain() with interrupts disabled
> (under write_lock_irqsave()). Unless I'm _very_ mistaken,
> kmalloc(..., GFP_KERNEL) is a Bad Thing(tm) in that situation.
> And create_chain() leads to it.

BTW, proc_net_create() is also not a good idea if you block the interrupts.
Ditto for netlink_kernel_create(), AFAICS (due to netlink_kernel_create() ->
sock_alloc() -> get_empty_inode() -> kmem_cache_alloc() with SLAB_KERNEL).

That, BTW, is a nice illustration - it's easy to get a preemption point
without noticing, so holding spinlocks, let alone disabling interrupts
over the large area is going to hurt like hell.




Re: Alan Cox quote? (was: Re: accounting for threads)

2001-06-21 Thread Alexander Viro



On Thu, 21 Jun 2001, Rusty Russell wrote:

> Disagree.  A significant percentage of the netfilter bugs have been
> SMP only (the whole thing is non-reentrant on UP).

I really doubt it.  
Well, if you use GFP_ATOMIC for everything... grep...
Erm... AFAICS, you call create_chain() with interrupts disabled
(under write_lock_irqsave()). Unless I'm _very_ mistaken,
kmalloc(..., GFP_KERNEL) is a Bad Thing(tm) in that situation.
And create_chain() leads to it.




Re: Controversy over dynamic linking -- how to end the panic

2001-06-21 Thread Alexander Viro



On Thu, 21 Jun 2001, Timur Tabi wrote:

> In my opinion, this whole thing would just go away (including some of
> Microsoft's anti-GPL rants), if the FSF officially declared that under the GPL,
> #including a GPL header file does NOT force your code to be also GPL.

The problem being, there is no such thing as header file from C point of view.
I can do

cat >my_file.c 

Re: rename problem on vfat file systems

2001-06-21 Thread Alexander Viro



On Thu, 21 Jun 2001, abc abc wrote:

> If I reboot the machine just after the rename() call
> is completed, when the machine comes up the file
> /mnt/sns-c/segments/segfile has zero bytes and there
> is no file in the tmp directory. Effectively the file
> is lost some where. Running fsck recovers the file,
> but it doesn't help me much because I would be copying
> hundreds of files and its difficult to match the
> files.
> 
> Can you think of any thing that might be causing this.

Crappy filesystem layout. If you want to do something a-la journalling
for VFAT - seek professional help.




Re: [PATCH] remove null register_disk

2001-06-20 Thread Alexander Viro



On Wed, 20 Jun 2001 [EMAIL PROTECTED] wrote:

> In fs/partitions/check.c we read
> 
> void register_disk(struct gendisk *gdev, kdev_t dev, unsigned minors,
>                    struct block_device_operations *ops, long size)
> {
>         if (!gdev)
>                 return;
>         grok_partitions(gdev, MINOR(dev)>>gdev->minor_shift, minors, size);
> }
> 
> showing that register_disk is void when its first argument is NULL.
> This allows one to remove some dead code.
> Can be applied to 2.4. No behaviour is changed.

That's simply wrong. We will need register_disk(). Reinserting it into the
right places in 2.5 is an unnecessary PITA.




Re: Alan Cox quote? (was: Re: accounting for threads)

2001-06-20 Thread Alexander Viro



On Wed, 20 Jun 2001, george anzinger wrote:

> > around we _will_ get problems. Kernel UP programming is not different
> > from SMP one. It is multithreaded. And amount of genuine SMP bugs is
> > very small compared to ones that had been there on UP since way back.
> > And yes, programming threads is the same thing. No arguments here.
> > 
> Correct, IF the UP kernel is preemptable.  As long as it is not (and SMP
> is ignored) threads are harder BECAUSE they are preemptable.

In practice that's BS. There are a lot of ways minor modifications of code
could add a preemption point, so if you rely on the lack of such - expect
major PITA.

Yes, in theory SMP adds some extra fun. Practically, almost every "SMP"
race found so far did not require SMP.

Clean code is trivial to make SMP-safe - critical areas that rely on
lack of preemption are a couple of instructions wide and are easy to
protect. Anything trickier and I bet that you have a race on (normal)
UP kernel. Been there, found probably several hundreds of them.




Re: Threads are processes that share more

2001-06-20 Thread Alexander Viro



On Wed, 20 Jun 2001, bert hubert wrote:

> Rounding up, it may be worth repeating what I think Alan said some months
> ago:
> 
> Threads are processes that share more

... and for the absolute majority of programmers additional shared objects mean
additional fsckup sources.  I don't trust them to write correct async code.
OK, so I don't trust the majority of programmers to find their dicks if
you take their Visual Masturbation Aid++ away, but that's another story -
I'm talking about otherwise clued people, not burger-flippers armed with
Foo For Complete Dummies in 24 Hours.

> And if we just keep bearing that out to everybody a lot of the myths will go
> away. I would suggest that the pthreads manpages get this attitude.




Re: 2.4.5 corruption (again)

2001-06-19 Thread Alexander Viro



On Tue, 19 Jun 2001, Larry McVoy wrote:

> OK, my corruption is back and this time I'm saving the data.  Al, send some 
> email when you are around, we can talk about access to the data.  I'm tarring

Doing that.

> up both good & bad right now.  I've looked at a few files and they look
> "shifted".
> 
>   extra junk
>   original file less sizeof(extra junk) bytes
> 
> The machine has been up 6 days since the last corruption happened and the
> process which detected the corruption ran successfully every night as well
> as about 4 times by hand after my last corruption report.

Lovely. Are these files longer than 4Kb, BTW?




Re: What happened to lookup_dentry?

2001-06-18 Thread Alexander Viro



On Mon, 18 Jun 2001, Timur Tabi wrote:

> I'm porting a driver from 2.2 to 2.4, and this driver calls lookup_dentry,
> which doesn't exist in 2.4.  I've read through the source code and searched the
> web and newsgroups, and I can't find any explanation as to why lookup_dentry no
> longer exists or how I'm supposed to change code that uses it.  Can anyone help
> me?

It depends on what kind of use 2.2 code had for it. There are several
situations in which it used to be called and proper replacements depend
on the context. Details, please... (alternatively, send a URL of the patch
and I'll see what to do with the thing)




Re: [PATCH] devfs v181 available

2001-06-18 Thread Alexander Viro



On Mon, 18 Jun 2001, Richard Gooch wrote:

> > Irrelevant. BKL provides an exclusion only on non-blocking areas.
> 
> Yeah, I know all that.

So what the hell are you talking about?

> > _Moved_ them there from the callers of these functions. And AFAICS
> > you do need BKL for get_devfs_entry_...(); otherwise relocation of
> > the table will be able to screw you inside of that function. Now, it
> > will merrily screw you anyway in a lot of places, but that's another
> > story.
> 
> OK, so it was another global change.

Moving BKL into the ->readlink() and ->follow_link()? Sure, it was a global
change. About a year ago.

> Question: assuming data fed to vfs_follow_link() is "safe", does it
> need the BKL? I can see that vfs_readlink() obviously doesn't need
> it. From reading Documentation/filesystems/Locking I suspect it
> doesn't need the BKL, but the way I read it says "follow_link() method
> does not *have* the BKL already". But that doesn't explicitly say
> whether vfs_follow_link() needs it.

vfs_follow_link() doesn't need it. Moreover, if data fed to it is unsafe
without BKL, you are screwed even if you take BKL. So the assumption above
is bogus - you _never_ need BKL on that call.




Re: Newbie idiotic questions.

2001-06-18 Thread Alexander Viro



On Mon, 18 Jun 2001, Roman Zippel wrote:

> > I wouldn't call it "rather popular".
> 
> You should also grep for '__typeof__'. :-)

Yeeeccchhh. OK, there is more of that. However, the main user of that
beast is, AFAICS, get_user()/put_user() and their ilk in include/asm-*
The rest looks very bogus...




Re: function of getname() function

2001-06-18 Thread Alexander Viro



On Mon, 18 Jun 2001, SATHISH.J wrote:

> Hi,
> 
> Sorry if this question is too silly.
> 
> I could not understand what getname(filename) function in the sys_open()
> function is doing. I could not understand from the code what exactly it is
> doing. Please help me with the same.

It allocates a buffer and copies file name from user memory to that buffer.
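
A minimal userspace sketch of that allocate-then-copy shape, for illustration
only. The name getname_sketch(), the NAME_BUF_SIZE limit and the error codes
are assumptions of this sketch, and memcpy() merely stands in for the copy
from user memory; it is not the kernel's getname().

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: stands in for the kernel's fixed-size pathname buffer. */
#define NAME_BUF_SIZE 4096

/*
 * Hypothetical analogue of getname(): allocate a buffer and copy a
 * NUL-terminated name into it. Empty and over-long names are refused.
 * On failure NULL is returned and *err holds an errno-style code.
 */
static char *getname_sketch(const char *src, int *err)
{
    size_t len = 0;
    char *buf;

    /* Bounded scan: never look further than the buffer could hold. */
    while (len < NAME_BUF_SIZE && src[len] != '\0')
        len++;

    if (len == 0) {
        *err = ENOENT;          /* empty name */
        return NULL;
    }
    if (len == NAME_BUF_SIZE) {
        *err = ENAMETOOLONG;    /* no terminator within the limit */
        return NULL;
    }

    buf = malloc(NAME_BUF_SIZE);
    if (!buf) {
        *err = ENOMEM;
        return NULL;
    }

    memcpy(buf, src, len + 1);  /* copy the name including its NUL */
    *err = 0;
    return buf;
}
```

The caller owns the returned buffer and releases it with free(), much as
kernel callers pair getname() with putname().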




Re: [PATCH] devfs v181 available

2001-06-17 Thread Alexander Viro



On Mon, 18 Jun 2001, Richard Gooch wrote:

> Alexander Viro writes:
> > 
> > 
> > On Mon, 18 Jun 2001, Richard Gooch wrote:
> > 
> > > - Widened locking in  and 
> > 
> > No, you hadn't. Both vfs_readlink() and vfs_follow_link() are blocking
> > functions, so BKL is worthless there.
> 
> Huh? The BKL will protect against other operations which might cause
> the devfs entry to be unregistered, where those other operations also
> grab the BKL. So, it's an improvement.

BKL is released as soon as you block. You _do_ regain it when you get
the next timeslice, but in the meanwhile anything could happen.

> Sure, some operations may cause unregistration without grabbing the

Irrelevant. BKL provides an exclusion only on non-blocking areas.

> BKL, but that's orthogonal (and requires more extensive changes). If
> this "widening" is of no use, then what use are the existing grabs of
> the BKL in those functions? You're the one who added them in the first
> place.

_Moved_ them there from the callers of these functions. And AFAICS you
do need BKL for get_devfs_entry_...(); otherwise relocation of the
table will be able to screw you inside of that function. Now, it will
merrily screw you anyway in a lot of places, but that's another story.

BTW, free advice: when you are checking some condition, treat the result
as something that can expire. And don't rely on it past the moment when
it might have expired. E.g. in the case of de->registered the result expires
as soon as you do unlock_kernel() _or_ do anything that might block.
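
A single-threaded sketch of that advice, under loud assumptions: struct entry,
the may_block callback and both reader functions are hypothetical names made
up for illustration, and the callback only simulates another task running
while we sleep. A re-check alone is not a substitute for proper locking or
refcounting; the point is just that the earlier answer is stale.

```c
#include <stddef.h>

struct entry {
    int registered;     /* stands in for de->registered */
    int value;
};

/* Hypothetical stand-in for any call that might block. */
typedef void (*blocking_fn)(struct entry *);

/* Racy: the result of the check is used after a call that may block. */
static int read_value_racy(struct entry *e, blocking_fn may_block)
{
    if (!e->registered)
        return -1;
    may_block(e);       /* the check above has expired by now */
    return e->value;    /* may be reading an unregistered entry */
}

/* Re-validates after every point where we might have slept. */
static int read_value_rechecked(struct entry *e, blocking_fn may_block)
{
    if (!e->registered)
        return -1;
    may_block(e);
    if (!e->registered) /* old answer is stale, so ask again */
        return -1;
    return e->value;
}

/* Simulates another task unregistering the entry while we were blocked. */
static void unregister_while_blocked(struct entry *e)
{
    e->registered = 0;
    e->value = 0;
}
```

With unregister_while_blocked() as the callback, the racy variant hands back
whatever is left in the entry, while the re-checking variant reports failure.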




Re: [PATCH] devfs v181 available

2001-06-17 Thread Alexander Viro



On Mon, 18 Jun 2001, Richard Gooch wrote:

> - Widened locking in  and 

No, you hadn't. Both vfs_readlink() and vfs_follow_link() are blocking
functions, so BKL is worthless there.




Re: Newbie idiotic questions.

2001-06-17 Thread Alexander Viro



On Sun, 17 Jun 2001, Daniel Phillips wrote:

> typeof?  It's rather popular in the kernel already.  Besides, who is going to 

Really? 5 instances in PPC arch-specific code, 1 (absolutely gratuitous)
in drivers/mtd, 2 - in m68k (also useless), 4 - in drivers/video, 2 -
in AFFS and 1 - in netfilter.

I wouldn't call it "rather popular".

> compile this with anything other than gcc?
> 
> I don't see your point about greppability.

You are making the types it is applied to harder to deal with wrt. global
search.

But the real issue here is that the preprocessor is not a way to get
polymorphism. And that would be the only context where typeof might
have any use. Trying to turn C into things it isn't is always a bad
idea - that has been proven many times, starting at least with the Bourne
shell (check the v7 sh source if you don't know what I'm referring to).




Re: Newbie idiotic questions.

2001-06-17 Thread Alexander Viro



On Sun, 17 Jun 2001, Daniel Phillips wrote:

> > macro that behaves like `new' in C++:
> > | #define knew(type, flags) (type *)kmalloc(sizeof(type), (flags))
> >
> > If the types in the assignment don't match, gcc will tell you.
> 
> Well, since we are still beating this one to death, I'd written a "knew" 
> macro as well, and put it aside.  It does the assignment for you too:
> 
>#define knew(p) ((p) = (typeof(p)) kmalloc(sizeof(*(p)), GFP_KERNEL))
 
> Terse and clear at the same time, and type safe.  I still don't like it much. 

And ungreppable, not to mention gratuitous use of a GNU extension.
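
For comparison, a userspace sketch of the plain idiom that gets the same
size-follows-the-pointer property with no macro and no typeof. This is an
illustration only, not something proposed in the thread: struct foo and
foo_alloc() are hypothetical, and malloc() stands in for
kmalloc(..., GFP_KERNEL).

```c
#include <stdlib.h>

/* Hypothetical structure, only here to have something to allocate. */
struct foo {
    int a;
    long b;
};

/*
 * sizeof(*p) tracks the type of p automatically, and the type name
 * still appears in plain text at the declaration, so a search for
 * "struct foo" finds this allocation site.
 */
static struct foo *foo_alloc(void)
{
    struct foo *p = malloc(sizeof(*p));

    if (p) {
        p->a = 0;
        p->b = 0;
    }
    return p;
}
```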




Re: Reg:magic number of the filesystem

2001-06-17 Thread Alexander Viro



On Sun, 17 Jun 2001, SATHISH.J wrote:

> Hi,
> 
> Every file system has a magic number. Can you please tell me what for this
> magic number is used. When do we really use this unique magic number of
> the file system and why?

find . -name *.[chS] >/tmp/list
xargs http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Reg:use of file_system_type structure

2001-06-17 Thread Alexander Viro



On Sun, 17 Jun 2001, SATHISH.J wrote:

> Hi,
> Every file system has file_system_type structure defined. Where else this
> structure is referred. Does register_filesystem() refer this structure.
> Does sys_mount refer to this structure by any means?

Umm... No offense, but
* all of these questions take a couple of minutes to answer.
* if you know how to use grep you should be able to find the
  answer faster than anybody could reply.
* if you know C the last two questions are a non-issue (everyone who
  doubts that is welcome to read the register_filesystem() source and see
  what arguments its callers pass to it).
* it looks suspiciously similar to, pardon me, an attempt to cheat on
  a quiz.




Re: [ANNOUNCE] HotPlug CPU patch against 2.4.5

2001-06-16 Thread Alexander Viro



On Sun, 17 Jun 2001, Rusty Russell wrote:

> In message <[EMAIL PROTECTED]> you write:
> > In article  you wrote:
> > >   # Up...
> > >   echo 1 > /proc/sys/cpu/1
> > 
> > Wouldn't /proc/sys/cpu//enable be better?  This way other per-cpu
> > sysctls could be added more easily...
> 
> Yep.  But rewrite the sysctl crap first to make dynamically adding and
> deleting entries sane.

I had, actually. 2.5 stuff, but as soon as fs/super.c merge gets into the
sane area I'll see what can be safely merged into 2.4. Sorry - it touches
quite a few places and running two splitups in parallel...  As
soon as this fscking roll of barbwire^W^W^Wset of locking changes gets
untangled...




Re: Kernel 2.0.35 limits

2001-06-15 Thread Alexander Viro



On Fri, 15 Jun 2001, Paul Faure wrote:

> Just this morning, our firewall got a kernel panic after 500 days of
> uptime.
> 
> As you can see from the log files, the date starts at June 15th, where we
> get two div by zeros, then jumps May 11th, then a kernel panic. A reboot
> brings it back to June 15th. Since cron could not open /dev/rtc. My first
> thought was an internal kernel limit on the time, but 500 days seems a bit
> short.
> 
> Any ideas ?

(1<<32) / (24 * 60 * 60 * 100) == 497

IOW, 2^32 timer interrupts since the boot.
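
The arithmetic can be checked with a few lines of C. This is a sketch:
days_until_wrap() is a hypothetical helper, and its hz parameter is the timer
interrupt rate, taken to be the 100 per second used in the expression above.
Note that a literal 1<<32 overflows a 32-bit int in C, so the sketch does the
computation in 64 bits.

```c
#include <stdint.h>

/*
 * Whole days until a 32-bit tick counter wraps, given the timer
 * interrupt frequency in ticks per second. hz must be non-zero.
 */
static uint64_t days_until_wrap(uint32_t hz)
{
    const uint64_t ticks = UINT64_C(1) << 32;            /* 2^32 interrupts */
    const uint64_t ticks_per_day = UINT64_C(24) * 60 * 60 * hz;

    return ticks / ticks_per_day;
}
```

days_until_wrap(100) gives 497, matching the uptime in the report; at 1000
ticks per second the same counter would wrap after 49 days.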




Re: unregistered changes to the user<->kernel API

2001-06-14 Thread Alexander Viro



On Thu, 14 Jun 2001, Richard Henderson wrote:

> Yes, I saw those.  What is the effect of O_NOFOLLOW?  To not
> follow symbolic links when opening the file.  If you open a
> regular file, in effect nothing happens.  Moreover, if these
> opens were not finding files now, the system wouldn't work.
> 
> So: the effect, I suppose, is (1) disabling some security
> within glibc, and (2) making these accesses slower since they
> will be considered O_DIRECT after the change.
> 
> Which doesn't seem that life-threatening to me.

O_NOFOLLOW is used to deal with symlink attacks. Breaking it means
that for quite a few binaries you are opening security holes. And
since it's a flagday change, you'll get the situation when no version
will work for all kernels. Bad idea, IMO.
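
For readers who have not used the flag, a minimal sketch of how userland
relies on it. open_nofollow() is a hypothetical wrapper name; open() and
O_NOFOLLOW are the real interface, and the ELOOP result for a symlink in the
final path component is the Linux behaviour.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* exposes O_NOFOLLOW on older C libraries */
#endif
#include <errno.h>
#include <fcntl.h>

/*
 * Open path read-only, refusing to follow a symlink in the final
 * component. Returns a file descriptor, or -1 with errno set
 * (ELOOP on Linux when the final component is a symlink).
 */
static int open_nofollow(const char *path)
{
    return open(path, O_RDONLY | O_NOFOLLOW);
}
```

If the kernel silently treated that bit as some other flag, the open of an
attacker-planted symlink would succeed and the check would be lost without
any error reaching the caller.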




Re: Download process for a "split kernel" (was: obsolete code mustdie)

2001-06-14 Thread Alexander Viro



On Thu, 14 Jun 2001, Daniel Phillips wrote:

> This sounds a lot like apt-get, doesn't it?

Folks, RTFFAQ, please. URL is attached to the end of each posting.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: is there a way to export a fat32 file system using nfs?

2001-06-12 Thread Alexander Viro



On Wed, 13 Jun 2001, Neil Brown wrote:

>Call fat_iget(i_location).
> If this finds something, check i_logstart. 
> If it matches, assume SUCCESS.
> 
>Then comes the tricky bit:  read the directory entry
> indicated by i_location, check the i_logstart is right,
> if it is, try to get it into the inode cache properly.

Uh-huh. Suppose that directory had been removed and space had been
reused by a regular file. Which had been filled with the right
contents. It's really not hard to do. Now, remove that file and
you've got a nice data corruption waiting to happen.




Re: threading question

2001-06-12 Thread Alexander Viro



On Tue, 12 Jun 2001, Kip Macy wrote:

> implementation of threads is not an accidental oversight, threads are not
> looked upon favorably by most of the core linux kernel hackers. A quote

s/threads/POSIX threads/.




Re: [CFT][PATCH] superblock handling changes

2001-06-11 Thread Alexander Viro



On Tue, 12 Jun 2001, Marcelo Tosatti wrote:

> 
> 
> On Tue, 12 Jun 2001, Alexander Viro wrote:
> 
> > Folks, the patch below is the fixed and combined variant of
> > the last series of patches sent to Linus.
> 
> Al, 
> 
> Since you are working on that code, would you mind to add some comments
> about IO completion guarantees (also why we don't guarantee fsync() to
> work as it should :)) there ?

I'm _not_ working on that side of things. Let's not add that into the
mix, OK? If you look at inode.c changes you'll see that the only thing
they expect from __sync_one() is to retake inode_lock before moving the
inode from the locked list. Other than that, the patch doesn't know and
doesn't care about fsync() semantics and implementation.

We have enough fun on the superblock side of the business. Let's keep
the fsync() stuff separate - they are pretty much orthogonal to each
other.

Right now I don't want to open that can of worms. Sorry.




[PATCH] fs/super.c stuff (3/10)

2001-06-10 Thread Alexander Viro

diff -urN S6-pre2-fsync_no_super/include/linux/fs.h 
S6-pre2-put_super/include/linux/fs.h
--- S6-pre2-fsync_no_super/include/linux/fs.h   Sun Jun 10 18:36:27 2001
+++ S6-pre2-put_super/include/linux/fs.hSun Jun 10 18:39:04 2001
@@ -1320,7 +1320,6 @@
 
 extern struct file_system_type *get_fs_type(const char *name);
 extern struct super_block *get_super(kdev_t);
-extern void put_super(kdev_t);
 static inline int is_mounted(kdev_t dev)
 {
struct super_block *sb = get_super(dev);





[PATCH] fs/super.c stuff (8/10)

2001-06-10 Thread Alexander Viro

diff -urN S6-pre2-s_count/fs/inode.c S6-pre2-freeing/fs/inode.c
--- S6-pre2-s_count/fs/inode.c  Sun Jun 10 12:45:04 2001
+++ S6-pre2-freeing/fs/inode.c  Sun Jun 10 12:45:47 2001
@@ -258,23 +258,6 @@
__sync_one(list_entry(tmp, struct inode, i_list), 0);
 }
 
-static inline int wait_on_dirty(struct list_head *head)
-{
-   struct list_head * tmp;
-   list_for_each(tmp, head) {
-   struct inode *inode = list_entry(tmp, struct inode, i_list);
-   if (!inode->i_state & I_DIRTY)
-   continue;
-   __iget(inode);
-   spin_unlock(&inode_lock);
-   __wait_on_inode(inode);
-   iput(inode);
-   spin_lock(&inode_lock);
-   return 1;
-   }
-   return 0;
-}
-
 static inline void wait_on_locked(struct list_head *head)
 {
struct list_head * tmp;
@@ -319,23 +302,13 @@
return 1;
 }
 
-/**
- * sync_inodes
- * @dev: device to sync the inodes from.
- *
- * sync_inodes goes through the super block's dirty list, 
- * writes them out, and puts them back on the normal list.
- */
-
-/*
- * caller holds exclusive lock on sb->s_umount
- */
- 
 void sync_inodes_sb(struct super_block *sb)
 {
spin_lock(&inode_lock);
-   sync_list(&sb->s_dirty);
-   wait_on_locked(&sb->s_locked_inodes);
+   while (!list_empty(&sb->s_dirty)||!list_empty(&sb->s_locked_inodes)) {
+   sync_list(&sb->s_dirty);
+   wait_on_locked(&sb->s_locked_inodes);
+   }
spin_unlock(&inode_lock);
 }
 
@@ -365,37 +338,75 @@
spin_unlock(&inode_lock);
 }
 
+/*
+ * Find a superblock with inodes that need to be synced
+ */
+
+static struct super_block *get_super_to_sync(void)
+{
+   struct list_head *p;
+restart:
+   spin_lock(&inode_lock);
+   spin_lock(&sb_lock);
+   list_for_each(p, &super_blocks) {
+   struct super_block *s = list_entry(p,struct super_block,s_list);
+   if (list_empty(&s->s_dirty) && list_empty(&s->s_locked_inodes))
+   continue;
+   s->s_count++;
+   spin_unlock(&sb_lock);
+   spin_unlock(&inode_lock);
+   down_read(&s->s_umount);
+   if (!s->s_root) {
+   up_read(&s->s_umount);
+   spin_lock(&sb_lock);
+   if (!--s->s_count)
+   kfree(s);
+   spin_unlock(&sb_lock);
+   goto restart;
+   }
+   return s;
+   }
+   spin_unlock(&sb_lock);
+   spin_unlock(&inode_lock);
+   return NULL;
+}
+
+/**
+ * sync_inodes
+ * @dev: device to sync the inodes from.
+ *
+ * sync_inodes goes through the super block's dirty list, 
+ * writes them out, and puts them back on the normal list.
+ */
+
 void sync_inodes(kdev_t dev)
 {
-   struct super_block * sb;
+   struct super_block * s;
 
/*
 * Search the super_blocks array for the device(s) to sync.
 */
-   spin_lock(&sb_lock);
-   sb = sb_entry(super_blocks.next);
-   for (; sb != sb_entry(&super_blocks); sb = sb_entry(sb->s_list.next)) {
-   if (!sb->s_dev)
-   continue;
-   if (dev && sb->s_dev != dev)
-   continue;
-   sb->s_count++;
-   spin_unlock(&sb_lock);
-   down_read(&sb->s_umount);
-   if (sb->s_dev && (sb->s_dev == dev || !dev)) {
-   spin_lock(&inode_lock);
-   do {
-   sync_list(&sb->s_dirty);
-   } while (wait_on_dirty(&sb->s_locked_inodes));
-   spin_unlock(&inode_lock);
+   if (dev) {
+   if ((s = get_super(dev)) != NULL) {
+   down_read(&s->s_umount);
+   if (s->s_root)
+   sync_inodes_sb(s);
+   up_read(&s->s_umount);
+   spin_lock(&sb_lock);
+   if (!--s->s_count)
+   kfree(s);
+   spin_unlock(&sb_lock);
+   }
+   } else {
+   while ((s = get_super_to_sync()) != NULL) {
+   sync_inodes_sb(s);
+   up_read(&s->s_umount);
+   spin_lock(&sb_lock);
+   if (!--s->s_count)
+   kfree(s);
+   spin_unlock(&sb_lock);
}
-   up_read(&sb->s_umount);
-   spin_lock(&sb_lock);
-   sb->s_count--;
-   if (dev)
-   break;
}
-   spin_unlock(&sb_lock);
 }
 
 /*
diff -urN S6-pre2-s_count/fs/super.c S6-pre2-freeing/fs/super.c
--- S6-pre2-s_count/fs/super.c  Sun Jun 10 12:45:04 2001
+++ S6-pre2-freei

[PATCH] fs/super.c stuff (5/10)

2001-06-10 Thread Alexander Viro

diff -urN S6-pre2-put_super/fs/dquot.c S6-pre2-dquot/fs/dquot.c
--- S6-pre2-put_super/fs/dquot.cThu May 24 18:26:44 2001
+++ S6-pre2-dquot/fs/dquot.cSun Jun 10 18:46:54 2001
@@ -325,7 +325,7 @@
 memset(&dquot->dq_dqb, 0, sizeof(struct dqblk));
 }
 
-void invalidate_dquots(kdev_t dev, short type)
+static void invalidate_dquots(kdev_t dev, short type)
 {
struct dquot *dquot, *next;
int need_restart;
@@ -1388,7 +1388,7 @@
 }
 
 /* Function in inode.c - remove pointers to dquots in icache */
-extern void remove_dquot_ref(kdev_t, short);
+extern void remove_dquot_ref(struct super_block *, short);
 
 /*
  * Turn quota off on a device. type == -1 ==> quotaoff for all types (umount)
@@ -1413,7 +1413,7 @@
reset_enable_flags(dqopt, cnt);
 
/* Note: these are blocking operations */
-   remove_dquot_ref(sb->s_dev, cnt);
+   remove_dquot_ref(sb, cnt);
invalidate_dquots(sb->s_dev, cnt);
 
/* Wait for any pending IO - remove me as soon as invalidate is more 
polite */
diff -urN S6-pre2-put_super/fs/inode.c S6-pre2-dquot/fs/inode.c
--- S6-pre2-put_super/fs/inode.cFri Jun  8 18:29:03 2001
+++ S6-pre2-dquot/fs/inode.cSun Jun 10 18:43:02 2001
@@ -1164,14 +1164,13 @@
 void put_dquot_list(struct list_head *);
 int remove_inode_dquot_ref(struct inode *, short, struct list_head *);
 
-void remove_dquot_ref(kdev_t dev, short type)
+void remove_dquot_ref(struct super_block *sb, short type)
 {
-   struct super_block *sb = get_super(dev);
struct inode *inode;
struct list_head *act_head;
LIST_HEAD(tofree_head);
 
-   if (!sb || !sb->dq_op)
+   if (!sb->dq_op)
return; /* nothing to do */
 
/* We have to be protected against other CPUs */
diff -urN S6-pre2-put_super/include/linux/quotaops.h 
S6-pre2-dquot/include/linux/quotaops.h
--- S6-pre2-put_super/include/linux/quotaops.h  Sun Jun 10 13:15:27 2001
+++ S6-pre2-dquot/include/linux/quotaops.h  Sun Jun 10 18:46:33 2001
@@ -21,7 +21,6 @@
  */
 extern void dquot_initialize(struct inode *inode, short type);
 extern void dquot_drop(struct inode *inode);
-extern void invalidate_dquots(kdev_t dev, short type);
 extern int  quota_off(struct super_block *sb, short type);
 extern int  sync_dquots(kdev_t dev, short type);
 





Re: [PATCH] fs/super.c stuff (6/10)

2001-06-10 Thread Alexander Viro

diff -urN S6-pre2-dquot/arch/parisc/hpux/sys_hpux.c 
S6-pre2-drop_super/arch/parisc/hpux/sys_hpux.c
--- S6-pre2-dquot/arch/parisc/hpux/sys_hpux.c   Fri Feb 16 20:46:44 2001
+++ S6-pre2-drop_super/arch/parisc/hpux/sys_hpux.c  Sun Jun 10 18:38:23 2001
@@ -109,9 +109,11 @@
 
lock_kernel();
s = get_super(to_kdev_t(dev));
+   unlock_kernel();
if (s == NULL)
goto out;
err = vfs_statfs(s, &sbuf);
+   drop_super(s);
if (err)
goto out;
 
@@ -124,7 +126,6 @@
/* Changed to hpux_ustat:  */
err = copy_to_user(ubuf,&tmp,sizeof(struct hpux_ustat)) ? -EFAULT : 0;
 out:
-   unlock_kernel();
return err;
 }
 
diff -urN S6-pre2-dquot/fs/dquot.c S6-pre2-drop_super/fs/dquot.c
--- S6-pre2-dquot/fs/dquot.cSun Jun 10 18:46:54 2001
+++ S6-pre2-drop_super/fs/dquot.c   Sun Jun 10 18:38:23 2001
@@ -1602,6 +1602,8 @@
if (sb && sb_has_quota_enabled(sb, type))
ret = set_dqblk(sb, id, type, flags, (struct dqblk *) addr);
 out:
+   if (sb)
+   drop_super(sb);
unlock_kernel();
return ret;
 }
diff -urN S6-pre2-dquot/fs/inode.c S6-pre2-drop_super/fs/inode.c
--- S6-pre2-dquot/fs/inode.cSun Jun 10 18:43:02 2001
+++ S6-pre2-drop_super/fs/inode.c   Sun Jun 10 18:38:23 2001
@@ -605,8 +605,10 @@
fsync_dev(dev);
 
res = 0;
-   if (sb)
+   if (sb) {
res = invalidate_inodes(sb);
+   drop_super(sb);
+   }
invalidate_buffers(dev);
return res;
 }
diff -urN S6-pre2-dquot/fs/super.c S6-pre2-drop_super/fs/super.c
--- S6-pre2-dquot/fs/super.cSun Jun 10 18:36:27 2001
+++ S6-pre2-drop_super/fs/super.c   Sun Jun 10 18:38:23 2001
@@ -491,7 +491,6 @@
kill_super(sb);
 }
 
-
 /* Use octal escapes, like mount does, for embedded spaces etc. */
 static unsigned char need_escaping[] = { ' ', '\t', '\n', '\\' };
 
@@ -640,6 +639,10 @@
 #undef MANGLE
 #undef FREEROOM
 }
+
+void drop_super(struct super_block *sb)
+{
+}
  
 /*
  * Note: check the dirty flag before waiting, so we don't
@@ -709,6 +712,7 @@
 if (s == NULL)
 goto out;
err = vfs_statfs(s, &sbuf);
+   drop_super(s);
if (err)
goto out;
 
diff -urN S6-pre2-dquot/include/linux/fs.h S6-pre2-drop_super/include/linux/fs.h
--- S6-pre2-dquot/include/linux/fs.hSun Jun 10 18:39:04 2001
+++ S6-pre2-drop_super/include/linux/fs.h   Sun Jun 10 18:38:31 2001
@@ -1320,11 +1320,12 @@
 
 extern struct file_system_type *get_fs_type(const char *name);
 extern struct super_block *get_super(kdev_t);
+extern void drop_super(struct super_block *sb);
 static inline int is_mounted(kdev_t dev)
 {
struct super_block *sb = get_super(dev);
if (sb) {
-   /* drop_super(sb); will go here */
+   drop_super(sb);
return 1;
}
return 0;
diff -urN S6-pre2-dquot/kernel/ksyms.c S6-pre2-drop_super/kernel/ksyms.c
--- S6-pre2-dquot/kernel/ksyms.cFri Jun  8 18:29:03 2001
+++ S6-pre2-drop_super/kernel/ksyms.c   Sun Jun 10 18:38:23 2001
@@ -129,6 +129,7 @@
 EXPORT_SYMBOL(update_atime);
 EXPORT_SYMBOL(get_fs_type);
 EXPORT_SYMBOL(get_super);
+EXPORT_SYMBOL(drop_super);
 EXPORT_SYMBOL(getname);
 EXPORT_SYMBOL(names_cachep);
 EXPORT_SYMBOL(fput);






[PATCH] fs/super.c stuff (10/10)

2001-06-10 Thread Alexander Viro

diff -urN S6-pre2-alloc_super/fs/inode.c S6-pre2-current/fs/inode.c
--- S6-pre2-alloc_super/fs/inode.c  Sun Jun 10 19:09:35 2001
+++ S6-pre2-current/fs/inode.c  Sun Jun 10 19:26:27 2001
@@ -357,11 +357,7 @@
spin_unlock(&inode_lock);
down_read(&s->s_umount);
if (!s->s_root) {
-   up_read(&s->s_umount);
-   spin_lock(&sb_lock);
-   if (!--s->s_count)
-   kfree(s);
-   spin_unlock(&sb_lock);
+   drop_super(s);
goto restart;
}
return s;
@@ -388,23 +384,13 @@
 */
if (dev) {
if ((s = get_super(dev)) != NULL) {
-   down_read(&s->s_umount);
-   if (s->s_root)
-   sync_inodes_sb(s);
-   up_read(&s->s_umount);
-   spin_lock(&sb_lock);
-   if (!--s->s_count)
-   kfree(s);
-   spin_unlock(&sb_lock);
+   sync_inodes_sb(s);
+   drop_super(s);
}
} else {
while ((s = get_super_to_sync()) != NULL) {
sync_inodes_sb(s);
-   up_read(&s->s_umount);
-   spin_lock(&sb_lock);
-   if (!--s->s_count)
-   kfree(s);
-   spin_unlock(&sb_lock);
+   drop_super(s);
}
}
 }
@@ -636,13 +622,14 @@
  
 int invalidate_device(kdev_t dev, int do_sync)
 {
-   struct super_block *sb = get_super(dev);
+   struct super_block *sb;
int res;
 
if (do_sync)
fsync_dev(dev);
 
res = 0;
+   sb = get_super(dev);
if (sb) {
res = invalidate_inodes(sb);
drop_super(sb);
diff -urN S6-pre2-alloc_super/fs/super.c S6-pre2-current/fs/super.c
--- S6-pre2-alloc_super/fs/super.c  Sun Jun 10 19:09:39 2001
+++ S6-pre2-current/fs/super.c  Sun Jun 10 19:36:51 2001
@@ -647,8 +647,23 @@
spin_unlock(&sb_lock);
 }
 
+static inline struct super_block * find_super(kdev_t dev)
+{
+   struct list_head *p;
+
+   list_for_each(p, &super_blocks) {
+   struct super_block * s = sb_entry(p);
+   if (s->s_dev == dev) {
+   s->s_count++;
+   return s;
+   }
+   }
+   return NULL;
+}
+
 void drop_super(struct super_block *sb)
 {
+   up_read(&sb->s_umount);
__put_super(sb);
 }
 
@@ -681,8 +696,7 @@
if (sb) {
if (sb->s_dirt)
write_super(sb);
-   up_read(&sb->s_umount);
-   __put_super(sb);
+   drop_super(sb);
}
return;
}
@@ -695,8 +709,7 @@
spin_unlock(&sb_lock);
down_read(&sb->s_umount);
write_super(sb);
-   up_read(&sb->s_umount);
-   __put_super(sb);
+   drop_super(sb);
goto restart;
} else
sb = sb_entry(sb->s_list.next);
@@ -719,21 +732,19 @@
return NULL;
 restart:
spin_lock(&sb_lock);
-   s = sb_entry(super_blocks.next);
-   while (s != sb_entry(&super_blocks))
-   if (s->s_dev == dev) {
-   /* Yes, it sucks. As soon as we get refcounting... */
-   /* Almost there */
-   s->s_count++;
-   spin_unlock(&sb_lock);
-   lock_super(s);
-   unlock_super(s);
-   if (s->s_dev == dev)
-   return s;
-   drop_super(s);
-   goto restart;
-   } else
-   s = sb_entry(s->s_list.next);
+   s = find_super(dev);
+   if (s) {
+   spin_unlock(&sb_lock);
+   /* Yes, it sucks. As soon as we get refcounting... */
+   /* Almost there - next two lines will go away RSN */
+   lock_super(s);
+   unlock_super(s);
+   down_read(&s->s_umount);
+   if (s->s_root)
+   return s;
+   drop_super(s);
+   goto restart;
+   }
spin_unlock(&sb_lock);
return NULL;
 }
@@ -905,10 +916,11 @@
spin_unlock(&sb_lock);
}
atomic_inc(&sb->s_active);
+   up_read(&sb->s_umount);
path_release(&nd);
return sb;
  

Re: [PATCH] fs/super.c stuff (3/10)

2001-06-10 Thread Alexander Viro

Grr... 4 of 10, that is. Sorry.

On Mon, 11 Jun 2001, Alexander Viro wrote:

> diff -urN S6-pre2-fsync_no_super/include/linux/fs.h 
>S6-pre2-put_super/include/linux/fs.h
> --- S6-pre2-fsync_no_super/include/linux/fs.h Sun Jun 10 18:36:27 2001
> +++ S6-pre2-put_super/include/linux/fs.h  Sun Jun 10 18:39:04 2001
> @@ -1320,7 +1320,6 @@
>  
>  extern struct file_system_type *get_fs_type(const char *name);
>  extern struct super_block *get_super(kdev_t);
> -extern void put_super(kdev_t);
>  static inline int is_mounted(kdev_t dev)
>  {
>   struct super_block *sb = get_super(dev);
> 
> 
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



[PATCH] fs/super.c stuff (7/10)

2001-06-10 Thread Alexander Viro

diff -urN S6-pre2-put_super/fs/inode.c S6-pre2-s_count/fs/inode.c
--- S6-pre2-put_super/fs/inode.c  Sun Jun 10 12:25:34 2001
+++ S6-pre2-s_count/fs/inode.c  Sun Jun 10 12:29:35 2001
@@ -339,30 +339,48 @@
spin_unlock(&inode_lock);
 }
 
+/*
+ * Note:
+ * We don't need to grab a reference to superblock here. If it has non-empty
+ * ->s_dirty it hasn't been killed yet and kill_super() won't proceed
+ * past sync_inodes_sb() until both ->s_dirty and ->s_locked_inodes are
+ * empty. Since __sync_one() regains inode_lock before it finally moves
+ * inode from superblock lists we are OK.
+ */
+
 void sync_unlocked_inodes(void)
 {
-   struct super_block * sb = sb_entry(super_blocks.next);
+   struct super_block * sb;
+   spin_lock(&inode_lock);
+   spin_lock(&sb_lock);
+   sb = sb_entry(super_blocks.next);
for (; sb != sb_entry(&super_blocks); sb = sb_entry(sb->s_list.next)) {
if (!list_empty(&sb->s_dirty)) {
-   spin_lock(&inode_lock);
+   spin_unlock(&sb_lock);
sync_list(&sb->s_dirty);
-   spin_unlock(&inode_lock);
+   spin_lock(&sb_lock);
}
}
+   spin_unlock(&sb_lock);
+   spin_unlock(&inode_lock);
 }
 
 void sync_inodes(kdev_t dev)
 {
-   struct super_block * sb = sb_entry(super_blocks.next);
+   struct super_block * sb;
 
/*
 * Search the super_blocks array for the device(s) to sync.
 */
+   spin_lock(&sb_lock);
+   sb = sb_entry(super_blocks.next);
for (; sb != sb_entry(&super_blocks); sb = sb_entry(sb->s_list.next)) {
if (!sb->s_dev)
continue;
if (dev && sb->s_dev != dev)
continue;
+   sb->s_count++;
+   spin_unlock(&sb_lock);
down_read(&sb->s_umount);
if (sb->s_dev && (sb->s_dev == dev || !dev)) {
spin_lock(&inode_lock);
@@ -372,9 +390,12 @@
spin_unlock(&inode_lock);
}
up_read(&sb->s_umount);
+   spin_lock(&sb_lock);
+   sb->s_count--;
if (dev)
break;
}
+   spin_unlock(&sb_lock);
 }
 
 /*
@@ -382,13 +403,19 @@
  */
 static void try_to_sync_unused_inodes(void)
 {
-   struct super_block * sb = sb_entry(super_blocks.next);
+   struct super_block * sb;
+
+   spin_lock(&sb_lock);
+   sb = sb_entry(super_blocks.next);
for (; sb != sb_entry(&super_blocks); sb = sb_entry(sb->s_list.next)) {
if (!sb->s_dev)
continue;
+   spin_unlock(&sb_lock);
if (!try_to_sync_unused_list(&sb->s_dirty))
break;
+   spin_lock(&sb_lock);
}
+   spin_unlock(&sb_lock);
 }
 
 /**
diff -urN S6-pre2-put_super/fs/super.c S6-pre2-s_count/fs/super.c
--- S6-pre2-put_super/fs/super.c  Sun Jun 10 12:25:34 2001
+++ S6-pre2-s_count/fs/super.c  Sun Jun 10 12:35:54 2001
@@ -62,6 +62,7 @@
 int nr_super_blocks;
 int max_super_blocks = NR_SUPER;
 LIST_HEAD(super_blocks);
+spinlock_t sb_lock = SPIN_LOCK_UNLOCKED;
 
 /*
  * Handling of filesystem drivers list.
@@ -640,8 +641,16 @@
 #undef FREEROOM
 }
 
+static inline void __put_super(struct super_block *sb)
+{
+   spin_lock(&sb_lock);
+   sb->s_count--;
+   spin_unlock(&sb_lock);
+}
+
 void drop_super(struct super_block *sb)
 {
+   __put_super(sb);
 }
  
 /*
@@ -653,6 +662,7 @@
 {
struct super_block * sb;
 
+   spin_lock(&sb_lock);
for (sb = sb_entry(super_blocks.next);
 sb != sb_entry(&super_blocks); 
 sb = sb_entry(sb->s_list.next)) {
@@ -662,12 +672,17 @@
continue;
if (!sb->s_dirt)
continue;
+   sb->s_count++;
+   spin_unlock(&sb_lock);
lock_super(sb);
if (sb->s_dev && sb->s_dirt && (!dev || dev == sb->s_dev))
if (sb->s_op && sb->s_op->write_super)
sb->s_op->write_super(sb);
unlock_super(sb);
+   spin_lock(&sb_lock);
+   sb->s_count--;
}
+   spin_unlock(&sb_lock);
 }
 
 /**
@@ -685,17 +700,23 @@
if (!dev)
return NULL;
 restart:
+   spin_lock(&sb_lock);
s = sb_entry(super_blocks.next);
while (s != sb_entry(&super_blocks))
if (s->s_dev == dev) {
/* Yes, it sucks. As soon as we get refcounting... */
+   /* Almost there */
+   s->s_count++;
+   spin_unlock(&sb_lock);
lock_super(s);
unlock_super(s);
if (s->s_dev == dev)

[PATCH] fs/super.c stuff (9/10)

2001-06-10 Thread Alexander Viro

diff -urN S6-pre2-freeing/fs/super.c S6-pre2-current/fs/super.c
--- S6-pre2-freeing/fs/super.c  Sun Jun 10 12:45:47 2001
+++ S6-pre2-current/fs/super.c  Sun Jun 10 12:53:15 2001
@@ -59,8 +59,6 @@
 /* this is initialized in init/main.c */
 kdev_t ROOT_DEV;
 
-int nr_super_blocks;
-int max_super_blocks = NR_SUPER;
 LIST_HEAD(super_blocks);
 spinlock_t sb_lock = SPIN_LOCK_UNLOCKED;
 
@@ -775,43 +773,23 @@
  * the request.
  */
  
-static struct super_block *get_empty_super(void)
+static struct super_block *alloc_super(void)
 {
-   struct super_block *s;
-
-   spin_lock(&sb_lock);
-   for (s  = sb_entry(super_blocks.next);
-s != sb_entry(&super_blocks); 
-s  = sb_entry(s->s_list.next)) {
-   if (s->s_dev)
-   continue;
-   s->s_count++;
-   atomic_inc(&s->s_active);
-   spin_unlock(&sb_lock);
-   return s;
-   }
-   spin_unlock(&sb_lock);
-   /* Need a new one... */
-   if (nr_super_blocks >= max_super_blocks)
-   return NULL;
-   s = kmalloc(sizeof(struct super_block),  GFP_USER);
+   struct super_block *s = kmalloc(sizeof(struct super_block),  GFP_USER);
if (s) {
-   nr_super_blocks++;
memset(s, 0, sizeof(struct super_block));
-   spin_lock(&sb_lock);
INIT_LIST_HEAD(&s->s_dirty);
INIT_LIST_HEAD(&s->s_locked_inodes);
-   list_add (&s->s_list, super_blocks.prev);
INIT_LIST_HEAD(&s->s_files);
init_rwsem(&s->s_umount);
sema_init(&s->s_lock, 1);
-   atomic_set(&s->s_active, 1);
s->s_count = 1;
+   atomic_set(&s->s_active, 1);
sema_init(&s->s_vfs_rename_sem,1);
sema_init(&s->s_nfsd_free_path_sem,1);
sema_init(&s->s_dquot.dqio_sem, 1);
sema_init(&s->s_dquot.dqoff_sem, 1);
-   spin_unlock(&sb_lock);
+   s->s_maxbytes = MAX_NON_LFS;
}
return s;
 }
@@ -821,16 +799,16 @@
   void *data, int silent)
 {
struct super_block * s;
-   s = get_empty_super();
+   s = alloc_super();
if (!s)
goto out;
s->s_dev = dev;
s->s_bdev = bdev;
s->s_flags = flags;
-   s->s_dirt = 0;
s->s_type = type;
-   s->s_dquot.flags = 0;
-   s->s_maxbytes = MAX_NON_LFS;
+   spin_lock(&sb_lock);
+   list_add (&s->s_list, super_blocks.prev);
+   spin_unlock(&sb_lock);
lock_super(s);
if (!type->read_super(s, data, silent))
goto out_fail;
@@ -991,8 +969,8 @@
sb = fs_type->kern_mnt->mnt_sb;
if (!sb)
BUG();
-   do_remount_sb(sb, flags, data);
atomic_inc(&sb->s_active);
+   do_remount_sb(sb, flags, data);
return sb;
 }
 
diff -urN S6-pre2-freeing/include/linux/fs.h S6-pre2-current/include/linux/fs.h
--- S6-pre2-freeing/include/linux/fs.h  Sun Jun 10 12:45:04 2001
+++ S6-pre2-current/include/linux/fs.h  Sun Jun 10 06:10:13 2001
@@ -61,7 +61,6 @@
 };
 extern struct inodes_stat_t inodes_stat;
 
-extern int max_super_blocks, nr_super_blocks;
 extern int leases_enable, dir_notify_enable, lease_break_time;
 
 #define NR_FILE  8192  /* this can well be larger on a larger system */
diff -urN S6-pre2-freeing/kernel/sysctl.c S6-pre2-current/kernel/sysctl.c
--- S6-pre2-freeing/kernel/sysctl.c Sat Apr 14 21:41:29 2001
+++ S6-pre2-current/kernel/sysctl.c Sun Jun 10 06:09:56 2001
@@ -286,10 +286,6 @@
 0444, NULL, &proc_dointvec},
{FS_MAXFILE, "file-max", &files_stat.max_files, sizeof(int),
 0644, NULL, &proc_dointvec},
-   {FS_NRSUPER, "super-nr", &nr_super_blocks, sizeof(int),
-0444, NULL, &proc_dointvec},
-   {FS_MAXSUPER, "super-max", &max_super_blocks, sizeof(int),
-0644, NULL, &proc_dointvec},
{FS_NRDQUOT, "dquot-nr", &nr_dquots, 2*sizeof(int),
 0444, NULL, &proc_dointvec},
{FS_MAXDQUOT, "dquot-max", &max_dquots, sizeof(int),





[PATCH] fs/super.c stuff (3/10)

2001-06-10 Thread Alexander Viro

diff -urN S6-pre2-s_active/fs/block_dev.c S6-pre2-fsync_no_super/fs/block_dev.c
--- S6-pre2-s_active/fs/block_dev.c Fri Jun  8 18:29:02 2001
+++ S6-pre2-fsync_no_super/fs/block_dev.c   Sun Jun 10 12:13:03 2001
@@ -678,8 +678,10 @@
down(&bdev->bd_sem);
/* syncing will go here */
lock_kernel();
-   if (kind == BDEV_FILE || kind == BDEV_FS)
+   if (kind == BDEV_FILE)
fsync_dev(rdev);
+   else if (kind == BDEV_FS)
+   fsync_no_super(rdev);
if (atomic_dec_and_test(&bdev->bd_openers)) {
/* invalidating buffers will go here */
invalidate_buffers(rdev);
diff -urN S6-pre2-s_active/fs/buffer.c S6-pre2-fsync_no_super/fs/buffer.c
--- S6-pre2-s_active/fs/buffer.c  Fri Jun  8 18:29:03 2001
+++ S6-pre2-fsync_no_super/fs/buffer.c  Sun Jun 10 12:13:03 2001
@@ -318,6 +318,12 @@
return sync_buffers(dev, 1);
 }
 
+int fsync_no_super(kdev_t dev)
+{
+   sync_buffers(dev, 0);
+   return sync_buffers(dev, 1);
+}
+
 int fsync_dev(kdev_t dev)
 {
sync_buffers(dev, 0);
diff -urN S6-pre2-s_active/fs/super.c S6-pre2-fsync_no_super/fs/super.c
--- S6-pre2-s_active/fs/super.c Sun Jun 10 12:07:40 2001
+++ S6-pre2-fsync_no_super/fs/super.c   Sun Jun 10 12:13:04 2001
@@ -971,12 +971,12 @@
sb->s_type = NULL;
unlock_super(sb);
unlock_kernel();
-   up_write(&sb->s_umount);
if (bdev) {
blkdev_put(bdev, BDEV_FS);
bdput(bdev);
} else
put_unnamed_dev(dev);
+   up_write(&sb->s_umount);
 }
 
 /*
diff -urN S6-pre2-s_active/include/linux/fs.h S6-pre2-fsync_no_super/include/linux/fs.h
--- S6-pre2-s_active/include/linux/fs.h Sun Jun 10 11:58:01 2001
+++ S6-pre2-fsync_no_super/include/linux/fs.h   Sun Jun 10 12:13:04 2001
@@ -1122,6 +1122,7 @@
 extern void sync_dev(kdev_t);
 extern int fsync_dev(kdev_t);
 extern int fsync_super(struct super_block *);
+extern int fsync_no_super(kdev_t);
 extern void sync_inodes_sb(struct super_block *);
 extern int fsync_inode_buffers(struct inode *);
 extern int osync_inode_buffers(struct inode *);





[PATCHes] fs/super.c stuff

2001-06-10 Thread Alexander Viro

OK. It works here(tm). I'm sending first 10 chunks - about 70% of locking
changes. That's a good intermediate point and I'd rather avoid doing too
large steps.

Contents (patches will go in separate postings):

1. Eliminates mnt_instances and s_mounts. Instead we add a new field to
struct super_block - s_active: the number of vfsmounts for a given superblock,
i.e. the number of entries in the old s_mounts. Right now all accesses are
serialized by mount_sem, but later we'll need it to be atomic_t.

2. Better handling of s_active. Instead of incrementing it just when we
attach a vfsmount we do that beforehand and decrement if get_sb_... fails.

3. blkdev_put(bdev, BDEV_FS) doesn't touch superblock anymore. Current
callers don't need that (nothing to touch - it's either final kill_super()
or failed read_super()) and having it non-interfering with fs structures
gives us more freedom for get_sb_bdev().

4. pure cosmetics - fs.h contains an extern for function that doesn't exist
(put_super(kdev_t)). Removed.

5. instead of passing sb->s_dev to remove_dquot_ref() and doing get_super()
there we pass sb itself. While we are at it invalidate_dquots() is made
static - nothing outside of dquot.c calls it.

6. drop_super() added. At that stage - empty, we just add calls to balance
those of get_super().

7. First serious part.
* we add a spinlock (sb_lock) that protects super_blocks list.
* we add a reference counter to struct super_block. ->s_count.
At that stage we don't use it - only maintain the correct value. The logic is
the same as for mm_struct - each temporary reference contributes 1,
all permanent references (from vfsmounts) are lumped together. It's an
int - all accesses are protected by sb_lock.
At that stage we rely on mount_sem to handle the moments when
we turn a temporary reference into permanent one. That will change,
but we need to kill the "reuse" branch of get_empty_super() to do that.
And that requires s_count already in place.

8. _Now_ we can get to real stuff.
* kill_super() removes dying superblock from the super_blocks list.
* when s_count drops to zero we free the superblock.

9. We are done with "reuse" branch of get_empty_super(). The rest (allocation
of a new one) is renamed to alloc_super(). Insertion into super_blocks
is moved into (the only) caller - read_super().

10. Now we can solve most of the problems with get_super()/umount().
get_super() does down_read(&s->s_umount) (and drop_super() does up_read()).
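
The counting rule from 7, 8 and 10 can be sketched in user space as
follows. This is only a model, not kernel code: struct sb_model and the
sb_model_*() helpers are made-up names, locking is left out (in the kernel
every s_count access is under sb_lock), and the way the last s_active
reference drops the lumped s_count reference is an assumption modelled on
mm_struct, not something taken from the patches.

```c
#include <assert.h>

/* Hypothetical user-space model of the ->s_count rules; not kernel code. */
struct sb_model {
    int s_count;   /* temporary refs + 1 for all permanent refs together */
    int s_active;  /* number of permanent (vfsmount) references */
    int freed;     /* set instead of calling free(), so callers can inspect it */
};

static void sb_model_init(struct sb_model *sb)
{
    sb->s_count = 1;   /* the single lumped permanent reference */
    sb->s_active = 1;  /* the first mount */
    sb->freed = 0;
}

/* Temporary reference, as get_super() would take (under sb_lock). */
static void sb_model_get(struct sb_model *sb)
{
    assert(!sb->freed);
    sb->s_count++;
}

/* Drop one s_count reference; the last one "frees" the object. */
static void sb_model_put(struct sb_model *sb)
{
    assert(sb->s_count > 0);
    if (--sb->s_count == 0)
        sb->freed = 1;
}

/* Assumed rule: the last vfsmount going away drops the lumped reference. */
static void sb_model_deactivate(struct sb_model *sb)
{
    assert(sb->s_active > 0);
    if (--sb->s_active == 0)
        sb_model_put(sb);
}
```

The point of the split is that somebody holding only a temporary reference
keeps the structure itself alive after the last unmount, but has to check
(s_root in the patches) whether there is still a filesystem behind it.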

From that point it's more or less easy ride - we need to reorganize
get_sb_...() to have exclusion between mount() and get_super() callers,
but now we have everything we need for that.  I would rather submit that
part separately. All really evil stuff is done - in a sense it's the
nastiest point of sequence. Basically, the rest will consist of cleanups.

I've tried to carve the thing into edible chunks - if you find something
too large, please tell me. Patches themselves will go in followups to this
posting, numbered from 1 to 10. They are incremental to each other, starting
at 2.4.6-pre2.
Cheers,
Al




[PATCH] fs/super.c stuff (2/10)

2001-06-10 Thread Alexander Viro

diff -urN S6-pre2-mnt_instances/fs/super.c S6-pre2-s_active/fs/super.c
--- S6-pre2-mnt_instances/fs/super.c  Sat Jun  9 19:18:31 2001
+++ S6-pre2-s_active/fs/super.c Sun Jun 10 12:07:40 2001
@@ -388,7 +388,6 @@
spin_lock(&dcache_lock);
list_add(&mnt->mnt_list, vfsmntlist.prev);
spin_unlock(&dcache_lock);
-   atomic_inc(&sb->s_active);
if (sb->s_type->fs_flags & FS_SINGLE)
get_filesystem(sb->s_type);
 out:
@@ -740,6 +739,7 @@
 s  = sb_entry(s->s_list.next)) {
if (s->s_dev)
continue;
+   atomic_inc(&s->s_active);
return s;
}
/* Need a new one... */
@@ -755,7 +755,7 @@
INIT_LIST_HEAD(&s->s_files);
init_rwsem(&s->s_umount);
sema_init(&s->s_lock, 1);
-   atomic_set(&s->s_active, 0);
+   atomic_set(&s->s_active, 1);
sema_init(&s->s_vfs_rename_sem,1);
sema_init(&s->s_nfsd_free_path_sem,1);
sema_init(&s->s_dquot.dqio_sem, 1);
@@ -794,6 +794,7 @@
s->s_bdev = 0;
s->s_type = NULL;
unlock_super(s);
+   atomic_dec(&s->s_active);
return NULL;
 }
 
@@ -860,6 +861,7 @@
if (fs_type == sb->s_type &&
((flags ^ sb->s_flags) & MS_RDONLY) == 0) {
path_release(&nd);
+   atomic_inc(&sb->s_active);
return sb;
}
} else {
@@ -923,6 +925,7 @@
if (!sb)
BUG();
do_remount_sb(sb, flags, data);
+   atomic_inc(&sb->s_active);
return sb;
 }
 
@@ -1038,7 +1041,6 @@
mnt->mnt_root = dget(sb->s_root);
mnt->mnt_mountpoint = mnt->mnt_root;
mnt->mnt_parent = mnt;
-   atomic_inc(&sb->s_active);
type->kern_mnt = mnt;
return mnt;
 }
@@ -1315,7 +1317,6 @@
mnt->mnt_root = dget(sb->s_root);
mnt->mnt_mountpoint = mnt->mnt_root;
mnt->mnt_parent = mnt;
-   atomic_inc(&sb->s_active);
 
/* Something was mounted here while we slept */
while(d_mountpoint(nd->dentry) && follow_down(&nd->mnt, &nd->dentry))
@@ -1573,6 +1574,7 @@
sb = get_super(ROOT_DEV);
if (sb) {
fs_type = sb->s_type;
+   atomic_inc(&sb->s_active);
goto mount_it;
}
 





[PATCH] fs/super.c stuff (1/10)

2001-06-10 Thread Alexander Viro

diff -urN S6-pre2/fs/super.c S6-pre2-mnt_instances/fs/super.c
--- S6-pre2/fs/super.c  Fri Jun  8 18:29:03 2001
+++ S6-pre2-mnt_instances/fs/super.c  Sat Jun  9 19:18:31 2001
@@ -386,19 +386,20 @@
mnt->mnt_parent = mnt;
 
spin_lock(&dcache_lock);
-   list_add(&mnt->mnt_instances, &sb->s_mounts);
list_add(&mnt->mnt_list, vfsmntlist.prev);
spin_unlock(&dcache_lock);
+   atomic_inc(&sb->s_active);
if (sb->s_type->fs_flags & FS_SINGLE)
get_filesystem(sb->s_type);
 out:
return mnt;
 }
 
-static struct vfsmount *clone_mnt(struct vfsmount *old_mnt, struct dentry *root)
+static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root)
 {
-   char *name = old_mnt->mnt_devname;
+   char *name = old->mnt_devname;
struct vfsmount *mnt = alloc_vfsmnt();
+   struct super_block *sb = old->mnt_sb;
 
if (!mnt)
goto out;
@@ -408,14 +409,12 @@
if (mnt->mnt_devname)
strcpy(mnt->mnt_devname, name);
}
-   mnt->mnt_sb = old_mnt->mnt_sb;
+   mnt->mnt_sb = sb;
mnt->mnt_root = dget(root);
mnt->mnt_mountpoint = mnt->mnt_root;
mnt->mnt_parent = mnt;
 
-   spin_lock(&dcache_lock);
-   list_add(&mnt->mnt_instances, &old_mnt->mnt_instances);
-   spin_unlock(&dcache_lock);
+   atomic_inc(&sb->s_active);
 out:
return mnt;
 }
@@ -487,9 +486,6 @@
struct super_block *sb = mnt->mnt_sb;
 
dput(mnt->mnt_root);
-   spin_lock(&dcache_lock);
-   list_del(&mnt->mnt_instances);
-   spin_unlock(&dcache_lock);
if (mnt->mnt_devname)
kfree(mnt->mnt_devname);
kmem_cache_free(mnt_cache, mnt);
@@ -757,9 +753,9 @@
INIT_LIST_HEAD(&s->s_locked_inodes);
list_add (&s->s_list, super_blocks.prev);
INIT_LIST_HEAD(&s->s_files);
-   INIT_LIST_HEAD(&s->s_mounts);
init_rwsem(&s->s_umount);
sema_init(&s->s_lock, 1);
+   atomic_set(&s->s_active, 0);
sema_init(&s->s_vfs_rename_sem,1);
sema_init(&s->s_nfsd_free_path_sem,1);
sema_init(&s->s_dquot.dqio_sem, 1);
@@ -938,12 +934,9 @@
struct file_system_type *fs = sb->s_type;
struct super_operations *sop = sb->s_op;
 
-   spin_lock(&dcache_lock);
-   if (!list_empty(&sb->s_mounts)) {
-   spin_unlock(&dcache_lock);
+   atomic_dec(&sb->s_active);
+   if (atomic_read(&sb->s_active))
return;
-   }
-   spin_unlock(&dcache_lock);
down_write(&sb->s_umount);
lock_kernel();
sb->s_root = NULL;
@@ -1045,9 +1038,7 @@
mnt->mnt_root = dget(sb->s_root);
mnt->mnt_mountpoint = mnt->mnt_root;
mnt->mnt_parent = mnt;
-   spin_lock(&dcache_lock);
-   list_add(&mnt->mnt_instances, &sb->s_mounts);
-   spin_unlock(&dcache_lock);
+   atomic_inc(&sb->s_active);
type->kern_mnt = mnt;
return mnt;
 }
@@ -1092,7 +1083,7 @@
 
spin_lock(&dcache_lock);
 
-   if (mnt->mnt_instances.next != mnt->mnt_instances.prev) {
+   if (atomic_read(&sb->s_active) > 1) {
if (atomic_read(&mnt->mnt_count) > 2) {
spin_unlock(&dcache_lock);
return -EBUSY;
@@ -1324,9 +1315,7 @@
mnt->mnt_root = dget(sb->s_root);
mnt->mnt_mountpoint = mnt->mnt_root;
mnt->mnt_parent = mnt;
-   spin_lock(&dcache_lock);
-   list_add(&mnt->mnt_instances, &sb->s_mounts);
-   spin_unlock(&dcache_lock);
+   atomic_inc(&sb->s_active);
 
/* Something was mounted here while we slept */
while(d_mountpoint(nd->dentry) && follow_down(&nd->mnt, &nd->dentry))
diff -urN S6-pre2/include/linux/fs.h S6-pre2-mnt_instances/include/linux/fs.h
--- S6-pre2/include/linux/fs.h  Fri Jun  8 18:29:03 2001
+++ S6-pre2-mnt_instances/include/linux/fs.h  Sat Jun  9 19:18:31 2001
@@ -679,13 +679,13 @@
struct dentry   *s_root;
struct rw_semaphore s_umount;
struct semaphore s_lock;
+   atomic_t s_active;
 
struct list_head s_dirty;   /* dirty inodes */
struct list_head s_locked_inodes;   /* inodes being synced */
struct list_head s_files;
 
struct block_device *s_bdev;
-   struct list_head s_mounts;  /* vfsmount(s) of this one */
struct quota_mount_options s_dquot; /* Diskquota specific options */
 
union {
diff -urN S6-pre2/include/linux/mount.h S6-pre2-mnt_instances/include/linux/mount.h
--- S6-pre2/include/linux/mount.h   Fri Jun  8 18:29:03 2001
+++ S6-pre2-mnt_instances/include/linux/mount.h Sat Jun  9 19:18:31 2001
@@ -18,7 +18,6 @@
struct vfsmount *mnt_parent;/* fs we are mounted on */
struct dentry *mnt_mountpoint;  /* dentry of 

[PATCH] Re: Oops with kernel 2.4.5 on heavy disk traffic

2001-06-10 Thread Alexander Viro

Please, apply. What's happening here is simple - we set i_ino by
PID and get something out of range of per-process inode. Confusion
follows... Fix: move initializing ->u.proc_i.task past the check.
Then proc_delete_inode() will be happy with it.
Alois, Bryce - that ought to fix the oopsen you see.

--- linux/fs/proc/base.c.old  Sun Jun 10 11:15:55 2001
+++ linux/fs/proc/base.c  Sun Jun 10 11:21:51 2001
@@ -635,15 +635,14 @@
inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
inode->i_ino = fake_ino(task->pid, ino);
 
-   inode->u.proc_i.file = NULL;
+   if (!task->pid)
+   goto out_unlock;
+
/*
 * grab the reference to task.
 */
-   inode->u.proc_i.task = task;
get_task_struct(task);
-   if (!task->pid)
-   goto out_unlock;
-
+   inode->u.proc_i.task = task;
inode->i_uid = 0;
inode->i_gid = 0;
if (ino == PROC_PID_INO || task->dumpable) {





Re: [PATCH] sockreg2.4.5-05 inet[6]_create() register/unregister table

2001-06-09 Thread Alexander Viro



On Sat, 9 Jun 2001, watermodem wrote:

> He is discussing a theme with legal implications. (Legal and Slow tended
> to be intertwined)  I know what his position in the linux kernel
> hierarchy is, and if he were in a corporation with that position he
> could just say NO without any reason.  But, linux development is
> portrayed as something "open" and "of the people" not a closed corporate
> offering.  Now, if that is not the case, then just take out all the
> flowery words from the license and replace it with the unstated but
> defacto communist motto "What's mine is mine What's yours is mine!". 

Pot. Kettle. Black.  You are one who tries to tell other people what
can be done with their code.  With all my personal dislike of GPL
(I use it if the project I'm working on does, but I won't use it
for anything else), Dave _has_ right to choose the license he likes
and you'd bloody better respect that.  Author has absolute right
to set the conditions for using his thing.  If they are unacceptable
for you - nobody forces you to use it.  Any whining about that places
you on the level of Napster wankers.  Now, bugger off - go play with
"social hackers" or something...




Re: [CHECKER] a couple potential deadlocks in 2.4.5-ac8

2001-06-09 Thread Alexander Viro



On Sat, 9 Jun 2001, Linus Torvalds wrote:

> On Sat, 9 Jun 2001, Alexander Viro wrote:
> > 
> > True, but... I can easily see the situation when ->foo() and ->bar()
> > both call a helper function which needs BKL for a small piece of code.
> 
> I'd hope that we can fix the small helper functions to not need BKL -
> there are already many circumstances where you can't use the BKL anyway
> (ie you already hold a spinlock - I'd really like to have the rule that
> the BKL is the "outermost" of all spinlocks, as we could in theory some
> day use it as a point to schedule away on BKL contention).
> 
> > ObUnrelated: fs/super.c is getting to the point where it naturally
> > falls into two files - one that deals with mount cache and all things
> > vfsmount-related, mount tree manipulations, etc. and another that deals
> > with superblocks. Mind if I split the thing?
> 
> Sure. As long as there is some sane naming and not too many new non-static
> functions. Maybe just "fs/mount.c" for the vfsmount caches etc.

Umm... In the final variant of patch all interaction is done by
alloc_vfsmnt() and set_devname() on one side and kill_super(),
do_kern_mount() and do_remount_sb() on another. That is, aside
of the currently public functions (actually, some of them are
gone - kern_umount is #defined to mntput and kern_mount became
a trivial inlined wrapper around do_kern_mount, change_root is
gone, etc.).

In my variant it was called fs/namespace.c, basically since I prefer
the names along the line "what does it deal with" (answer: user-visible
namespace) to "what action is done here" ones, especially since mount(2)
is not the only syscall exported (pivot_root(2) and umount(2) are also
handled here).

After all, inode.c, dcache.c, buffer.c, locks.c, devices.c, etc. are named
that way. OTOH, open.c and namei.c are not, so... Up to you - my preference
would be a noun and namespace looks better than next contender (mntcache),
but that's your call - filenames in core kernel...




Re: [CHECKER] a couple potential deadlocks in 2.4.5-ac8

2001-06-09 Thread Alexander Viro



On Sat, 9 Jun 2001, Linus Torvalds wrote:

> Anyway, in a 2.5.x timeframe we should probably make sure that we do not
> have the need for a recursive BKL any more. That shouldn't be that hard to
> fix, especially with help from CHECKER to verify that we didn't forget
> some case.

True, but... I can easily see the situation when ->foo() and ->bar()
both call a helper function which needs BKL for a small piece of code.
->foo() callers take BKL (and it's choke-full of places that still need
BKL, anyway). ->bar() is called without BKL. Moreover, grabbing BKL
over the whole helper is a massive overkill.

ObUnrelated: fs/super.c is getting to the point where it naturally
falls into two files - one that deals with mount cache and all things
vfsmount-related, mount tree manipulations, etc. and another that deals
with superblocks. Mind if I split the thing?




Re: [CHECKER] a couple potential deadlocks in 2.4.5-ac8

2001-06-09 Thread Alexander Viro



On 9 Jun 2001, Linus Torvalds wrote:

> The big kernel lock rules are that it's a "normal spinlock" in many
> regards, BUT you can block while holding it, and the BKL will magically
> be released during the blocking.  This means, for example, that the BKL
> can never deadlock with a semaphore - if a BKL holder blocks on somebody
> else's semaphore (and that somebody else wants the BKL), then the act of
> blocking on the semaphore will release the BKL, and allow the original
> semaphore holder to continue. 

Another difference from spinlocks is that BKL is recursive. I'm
actually surprised that it didn't show up first.
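
For the record, "recursive" here means a per-thread depth counter, so that
only the outermost lock/unlock pair touches the real lock. A minimal
user-space sketch of that idea, assuming C11 atomics; model_lock(),
model_unlock(), model_depth() and big_lock are made-up names, and the
release-while-blocked behaviour of BKL is not modelled at all:

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the underlying spinlock (hypothetical, user-space only). */
static atomic_flag big_lock = ATOMIC_FLAG_INIT;

/* Per-thread nesting depth; -1 means this thread does not hold the lock. */
static _Thread_local int lock_depth = -1;

static void model_lock(void)
{
    if (++lock_depth == 0) {
        /* Outermost acquisition: really take the lock. */
        while (atomic_flag_test_and_set_explicit(&big_lock,
                                                 memory_order_acquire))
            ;  /* spin until the current holder releases it */
    }
    /* Nested acquisition: only the counter changed. */
}

static void model_unlock(void)
{
    assert(lock_depth >= 0);
    if (--lock_depth < 0) {
        /* Outermost release: really drop the lock. */
        atomic_flag_clear_explicit(&big_lock, memory_order_release);
    }
}

/* Accessor so callers can inspect the nesting level. */
static int model_depth(void)
{
    return lock_depth;
}
```

With a plain spinlock the second model_lock() from the same thread would
spin forever; with the counter it is just an increment.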




Re: [patch] truncate_inode_pages

2001-06-09 Thread Alexander Viro



> takes 45 seconds CPU time due to the O(clean * dirty) algorithm in
> truncate_inode_pages().  The machine is locked up for the duration.
> The patch reduces this to 20 milliseconds via an O(clean + dirty)
> algorithm.

Unfortunately, it's _not_ O(clean + dirty).

> + while (truncate_list_pages(&mapping->clean_pages, start, &partial)) {
> + spin_lock(&pagecache_lock);
> + complete = 0;
> + }

Cool. Now think what happens if pages with large indices are in the
very end of list. Half of them. You skip clean/2 pages on each of
clean/2 passes. Hardly a linear behaviour - all you need is a different
program to trigger it.

Now, having a separate pass that would reorder the pages on the list,
moving the to-kill ones to the beginning, might help.
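
To put numbers on that, here is a small counting model. It is a sketch
under simplifying assumptions, not the real truncate_list_pages(): every
removal is assumed to force a restart from the list head, the pages to
keep all sit in front of the pages to kill, and both function names are
made up.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Entries visited when every removal forces a restart from the head and
 * all `keep` pages sit in front of the `kill` pages.
 */
static unsigned long visits_with_restart(size_t keep, size_t kill)
{
    unsigned long visited = 0;

    while (kill > 0) {
        visited += keep;  /* skip every page we are keeping ... */
        visited += 1;     /* ... reach one page to kill, remove it, restart */
        kill--;
    }
    visited += keep;      /* final pass finds nothing and terminates */
    return visited;
}

/*
 * Same job if one extra pass first moves the to-kill pages to the front,
 * so that every restart hits a victim immediately.
 */
static unsigned long visits_after_reorder(size_t keep, size_t kill)
{
    unsigned long reorder = keep + kill;  /* one pass to partition the list */
    unsigned long removal = kill;         /* each removal is at the head */

    return reorder + removal + keep;      /* plus the final empty pass */
}
```

For 500 pages to keep and 500 to kill the first variant visits 251000
entries and the second 2000, i.e. roughly clean^2/4 against a small
multiple of clean.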




Re: [CHECKER] a couple potential deadlocks in 2.4.5-ac8

2001-06-09 Thread Alexander Viro



On Sat, 9 Jun 2001, Dawson Engler wrote:

> Hi All,
> 
> we're starting to develop a checker that finds deadlocks by (1)
> computing all lock acquisition paths and (2) checking if two paths
> violate a partial order.
> 
> E.g., for two threads T1 and T2:
>   T1: foo acquires A --> calls bar which tries to acquire B
>   T2: baz acquires B --> calls blah which tries to acquire A
> all else being equal, this deadlocks.
> 
> The checker is pretty primitive.  In particular:
>   - lots of false negatives come from the fact that it does not 
> track interrupt disabling.  A missed deadlock:
>   foo acquires A
>   bar interrupts foo, disables interrupts, tries to acquire A
> (Is this the most common deadlock?)
> 
>   - many potential false positives since it does not realize when
>   two kernel call traces are mutually exclusive.
> 
> To check it's mechanics I've enclosed what look to me to be two potential
> deadlocks --- given the limits of the tool and my understanding of what
> can happen when, these could be (likely be?) false positive, so I'd
> appreciate any corrective feedback.

BKL is special. It has no nesting constraints wrt. semaphores. It is
a spinlock, but we are allowed to block while holding it - then it will
be released and the next time we get a timeslice we will start with
attempt to reacquire it.




Re: PROBLEM: I/O system call never returns if file desc is closed in the

2001-06-07 Thread Alexander Viro



On 7 Jun 2001, Florian Weimer wrote:

> There's a subtle difference: For malloc(), libc has a mutex (or
> whatever), but for open(), socket() etc., no locking is performed, and
> many libc functions create (and destroy) descriptors implicitly.  

So? You don't have to close() descriptors you had not (to your code's
knowledge) opened. End of story.

> I still don't see how you can write maintainable and reliable software
> with asynchronous close().  For example, if some select() call returns
> EBADF after an asynchronous close(), you would have to scan the
> descriptors to find the offending one, but in the meantime, it has
> been reused by another thread.  What do you do in this case?

You don't rely on EBADF. It's _your_ code that had closed the thing. Unless
you pass descriptors of unknown origin into select() (hardly a good idea)
you have all information you need to provide an exclusion.




Re: PROBLEM: I/O system call never returns if file desc is closed in the

2001-06-06 Thread Alexander Viro



On 7 Jun 2001, Florian Weimer wrote:

> Matthias Urlichs <[EMAIL PROTECTED]> writes:
> 
> > Select is defined as to return, with the appropriate bit set, if/when
> > a nonblocking read/write on the file descriptor won't block. You'd get
> > EBADF in this case, therefore causing the select to return would be a
> > Good Thing.
> 
> How do you avoid race conditions if more than one thread is creating
> file descriptors?  I think you can only do that under very special
> circumstances, and it definitely requires some synchronization.

The same way as you do it for many threads doing any allocations.




Re: symlink_prefix

2001-06-06 Thread Alexander Viro



On Thu, 7 Jun 2001, Edgar Toernig wrote:

> Alexander Viro wrote:
> > ...
> > dir = open("/usr/local", O_DIRECTORY);
> > /* error handling */
> > new_mount(dir, MNT_SET, fs_fd); /* closes dir and fs_fd */
> 
> Do you really want to start using fds instead of strings for tree
> modifying commands (link, unlink, symlink, rename, mount and umount)?
> Even if it were possible in the new_mount case it wouldn't have the
> atomic lookup+act nature of the old mount.  And then, _I_ would
> prefer a uniform interface for tree management commands - strings.

You have exactly the same atomicity guarantees. That is to say, none.
Mountpoint can be renamed between the lookup and mounting.

Moreover, even after mount(2) you can rename() parent of mountpoint. On
all Unices I've seen (well, aside from v7, which didn't have rename(2)).
So if you rely on anything of that kind - you are screwed. Portably
screwed, at that.

I would argue that mount(2) is seriously different from rename(2) and
friends, but even if your argument makes sense, it only makes sense for
"dir" argument. "device" is nothing but a filesystem-specific option.




Re: Break 2.4 VM in five easy steps

2001-06-06 Thread Alexander Viro



On Wed, 6 Jun 2001, Sean Hunter wrote:

> This is completely bogus. I am not saying that I can't afford the swap.
> What I am saying is that it is completely broken to require this amount
> of swap given the boundaries of efficient use. 

Funny. I can count many ways in which 4.3BSD, SunOS{3,4} and post-4.4 BSD
systems I've used were broken, but I've never thought that swap==2*RAM rule
was one of them.

Not that being more kind on swap would be a bad thing, but that rule for
amount of swap is pretty common. ISTR similar for (very old) SCO, so it's
not just BSD world. How are modern Missed'em'V variants in that respect, BTW?




[PATCH] more fs/super.c cleanups (5)

2001-06-05 Thread Alexander Viro

Chunk 5:
* we put vfsmounts into hash, keyed by pair dentry/vfsmount of
mountpoint. attach_mnt() and detach_mnt() do the obvious thing.
* follow_down() and friends do lookup in that hash, instead of
traversing ->d_vfsmnt. That kills the scalability problem with many parallel
trees (if you remember, that's what was planned from the very beginning;
d_vfsmnt was an "it works for now" sort of thing).
* d_vfsmnt is gone. In its place we have a counter - how many
things are mounted on that dentry. That (along with the above) covers all
uses of d_vfsmnt. First of all, d_mountpoint() is easier (->d_mounted != 0
instead of !list_empty()). Besides, struct dentry became smaller.
* we allocate vfsmounts from the cache of their own now.

OK, that's probably it for 6-pre1. It Works Here(tm), it had been done with
equivalent transformations and I hope that chunks are small enough to be
easy to verify.  Next part changes mount_sem locking (internal to fs/super.c),
so I'd rather keep it separate.

Please, apply.
Al

diff -urN S6-pre1-graft_tree/fs/autofs4/expire.c S6-pre1-mntcache/fs/autofs4/expire.c
--- S6-pre1-graft_tree/fs/autofs4/expire.c  Sat Apr 28 02:12:56 2001
+++ S6-pre1-mntcache/fs/autofs4/expire.c  Tue Jun  5 08:18:04 2001
@@ -66,19 +66,11 @@
non-busy mounts */
 static int check_vfsmnt(struct vfsmount *mnt, struct dentry *dentry)
 {
-   int ret = 0;
-   struct list_head *tmp;
-
-   list_for_each(tmp, &dentry->d_vfsmnt) {
-   struct vfsmount *vfs = list_entry(tmp, struct vfsmount, 
- mnt_clash);
-   DPRINTK(("check_vfsmnt: mnt=%p, dentry=%p, tmp=%p, vfs=%p\n",
-mnt, dentry, tmp, vfs));
-   if (vfs->mnt_parent != mnt || /* don't care about busy-ness of other namespaces */
-   !is_vfsmnt_tree_busy(vfs))
-   ret++;
-   }
+   int ret = dentry->d_mounted;
+   struct vfsmount *vfs = lookup_mnt(mnt, dentry);
 
+   if (vfs && is_vfsmnt_tree_busy(vfs))
+   ret--;
DPRINTK(("check_vfsmnt: ret=%d\n", ret));
return ret;
 }
diff -urN S6-pre1-graft_tree/fs/dcache.c S6-pre1-mntcache/fs/dcache.c
--- S6-pre1-graft_tree/fs/dcache.c  Fri May 25 21:51:12 2001
+++ S6-pre1-mntcache/fs/dcache.c  Tue Jun  5 08:18:04 2001
@@ -616,7 +616,7 @@
dentry->d_name.hash = name->hash;
dentry->d_op = NULL;
dentry->d_fsdata = NULL;
-   INIT_LIST_HEAD(&dentry->d_vfsmnt);
+   dentry->d_mounted = 0;
INIT_LIST_HEAD(&dentry->d_hash);
INIT_LIST_HEAD(&dentry->d_lru);
INIT_LIST_HEAD(&dentry->d_subdirs);
@@ -1283,6 +1283,7 @@
 
dcache_init(mempages);
inode_init(mempages);
+   mnt_init(mempages);
bdev_cache_init();
cdev_cache_init();
 }
diff -urN S6-pre1-graft_tree/fs/namei.c S6-pre1-mntcache/fs/namei.c
--- S6-pre1-graft_tree/fs/namei.c   Fri May 25 21:51:14 2001
+++ S6-pre1-mntcache/fs/namei.c Tue Jun  5 08:18:04 2001
@@ -351,22 +351,17 @@
 
 static inline int __follow_down(struct vfsmount **mnt, struct dentry **dentry)
 {
-   struct list_head *p;
+   struct vfsmount *mounted;
+
spin_lock(&dcache_lock);
-   p = (*dentry)->d_vfsmnt.next;
-   while (p != &(*dentry)->d_vfsmnt) {
-   struct vfsmount *tmp;
-   tmp = list_entry(p, struct vfsmount, mnt_clash);
-   if (tmp->mnt_parent == *mnt) {
-   *mnt = mntget(tmp);
-   spin_unlock(&dcache_lock);
-   mntput(tmp->mnt_parent);
-   /* tmp holds the mountpoint, so... */
-   dput(*dentry);
-   *dentry = dget(tmp->mnt_root);
-   return 1;
-   }
-   p = p->next;
+   mounted = lookup_mnt(*mnt, *dentry);
+   if (mounted) {
+   *mnt = mntget(mounted);
+   spin_unlock(&dcache_lock);
+   dput(*dentry);
+   mntput(mounted->mnt_parent);
+   *dentry = dget(mounted->mnt_root);
+   return 1;
}
spin_unlock(&dcache_lock);
return 0;
diff -urN S6-pre1-graft_tree/fs/super.c S6-pre1-mntcache/fs/super.c
--- S6-pre1-graft_tree/fs/super.c   Tue Jun  5 08:17:28 2001
+++ S6-pre1-mntcache/fs/super.c Tue Jun  5 08:18:04 2001
@@ -281,13 +281,25 @@
 
 static LIST_HEAD(vfsmntlist);
 
+static struct list_head *mount_hashtable;
+static int hash_mask, hash_bits;
+static kmem_cache_t *mnt_cache; 
+
+static inline unsigned long hash(struct vfsmount *mnt, struct dentry *dentry)
+{
+   unsigned long tmp = ((unsigned long) mnt / L1_CACHE_BYTES);
+   tmp += ((unsigned long) dentry / L1_CACHE_BYTES);
+   tmp = tmp + (tmp >> hash_mask);
+   return tmp & hash_bits;
+}
+
 struct vfsmount *alloc_vfsmnt(void)
 {
-  
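
A userspace sketch of the lookup scheme described in this chunk: entries hashed by the pair (parent mount, mountpoint dentry) and found by walking one bucket. Everything here is illustrative. The demo_ names, the CACHE_LINE value and the fixed table size are assumptions, and pointers to plain ints stand in for vfsmounts and dentries. One more assumption to flag: in the hunk as archived, hash() shifts by hash_mask and masks with hash_bits, which appears to reverse the roles the names suggest; the sketch shifts by the bit count and masks with the mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u                    /* stand-in for L1_CACHE_BYTES */
#define HASH_BITS  8u
#define HASH_MASK  ((1u << HASH_BITS) - 1u)

struct demo_mnt {
    const void *parent;       /* mount we are mounted in */
    const void *mountpoint;   /* dentry we are mounted on */
    int id;                   /* payload for the demo */
    struct demo_mnt *next;    /* hash chain */
};

static struct demo_mnt *demo_buckets[1u << HASH_BITS];

static inline unsigned long demo_hash(const void *mnt, const void *dentry)
{
    unsigned long tmp = (unsigned long)(uintptr_t)mnt / CACHE_LINE;

    tmp += (unsigned long)(uintptr_t)dentry / CACHE_LINE;
    tmp += tmp >> HASH_BITS;   /* fold higher bits into the low ones */
    return tmp & HASH_MASK;    /* bucket index below 1 << HASH_BITS */
}

/* Put the entry at the head of its bucket's chain. */
void demo_attach(struct demo_mnt *m)
{
    unsigned long h;

    assert(m != NULL);
    h = demo_hash(m->parent, m->mountpoint);
    m->next = demo_buckets[h];
    demo_buckets[h] = m;
}

/* Find what is mounted on (parent, mountpoint), or NULL if nothing is. */
struct demo_mnt *demo_lookup(const void *parent, const void *mountpoint)
{
    struct demo_mnt *m = demo_buckets[demo_hash(parent, mountpoint)];

    for (; m; m = m->next)
        if (m->parent == parent && m->mountpoint == mountpoint)
            return m;
    return NULL;
}
```

The point of keying on the pair is that the same dentry can be a mountpoint in several trees at once; a lookup then touches one bucket instead of walking every mount hanging off the dentry.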

[PATCH] more fs/super.c cleanups (4)

2001-06-05 Thread Alexander Viro

Chunk 4: OK, this one is interesting.
* new function - graft_tree(what, where). It does necessary locking
and checks and mounts existing vfsmount on given point. Basically, it's the
common part of mounting and binding. Checks are usual - mountpoint is not
dead, we are not trying to mount directory on non-directory or vice versa.
* clone_mnt(vfsmount, root) - creates vfsmount of subtree.
* do_loopback() turned into "find what we want to bind, clone that
vfsmount setting its root to that dentry and graft it on mountpoint"
* do_add_mount() (aka. normal mounting) is "allocate vfsmount, then
find superblock, then attach it to already allocated vfsmount, check that
we are not stacking it onto the same fs and graft on mountpoint". The good
thing being: we are done with the ugliness on the "can't mount here, need
to kill superblock". We simply do mntput() on the vfsmount we have. Always.
If it was successfully grafted on the mountpoint - fine, we are left with
->mnt_count == 1. If we didn't make it - last reference goes away and we
are doing the right thing again.
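
The reference-counting pattern in the last item can be sketched in userspace as follows. The demo_ names are hypothetical and the counter is a plain int with no locking; it only models "allocate with one reference, grafting takes its own, the caller always drops its reference".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_mnt {
    int count;        /* reference count, playing the role of ->mnt_count */
    bool attached;
};

static int demo_live;  /* objects allocated and not yet freed */

struct demo_mnt *demo_alloc(void)
{
    struct demo_mnt *m = calloc(1, sizeof(*m));

    if (m) {
        m->count = 1;  /* the caller's reference */
        demo_live++;
    }
    return m;
}

void demo_put(struct demo_mnt *m)
{
    assert(m->count > 0);
    if (--m->count == 0) {
        demo_live--;
        free(m);
    }
}

/* Grafting takes its own reference on success and none on failure. */
int demo_graft(struct demo_mnt *m, bool mountpoint_ok)
{
    if (!mountpoint_ok)
        return -1;
    m->attached = true;
    m->count++;
    return 0;
}

/* The caller drops its reference unconditionally: on success the tree
 * keeps the object alive, on failure the last reference goes away. */
int demo_add_mount(bool mountpoint_ok, struct demo_mnt **out)
{
    struct demo_mnt *m = demo_alloc();
    int err;

    if (!m)
        return -1;
    err = demo_graft(m, mountpoint_ok);
    if (out)
        *out = err ? NULL : m;
    demo_put(m);
    return err;
}
```

There is no separate cleanup path for the failure case, which is the simplification the chunk description is after.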

Another good thing is that vfsmount allocation is gone from the area where
we keep mountpoint locked. That helps later in a big way, since we can clean
up the mount/rmdir and mount/unlink race-prevention nicely - it's easier if
we can get the critical area in mount non-blocking.

Please, apply.
Al


diff -urN S6-pre1-do_add_mount/fs/super.c S6-pre1-graft_tree/fs/super.c
--- S6-pre1-do_add_mount/fs/super.c Tue Jun  5 08:16:35 2001
+++ S6-pre1-graft_tree/fs/super.c   Tue Jun  5 08:17:28 2001
@@ -330,9 +330,7 @@
  * dentry (if any).
  */
 
-static struct vfsmount *add_vfsmnt(struct nameidata *nd,
-   struct dentry *root,
-   const char *dev_name)
+static struct vfsmount *add_vfsmnt(struct dentry *root, const char *dev_name)
 {
struct vfsmount *mnt;
struct super_block *sb = root->d_inode->i_sb;
@@ -351,18 +349,11 @@
}
}
mnt->mnt_sb = sb;
-
-   spin_lock(&dcache_lock);
-   if (nd && !IS_ROOT(nd->dentry) && d_unhashed(nd->dentry))
-   goto fail;
mnt->mnt_root = dget(root);
+   mnt->mnt_mountpoint = mnt->mnt_root;
+   mnt->mnt_parent = mnt;
 
-   if (nd) {
-   attach_mnt(mnt, nd);
-   } else {
-   mnt->mnt_mountpoint = mnt->mnt_root;
-   mnt->mnt_parent = mnt;
-   }
+   spin_lock(&dcache_lock);
list_add(&mnt->mnt_instances, &sb->s_mounts);
list_add(&mnt->mnt_list, vfsmntlist.prev);
spin_unlock(&dcache_lock);
@@ -370,12 +361,60 @@
get_filesystem(sb->s_type);
 out:
return mnt;
+}
+
+static struct vfsmount *clone_mnt(struct vfsmount *old_mnt, struct dentry *root)
+{
+   char *name = old_mnt->mnt_devname;
+   struct vfsmount *mnt = alloc_vfsmnt();
+
+   if (!mnt)
+   goto out;
+
+   if (name) {
+   mnt->mnt_devname = kmalloc(strlen(name)+1, GFP_KERNEL);
+   if (mnt->mnt_devname)
+   strcpy(mnt->mnt_devname, name);
+   }
+   mnt->mnt_sb = old_mnt->mnt_sb;
+   mnt->mnt_root = dget(root);
+   mnt->mnt_mountpoint = mnt->mnt_root;
+   mnt->mnt_parent = mnt;
+
+   spin_lock(&dcache_lock);
+   list_add(&mnt->mnt_instances, &old_mnt->mnt_instances);
+   spin_unlock(&dcache_lock);
+out:
+   return mnt;
+}
+
+static int graft_tree(struct vfsmount *mnt, struct nameidata *nd)
+{
+   if (S_ISDIR(nd->dentry->d_inode->i_mode) !=
+ S_ISDIR(mnt->mnt_root->d_inode->i_mode))
+   return -ENOTDIR;
+
+   down(&nd->dentry->d_inode->i_zombie);
+   if (IS_DEADDIR(nd->dentry->d_inode))
+   goto fail1;
+
+   spin_lock(&dcache_lock);
+   if (!IS_ROOT(nd->dentry) && d_unhashed(nd->dentry))
+   goto fail;
+
+   attach_mnt(mnt, nd);
+   list_add(&mnt->mnt_list, vfsmntlist.prev);
+   spin_unlock(&dcache_lock);
+   up(&nd->dentry->d_inode->i_zombie);
+   if (mnt->mnt_sb->s_type->fs_flags & FS_SINGLE)
+   get_filesystem(mnt->mnt_sb->s_type);
+   mntget(mnt);
+   return 0;
 fail:
spin_unlock(&dcache_lock);
-   if (mnt->mnt_devname)
-   kfree(mnt->mnt_devname);
-   kfree(mnt);
-   return NULL;
+fail1:
+   up(&nd->dentry->d_inode->i_zombie);
+   return -ENOENT;
 }
 
 static void move_vfsmnt(struct vfsmount *mnt,
@@ -1154,35 +1193,30 @@
 static int do_loopback(struct nameidata *nd, char *old_name)
 {
struct nameidata old_nd;
-   int err = 0;
+   struct vfsmount *mnt;
+   int err;
+
+   err = mount_is_safe(nd);
+   if (err)
+   return err;
+
if (!old_name || !*old_name)
return -EINVAL;
-   if (path_init(old_name, LOOKUP_POSITIVE, &old_nd))
+

[PATCH] more fs/super.c cleanups (3)

2001-06-05 Thread Alexander Viro

Chunk 3:
Takes the normal mounting into a helper similar to do_loopback()
et al., makes do_mount() cleaner. Please, apply.
Al

diff -urN S6-pre1-do_mount/fs/super.c S6-pre1-do_add_mount/fs/super.c
--- S6-pre1-do_mount/fs/super.c Tue Jun  5 08:15:33 2001
+++ S6-pre1-do_add_mount/fs/super.c Tue Jun  5 08:16:35 2001
@@ -1203,6 +1203,76 @@
return do_remount_sb(nd->mnt->mnt_sb, flags, data);
 }
 
+static int do_add_mount(struct nameidata *nd, char *type, int flags,
+   char *name, void *data)
+{
+   struct file_system_type * fstype;
+   struct nameidata nd;
+   struct vfsmount *mnt = NULL;
+   struct super_block *sb;
+   int retval = 0;
+
+   if (!type || !memchr(type, 0, PAGE_SIZE))
+   return -EINVAL;
+
+   /* we need capabilities... */
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   /* ... filesystem driver... */
+   fstype = get_fs_type(type);
+   if (!fstype)
+   return -ENODEV;
+
+   /* get superblock, locks mount_sem on success */
+   if (fstype->fs_flags & FS_NOMOUNT)
+   sb = ERR_PTR(-EINVAL);
+   else if (fstype->fs_flags & FS_REQUIRES_DEV)
+   sb = get_sb_bdev(fstype, name, flags, data);
+   else if (fstype->fs_flags & FS_SINGLE)
+   sb = get_sb_single(fstype, flags, data);
+   else
+   sb = get_sb_nodev(fstype, flags, data);
+
+   retval = PTR_ERR(sb);
+   if (IS_ERR(sb))
+   goto fs_out;
+
+   /* Something was mounted here while we slept */
+   while(d_mountpoint(nd->dentry) && follow_down(&nd->mnt, &nd->dentry))
+   ;
+
+   /* Refuse the same filesystem on the same mount point */
+   retval = -EBUSY;
+   if (nd->mnt->mnt_sb == sb && nd->mnt->mnt_root == nd->dentry)
+   goto fail;
+
+   retval = -ENOENT;
+   if (!nd->dentry->d_inode)
+   goto fail;
+   retval = -ENOTDIR;
+   if (!S_ISDIR(nd->dentry->d_inode->i_mode))
+   goto fail;
+   down(&nd->dentry->d_inode->i_zombie);
+   if (!IS_DEADDIR(nd->dentry->d_inode)) {
+   retval = -ENOMEM;
+   mnt = add_vfsmnt(nd, sb->s_root, name);
+   }
+   up(&nd->dentry->d_inode->i_zombie);
+   if (!mnt)
+   goto fail;
+   retval = 0;
+unlock_out:
+   up(&mount_sem);
+fs_out:
+   put_filesystem(fstype);
+   return retval;
+
+fail:
+   kill_super(sb);
+   goto unlock_out;
+}
+
 static int copy_mount_options (const void *data, unsigned long *where)
 {
int i;
@@ -1253,10 +1323,7 @@
 long do_mount(char * dev_name, char * dir_name, char *type_page,
  unsigned long flags, void *data_page)
 {
-   struct file_system_type * fstype;
struct nameidata nd;
-   struct vfsmount *mnt = NULL;
-   struct super_block *sb;
int retval = 0;
 
/* Discard magic */
@@ -1276,86 +1343,16 @@
if (retval)
return retval;
 
-   /* just change the flags? - capabilities are checked in do_remount() */
-   if (flags & MS_REMOUNT) {
-   retval = do_remount(&nd, flags&~MS_REMOUNT, (char *)data_page);
-   goto nd_out;
-   }
-
-   /* "mount --bind"? Equivalent to older "mount -t bind" */
-   /* No capabilities? What if users do thousands of these? */
-   if (flags & MS_BIND) {
+   if (flags & MS_REMOUNT)
+   retval = do_remount(&nd, flags&~MS_REMOUNT,
+ (char *)data_page);
+   else if (flags & MS_BIND)
retval = do_loopback(&nd, dev_name);
-   goto nd_out;
-   }
-
-   /* For the rest we need the type */
-
-   retval = -EINVAL;
-   if (!type_page || !memchr(type_page, 0, PAGE_SIZE))
-   goto nd_out;
-
-   retval = -EPERM;
-   /* for the rest we _really_ need capabilities... */
-   if (!capable(CAP_SYS_ADMIN))
-   goto nd_out;
-
-   retval = -ENODEV;
-   /* ... filesystem driver... */
-   fstype = get_fs_type(type_page);
-   if (!fstype)
-   goto nd_out;
-
-   /* get superblock, locks mount_sem on success */
-   if (fstype->fs_flags & FS_NOMOUNT)
-   sb = ERR_PTR(-EINVAL);
-   else if (fstype->fs_flags & FS_REQUIRES_DEV)
-   sb = get_sb_bdev(fstype, dev_name, flags, data_page);
-   else if (fstype->fs_flags & FS_SINGLE)
-   sb = get_sb_single(fstype, flags, data_page);
else
-   sb = get_sb_nodev(fstype, flags, data_page);
-
-   retval = PTR_ERR(sb);
-   if (IS_ERR(sb))
-   goto fs_out;
-
-   /* Something was mounted here while we slept */
-   while(d_mountpoint(nd.dentry) && follow_down(&nd.mnt, &nd.dentry))
-   ;
-
-   /* Refuse the same files

[PATCH] more fs/super.c fixes (2)

2001-06-05 Thread Alexander Viro

Chunk 2:
Since all branches of do_mount() (mounting, binding, remounting)
do the same thing (lookup of directory) we can take that lookup in the
beginning of do_mount() and pass to do_loopback() and do_remount()
nameidata instead of name.

Please, apply
Al

diff -urN S6-pre1-do_remount/fs/super.c S6-pre1-do_mount/fs/super.c
--- S6-pre1-do_remount/fs/super.c   Tue Jun  5 08:14:29 2001
+++ S6-pre1-do_mount/fs/super.c Tue Jun  5 08:15:33 2001
@@ -1151,9 +1151,9 @@
 /*
  * do loopback mount.
  */
-static int do_loopback(char *old_name, char *new_name)
+static int do_loopback(struct nameidata *nd, char *old_name)
 {
-   struct nameidata old_nd, new_nd;
+   struct nameidata old_nd;
int err = 0;
if (!old_name || !*old_name)
return -EINVAL;
@@ -1161,31 +1161,25 @@
err = path_walk(old_name, &old_nd);
if (err)
goto out;
-   if (path_init(new_name, LOOKUP_POSITIVE, &new_nd))
-   err = path_walk(new_name, &new_nd);
+   err = mount_is_safe(nd);
if (err)
goto out1;
-   err = mount_is_safe(&new_nd);
-   if (err)
-   goto out2;
err = -EINVAL;
-   if (S_ISDIR(new_nd.dentry->d_inode->i_mode) !=
+   if (S_ISDIR(nd->dentry->d_inode->i_mode) !=
  S_ISDIR(old_nd.dentry->d_inode->i_mode))
-   goto out2;
+   goto out1;
 
err = -ENOMEM;

down(&mount_sem);
/* there we go */
-   down(&new_nd.dentry->d_inode->i_zombie);
-   if (IS_DEADDIR(new_nd.dentry->d_inode))
+   down(&nd->dentry->d_inode->i_zombie);
+   if (IS_DEADDIR(nd->dentry->d_inode))
err = -ENOENT;
-   else if (add_vfsmnt(&new_nd, old_nd.dentry, old_nd.mnt->mnt_devname))
+   else if (add_vfsmnt(nd, old_nd.dentry, old_nd.mnt->mnt_devname))
err = 0;
-   up(&new_nd.dentry->d_inode->i_zombie);
+   up(&nd->dentry->d_inode->i_zombie);
up(&mount_sem);
-out2:
-   path_release(&new_nd);
 out1:
path_release(&old_nd);
 out:
@@ -1198,25 +1192,15 @@
  * on it - tough luck.
  */
 
-static int do_remount(const char *dir,int flags,char *data)
+static int do_remount(struct nameidata *nd, int flags, char *data)
 {
-   struct nameidata nd;
-   int retval = 0;
-
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
 
-   if (path_init(dir, LOOKUP_FOLLOW|LOOKUP_POSITIVE, &nd))
-   retval = path_walk(dir, &nd);
-   if (retval)
-   return retval;
-
-   retval = -EINVAL;
-   if (nd.dentry == nd.mnt->mnt_root)
-   retval = do_remount_sb(nd.mnt->mnt_sb, flags, data);
+   if (nd->dentry != nd->mnt->mnt_root)
+   return -EINVAL;
 
-   path_release(&nd);
-   return retval;
+   return do_remount_sb(nd->mnt->mnt_sb, flags, data);
 }
 
 static int copy_mount_options (const void *data, unsigned long *where)
@@ -1286,38 +1270,41 @@
if (dev_name && !memchr(dev_name, 0, PAGE_SIZE))
return -EINVAL;
 
-   /* OK, looks good, now let's see what do they want */
+   /* ... and get the mountpoint */
+   if (path_init(dir_name, LOOKUP_FOLLOW|LOOKUP_POSITIVE, &nd))
+   retval = path_walk(dir_name, &nd);
+   if (retval)
+   return retval;
 
/* just change the flags? - capabilities are checked in do_remount() */
-   if (flags & MS_REMOUNT)
-   return do_remount(dir_name, flags & ~MS_REMOUNT,
- (char *) data_page);
+   if (flags & MS_REMOUNT) {
+   retval = do_remount(&nd, flags&~MS_REMOUNT, (char *)data_page);
+   goto nd_out;
+   }
 
/* "mount --bind"? Equivalent to older "mount -t bind" */
/* No capabilities? What if users do thousands of these? */
-   if (flags & MS_BIND)
-   return do_loopback(dev_name, dir_name);
+   if (flags & MS_BIND) {
+   retval = do_loopback(&nd, dev_name);
+   goto nd_out;
+   }
 
/* For the rest we need the type */
 
+   retval = -EINVAL;
if (!type_page || !memchr(type_page, 0, PAGE_SIZE))
-   return -EINVAL;
+   goto nd_out;
 
+   retval = -EPERM;
/* for the rest we _really_ need capabilities... */
if (!capable(CAP_SYS_ADMIN))
-   return -EPERM;
+   goto nd_out;
 
+   retval = -ENODEV;
/* ... filesystem driver... */
fstype = get_fs_type(type_page);
if (!fstype)
-   return -ENODEV;
-
-   /* ... and mountpoint. Do the lookup first to force automounting. */
-   if (path_init(dir_name,
- LOOKUP_FOLLOW|LOOKUP_POSITIVE|LOOKUP_DIRECTORY, &nd))
-   retval = path_walk(dir_name, &nd);
-   if (retval)
-   goto fs_

[PATCH] more fs/super.c cleanups (1)

2001-06-05 Thread Alexander Viro

Linus, here's the next series of fs/super.c cleanups, cut into
small chunks. Patches are incremental.

Chunk #1:
Switches special case in do_umount() to do_remount_sb() (from
do_remount()); takes all per-superblock steps of remount into do_remount_sb().
That will allow cleaning up the lookup logic in do_remount()/do_loopback()/
do_mount() (next chunk).

Please, apply.
Al

diff -urN S6-pre1/fs/super.c S6-pre1-do_remount/fs/super.c
--- S6-pre1/fs/super.c  Tue Jun  5 06:21:52 2001
+++ S6-pre1-do_remount/fs/super.c   Tue Jun  5 08:14:29 2001
@@ -55,7 +55,6 @@
 extern int root_mountflags;
 
 static int do_remount_sb(struct super_block *sb, int flags, char * data);
-static int do_remount(const char *dir, int flags, char * data);
 
 /* this is initialized in init/main.c */
 kdev_t ROOT_DEV;
@@ -923,6 +922,10 @@
if (!(flags & MS_RDONLY) && sb->s_dev && is_read_only(sb->s_dev))
return -EACCES;
/*flags |= MS_RDONLY;*/
+   if (flags & MS_RDONLY)
+   acct_auto_close(sb->s_dev);
+   shrink_dcache_sb(sb);
+   fsync_dev(sb->s_dev);
/* If we are remounting RDONLY, make sure there are no rw files open */
if ((flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY))
if (!fs_may_remount_ro(sb))
@@ -1004,11 +1007,14 @@
 * call reboot(9). Then init(8) could umount root and exec /reboot.
 */
if (mnt == current->fs->rootmnt) {
+   int retval = 0;
/*
 * Special case for "unmounting" root ...
 * we just try to remount it readonly.
 */
-   return do_remount("/", MS_RDONLY, NULL);
+   if (!(sb->s_flags & MS_RDONLY))
+   retval = do_remount_sb(sb, MS_RDONLY, 0);
+   return retval;
}
 
spin_lock(&dcache_lock);
@@ -1202,24 +1208,14 @@
 
if (path_init(dir, LOOKUP_FOLLOW|LOOKUP_POSITIVE, &nd))
retval = path_walk(dir, &nd);
-   if (!retval) {
-   struct super_block * sb = nd.dentry->d_inode->i_sb;
-   retval = -ENODEV;
-   if (sb) {
-   retval = -EINVAL;
-   if (nd.dentry == sb->s_root) {
-   /*
-* Shrink the dcache and sync the device.
-*/
-   shrink_dcache_sb(sb);
-   fsync_dev(sb->s_dev);
-   if (flags & MS_RDONLY)
-   acct_auto_close(sb->s_dev);
-   retval = do_remount_sb(sb, flags, data);
-   }
-   }
-   path_release(&nd);
-   }
+   if (retval)
+   return retval;
+
+   retval = -EINVAL;
+   if (nd.dentry == nd.mnt->mnt_root)
+   retval = do_remount_sb(nd.mnt->mnt_sb, flags, data);
+
+   path_release(&nd);
return retval;
 }
 




Re: symlink_prefix

2001-06-04 Thread Alexander Viro



On Sun, 3 Jun 2001 [EMAIL PROTECTED] wrote:

> What I did was: add a field  `char *mnt_symlink_prefix;'  to the
> struct vfsmount, fill it in super.c:add_vfsmnt(), use it in
> namei.c:vfs_follow_link(). Pick the value up by recognizing
> in super.c:do_mount() the option "symlink_prefix=" before
> giving the options to the separate filesystems.
> 
> [One could start a subdiscussion about that part. The mount(2)
> system call needs to transport vfs information and per-fs information.
> So far, the vfs information used flag bits only, but sooner or later
> we'll want to have strings, and need a vfs_parse_mount_options().
> Indeed, many filesystems today have uid= and gid= and umask= options
> that might be removed from the individual filesystems and put into vfs.
> After all, such options are also useful for (foreign) ext2 filesystems.]

_Please_, if we do anything of that kind - let's use a new syscall.
Ideally, I'd say
fs_fd = open("/fs/ext2", O_RDWR);
/* error -> no such filesystem */
write(fs_fd, "/dev/sda1", strlen("/dev/sda1"));
/* error handling */
write(fs_fd, "reserve=5", strlen("reserve=5"));
...
dir = open("/usr/local", O_DIRECTORY);
/* error handling */
new_mount(dir, MNT_SET, fs_fd); /* closes dir and fs_fd */
/* error handling */

First open gives you a new channel. Preferably - with datagram semantics (i.e.
write() boundaries are preserved). Then you convince fs driver to give you
fs. Then you mount it.
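
To make "write() boundaries are preserved" concrete, here is a sketch using an existing API, an AF_UNIX datagram socketpair. It is not the proposed /fs/ext2 channel, which does not exist; send_options() is a made-up helper that writes each option with its own write() and checks that each read() returns exactly one option.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends every option with its own write() and reads them back one
 * message at a time. Returns the number of options that arrived
 * intact, or -1 on error. Assumes each option is shorter than 256
 * bytes and that n is small enough to fit in the socket queue. */
int send_options(const char *const *opts, int n)
{
    int sv[2], i, ok = 0;
    char buf[256];

    assert(opts != NULL && n >= 0);
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;

    for (i = 0; i < n; i++) {
        size_t len = strlen(opts[i]);

        if (len >= sizeof(buf) ||
            write(sv[0], opts[i], len) != (ssize_t)len) {
            ok = -1;
            goto out;
        }
    }
    for (i = 0; i < n; i++) {
        /* One read() returns one datagram, never two merged together. */
        ssize_t got = read(sv[1], buf, sizeof(buf));

        if (got == (ssize_t)strlen(opts[i]) &&
            memcmp(buf, opts[i], (size_t)got) == 0)
            ok++;
    }
out:
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

With a byte stream the receiver would have to invent delimiters; with message boundaries the parser on the other end gets one option per read, which is why parsing becomes easier.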

Notice that all cruft with "mount ncpfs and then use ioctls to authenticate"
goes away - authentication happens before you mount. Parsers are also easier
that way. Moreover, seeing what filesystem types are available is also trivial,
etc. We need only one special case - mounting that fstypefs. Fine, let's
make new_mount(dir, MNT_TYPES) do that.

BTW, bind and friends are also easy - it's
what = open(old, 0);
where = open(mountpoint, 0);
new_mount(where, MNT_BIND, what);

Comments?




Re: symlink_prefix

2001-06-04 Thread Alexander Viro



On Sun, 3 Jun 2001 [EMAIL PROTECTED] wrote:

> > Current interface had grown an impressive collection of warts.
> > Worse yet, you _can't_ put parsing into generic code.
> > There are filesystems that have a binary object as 'data'.
> 
> Yes, that was a very unfortunate decision, back in the good old times
> when nfs was implemented. And smb, ncp, coda followed nfs.
> 
> Nevertheless, there is no problem adding vfs_parse_mount_options().
> For example, one can have a flag FS_HAS_BINARY_MOUNT_DATA in
> the fs_flags field of the struct file_system_type that describes
> the filesystem type, and refrain from trying to parse the mount data
> when this bit is set.

We can kludge around anything. The question being, what for? It still
leaves ncp with its ioctls ugliness. It still treats device name
in a special way for no good reason - it _is_ an option, just like any
other. Hell, less generic than nosuid or read-only. It still leaves us
with cruft in flags. What for? To maintain binary compatibility with
one utility? We can leave the old interface in place and freeze it.





Re: symlink_prefix

2001-06-04 Thread Alexander Viro



On Sun, 3 Jun 2001 [EMAIL PROTECTED] wrote:

> [My version: keep interface constant, reorganize kernel source
> to do certain things in one place instead of in several places.
> Advantage: treatment becomes uniform and some options that make sense
> for all filesystem types but are available today for some only
> are generalized.
> Your version: invent a new interface, be silent about what happens
> inside the kernel.]

Current interface had grown an impressive collection of warts. Worse
yet, you _can't_ put parsing into generic code. There are filesystems
that have a binary object as 'data'. And there are filesystems that
do post-mount authentication via ioctls on root directory.




Re: symlink_prefix

2001-06-02 Thread Alexander Viro



On Sun, 3 Jun 2001 [EMAIL PROTECTED] wrote:

> This evening I needed to work on a filesystem of a non-Linux OS,
> full of absolute symlinks. After mounting the fs on /mnt, each
> symlink pointing to /foo/bar in that filesystem should be
> regarded as pointing to /mnt/foo/bar.
> 
> Since doing ls -ld on every component of every pathname was
> far too slow, I made a small kernel wart, where a mount option
> -o symlink_prefix=/pathname would cause /pathname to be prepended
> in front of every absolute symlink in the given filesystem
> (when the symlink is followed). That works satisfactorily.
> 
> Remain the questions:
> (i) is there already a mechanism that would achieve this?
> (ii) if not, do we want something like this in the kernel?
> 
> There is already a vaguely similar (and much uglier) wart,
> namely that of "altroot". It is really ugly - requires a path
> set at kernel compile time. And the scope is different.
> Instead of all processes and a single filesystem and symlinks only,
> altroot affects a single process and all filesystems and all paths.
> 
> I do not immediately see a common generalization of these two.

altroot should be buried, not generalized. It was a mistake and
we will be better off forgetting about that nightmare instead of
trying to design something around it.

Absolute symlinks... Dunno. _If_ we want that at all, we probably
want it on per-mountpoint basis. However, that opens a door to
_really_ ugly feature requests. E.g. "if symlink starts with
/foo - replace it with /mnt/bar, but if it starts with /foo/baz -
replace with /mnt/splat instead".

I can see how to implement per-mountpoint variant. However, I'm
less than enthusiastic about the API side of that and about the
ugliness it will lead to. It smells like a wrong approach. And
no, I don't see a good one right now.

As for the API... How would you pass that option? Yet another
mount(2) argument?
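
For reference, the single-prefix behaviour under discussion amounts to the following string operation. This is a userspace sketch with a made-up function name, not the patch being discussed: absolute targets get the prefix prepended, relative ones are left alone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the effective symlink target into out. Absolute targets get
 * the prefix prepended; relative targets are copied unchanged.
 * Returns 0 on success, -1 if the result does not fit in out. */
int apply_symlink_prefix(const char *prefix, const char *target,
                         char *out, size_t outsz)
{
    int n;

    assert(target != NULL && out != NULL && outsz > 0);
    if (target[0] == '/' && prefix != NULL && prefix[0] != '\0')
        n = snprintf(out, outsz, "%s%s", prefix, target);
    else
        n = snprintf(out, outsz, "%s", target);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Supporting several prefixes with longest-match rules, as in the /foo versus /foo/baz example, would turn this one branch into a rule table.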




Re: unmount issues with 2.4.5-ac5, 3ware, and ReiserFS (was: kernel-2.4.5

2001-06-01 Thread Alexander Viro



On Fri, 1 Jun 2001, Hans Reiser wrote:

> known VFS bug, ask viro for details, 2.4.5 is not stable because of it, use
> 2.4.4

Different issue. Missing lock_kernel()/unlock_kernel() in kill_super()
appeared in -pre6 and was fixed in -ac2 or so. -ac5 apparently had
introduced something new, that had been reverted (fixing the problem)
in -ac6.




Re: Linux 2.4.5-ac6

2001-06-01 Thread Alexander Viro



On Fri, 1 Jun 2001, Alan Cox wrote:

> o Fix the cs46xx right this time  (me)
> o Further FATfs cleanup   (OGAWA Hirofumi)
> o ISDN PPP code cleanup, cvs tag update   (Kai Germaschewski)
> o Large amount of UFS file system cleanup (Al Viro)

It's still broken on r/w. R/o should be OK now.

> o Move UFS file system to use dcache for metadata (Al Viro)

What???




Re: 2.4.4 Kernel Oops and ls+rm segfaults

2001-06-01 Thread Alexander Viro



On Fri, 1 Jun 2001, Gregor Jasny wrote:

> Hi!
> 
> Can anyone tell me, where this oops came from?
> The machine is a HP NetServer II lc (EISA+PCI architecture).
> The distribution is a slackware 7.0 with parts of 7.1 and current.
> gcc: 2.95.4 20010319 (Debian prerelease)
> 
> I hope you can help me.

Pagecache corruption somewhere.
a) what filesystems do you have?
b) is the thing reproducible?




Re: Configure.help is complete

2001-05-31 Thread Alexander Viro



On Thu, 31 May 2001, Arjan van de Ven wrote:

> José Luis Domingo López wrote:
> > 
> > On Thursday, 31 May 2001, at 13:24:54 -0400,
> > Eric S. Raymond wrote:
> > 
> > > It gives me great pleasure to announce that the Configure.help master
> > > file is now complete with respect to 2.4.5.  Every single one of the
> > > 2699 configuration symbols actually used in the 2.4.5 codebase's C
> > > source files or Makefiles now has an entry in Configure.help.
> > >
> > Would it be great to have a similar documentation for those hundreds of
> > "files" under /proc ?. Something like:
> 
> 
> Powertweak has descriptions for most of the usable /proc entries,
> in XML format but the descriptions are easily extractable. Maybe it's 
> a good idea to make the powertweak set complete instead / share the set
> with the kernel docs.

We should start removing the crap from procfs in 2.5. Documenting shit is
a good step, but taking it out would be better.




Re: OOPS with 2.4.5 [kernel BUG at inode.c:486]

2001-05-30 Thread Alexander Viro



On 30 May 2001, Trond Myklebust wrote:

> The reason we haven't seen this before is that we had 'force_delete'
> that would always set i_nlink = 0. Unfortunately force_delete is toxic
> to mmap(), as it will discard any dirty pages rather than flushing
> them to storage, so it was removed in the 2.4.5-pre series...
> 
> Al: Is there any reason why the cases
> 
>   if (!inode->i_nlink)
> 
> and the 'magic nfs path' should be treated differently? Personally,
> I'd rather prefer to merge the 2.

I don't think that it's a good idea. Why not fry the cache explicitly
when you invalidate the inode?




Re: [CHECKER] 84 bugs in 2.4.4/2.4.4-ac8 where NULL pointers are deref'd

2001-05-29 Thread Alexander Viro



On Wed, 30 May 2001, Andreas Dilger wrote:

> > b) doesn't fix anything that could be triggered - ext2_delete_entry()
> > can happen only if you've already done lookup. I.e. no problems had been
> > found in that block back when we were finding the entry.
> 
> That means there is no need to check dir in ext2_check_dir_entry(),
> is there?  If all callers to ext2_delete_entry() already verify the
> buffer in ext2_find_entry() (which they appear to do), then there is
> no point in calling ext2_check_dir_entry() at all.

 modulo memory corruption right under you - yes.




Re: [CHECKER] 84 bugs in 2.4.4/2.4.4-ac8 where NULL pointers are deref'd

2001-05-29 Thread Alexander Viro



On Tue, 29 May 2001, Dawson Engler wrote:

> [BUG]  seems like it.  it's not guarded.  or is there some weird dependence?
> /u2/engler/mc/oses/linux/2.4.4-ac8/fs/ext2/dir.c:61:ext2_check_dir_entry: 
>ERROR:INTERNAL_NULL:53:61: [type=set] (set at line 53) Dereferencing NULL ptr "dir" 
>illegally!

No, it's simply a lump of fossilized crap. However, adding one more check
here is not a solution - it only adds to ugliness. The real fix is to get
rid of checking single entries and do all checks when we read the page -
at that point we obviously have the inode. Same goes for the second one.

Patch is available - see ftp.math.psu.edu/pub/viro/ext2-dir-patch-S4.gz
It's going to be very early 2.5.
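
For illustration only (this is not the patch referenced above): a minimal
user-space sketch of the "do all checks when we read the page" idea. The
struct layout, the names (demo_dirent, demo_check_block) and the exact set
of checks are assumptions made for the example, loosely modelled on ext2
directory records; real ext2 code differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified directory entry header (loosely ext2-like). */
struct demo_dirent {
    uint32_t inode;     /* 0 means unused slot */
    uint16_t rec_len;   /* distance to the next entry, in bytes */
    uint8_t  name_len;  /* length of the name that follows the header */
    uint8_t  pad;
};

#define DEMO_HDR sizeof(struct demo_dirent)

/*
 * Validate every entry of a freshly read block exactly once.
 * Returns 1 if the entries tile the block exactly, 0 otherwise.
 * Later lookups and deletions can then trust the block contents.
 */
int demo_check_block(const unsigned char *block, size_t size)
{
    size_t off = 0;

    assert(block != NULL);
    while (off < size) {
        struct demo_dirent de;

        if (size - off < DEMO_HDR)
            return 0;                   /* header would cross the block end */
        memcpy(&de, block + off, DEMO_HDR);

        if (de.rec_len < DEMO_HDR)
            return 0;                   /* too small to hold its own header */
        if (de.rec_len % 4)
            return 0;                   /* records are 4-byte aligned */
        if (de.rec_len > size - off)
            return 0;                   /* record spans past the block end */
        if (DEMO_HDR + de.name_len > de.rec_len)
            return 0;                   /* name does not fit in the record */

        off += de.rec_len;
    }
    return 1;
}
```

With the whole block checked at read time, per-entry checks in the lookup
and delete paths have nothing left to catch short of memory corruption.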




Re: OOPS with 2.4.5 [kernel BUG at inode.c:486]

2001-05-29 Thread Alexander Viro



On Tue, 29 May 2001, Gergely Tamas wrote:
 
> Warning (compare_maps): mismatch on symbol partition_name  , ksyms_base says 
>c01c4020, System.map says c0154160.  Ignoring ksyms_base entry
> kernel BUG at inode.c:486!

[snip]

_Lovely_. NFS, apparently on the revalidate path, doesn't bother to hold on
to the unhashed inode until its pages are gone.

Trond?




Re: BUG?: 2.4.5 breaks reiserfs (kernel BUG)

2001-05-27 Thread Alexander Viro



On Sun, 27 May 2001, Bjerkeset, Svein Olav wrote:

> Hi,
> 
> Today I downloaded kernel 2.4.5 and compiled it. The kernel worked fine
> until
> I tried to halt the computer. When trying to unmount the reiserfs
> filesystems,
> the system freezes with the following output:
> 
> journal_begin called without kernel lock held
> kernel BUG at journal.c:423!

Yes. My fault - badly merged patch in -pre6, actually.
Details:
* kill_super() gets called without BKL, but doesn't grab BKL around
the calls of ->write_super() and ->put_super()
* by the time when it calls these methods filesystem is quiet. I.e.
nothing else has a chance to touch its data structures. So actually only
reiserfs (which checks that we hold BKL) had noticed.
* It _is_ a bug - changing locking rules is for 2.5.

Fix:
--- fs/super.c  Fri May 25 21:51:14 2001
+++ fs/super.c  Sun May 27 00:21:53 2001
@@ -873,6 +873,7 @@
}
spin_unlock(&dcache_lock);
down_write(&sb->s_umount);
+   lock_kernel();
sb->s_root = NULL;
/* Need to clean after the sucker */
if (fs->fs_flags & FS_LITTER)
@@ -901,6 +902,7 @@
put_filesystem(fs);
sb->s_type = NULL;
unlock_super(sb);
+   unlock_kernel();
up_write(&sb->s_umount);
if (bdev) {
blkdev_put(bdev, BDEV_FS);
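
For illustration only: a single-threaded user-space sketch of the locking
rule described above. The "lock" here is just a depth counter, and every
name (demo_lock_kernel, demo_put_super and so on) is hypothetical; it only
shows why a method that checks for the lock fails when the caller does not
bracket it, and passes once the caller does.

```c
#include <assert.h>

/* Hypothetical stand-in for the BKL: a recursion depth, nothing more. */
static int demo_bkl_depth;

static void demo_lock_kernel(void)
{
    demo_bkl_depth++;
}

static void demo_unlock_kernel(void)
{
    assert(demo_bkl_depth > 0);     /* unbalanced unlock is a bug */
    demo_bkl_depth--;
}

/* Filesystem method that, like reiserfs, insists the lock is held. */
static int demo_put_super(void)
{
    return demo_bkl_depth > 0 ? 0 : -1;   /* -1: called without the lock */
}

/* Teardown as it behaved in -pre6: the method runs with no lock taken. */
int demo_kill_super_unlocked(void)
{
    return demo_put_super();
}

/* Teardown with the fix: the caller takes the lock around the method. */
int demo_kill_super_locked(void)
{
    int ret;

    demo_lock_kernel();
    ret = demo_put_super();
    demo_unlock_kernel();
    return ret;
}
```

Only a method that actually checks for the lock notices the difference,
which matches the observation that reiserfs was the one to complain.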




Re: Linux-2.4.5 and Reiserfs, oops!

2001-05-26 Thread Alexander Viro



On Sat, 26 May 2001, Chris Rankin wrote:

> Well the first thing I checked was vanilla 2.4.5, and I managed to
> bring that down hard too. It has nothing at all to do with reiserfs,
> but may be related to USB instead. I have been able to reproduce the
> problem by doing the following:
> 
> a) Booting with X on vc/2
> b) Logging into vc/6 instead
> c) Mounting a filesystem on my USB Zip drive
> d) Unmounting the filesystem again
> e) Unmounting the NFS mount
> f) Executing "rmmod -a" twice to clean out the now-unused modules
> (e.g. sd_mod, scsi_mod, usb-storage)
> g) Trying to switch back to vc/2
> h) Oops!
> 
> 2.4.4 seems OK; I guess I'll have to build those -pre kernels now.

Interesting. See Pete's posting in thread about USB problems - it
may be related.




RE: Linux-2.4.5 and Reiserfs, oops!

2001-05-26 Thread Alexander Viro



On Sat, 26 May 2001, Chris Rankin wrote:

> Hi,
> 
> Thanks for the patch; I successfully unmounted my reiserfs USB Zip 250
> MB disc. However, the box then locked up hard when I unmounted an NFS
> mount and tried to switch to another virtual console.

That's... interesting. With that patch changes to fs/super.c should make
no difference whatsoever.

OK, can you reproduce NFS lockup on 2.4.5-pre5 (without that patch)
and on 2.4.5-pre3 (ditto)? 

There were NFS changes in -pre4 and -pre5 and umount ones in -pre6. The
latter need the patch I've posted, so vanilla -pre5 and -pre3 are the
first candidates for checking.




[FIX] Re: umount segfault on shutdown

2001-05-26 Thread Alexander Viro



On Sat, 26 May 2001, Gavin wrote:

> Hi,
> Hope this is enough info :P
> 
> Unmounting file systems: journal_begin called without kernel lock held
> kernel BUG at journal.c:423!


--- fs/super.c  Fri May 25 21:51:14 2001
+++ fs/super.c.new  Sun May 27 00:21:53 2001
@@ -873,6 +873,7 @@
}
spin_unlock(&dcache_lock);
down_write(&sb->s_umount);
+   lock_kernel();
sb->s_root = NULL;
/* Need to clean after the sucker */
if (fs->fs_flags & FS_LITTER)
@@ -901,6 +902,7 @@
put_filesystem(fs);
sb->s_type = NULL;
unlock_super(sb);
+   unlock_kernel();
up_write(&sb->s_umount);
if (bdev) {
blkdev_put(bdev, BDEV_FS);




Re: Linux-2.4.5, reiserfs, Oops!

2001-05-26 Thread Alexander Viro



On Sat, 26 May 2001, Chris Rankin wrote:

> Linux 2.4.5, SMP, devfs, < 1 GB memory, compiled with gcc-2.95.3
 
> drive. I didn't do anything clever with parameters or anything; just
> "mkreisferfs /dev/sda1", mounted it and then unmounted it again. And
> the kernel oopsed on me.

Bloody hell. 
--- fs/super.c   Fri May 25 21:51:14 2001
+++ fs/super.c.new Sun May 27 00:21:53 2001
@@ -873,6 +873,7 @@
}
spin_unlock(&dcache_lock);
down_write(&sb->s_umount);
+   lock_kernel();
sb->s_root = NULL;
/* Need to clean after the sucker */
if (fs->fs_flags & FS_LITTER)
@@ -901,6 +902,7 @@
put_filesystem(fs);
sb->s_type = NULL;
unlock_super(sb);
+   unlock_kernel();
up_write(&sb->s_umount);
if (bdev) {
blkdev_put(bdev, BDEV_FS);





Re: kernel BUG at inode.c:654!

2001-05-26 Thread Alexander Viro



On Sat, 26 May 2001, Santiago Garcia Mantinan wrote:

> Hi!
> 
> That's what my server, which is running 2.4.5, was shouting when I plugged in
> my laptop at the console (ttyS0), all I could do was copy the output I was
> seeing on minicom to a file, after rebooting I saw that the kernel had left
> some of the logging on kern.log, so I'm attaching a file with both the stuff
> on the console and the ones on the log.
> 
> The machine is an intel pentium 166 with 48 megs of mem, it has an stock
> 2.4.5 kernel with netfilter patches for the irc NAT, even though this
> patches were working ok on 2.4.4 and don't seem to have anything to do with
> this problem, I'm recompiling an stock 2.4.5 now, just to be sure.
> 
> Well, I don't know what else to say, if I'm missing something you want to
> know, don't hesitate to ask.

Lovely... It's one of the long lists and these asserts (lines 650 and 654)
are exactly what would happen if it was corrupted at some place. OTOH, it
may be for real - i.e. real inodes in wrong state getting on the list, rather
than corrupted pointer.




[CFT][PATCH] namespaces patch (2.4.5-pre6)

2001-05-25 Thread Alexander Viro

Folks, new version of the patch is on
ftp.math.psu.edu/pub/viro/namespaces-c-S5-pre6.gz

News:
* ported to 2.4.5-pre6
* new (cleaner) locking mechanism
* lock_super() is starting to become fs-private thing - first steps to
  removing it from VFS code are done.

Please, help with testing. I'm feeding the pieces suitable for 2.4 into
the Linus' tree, so patch got smaller.

It works here(tm). It had survived rather sadistic tortu^Wtesting, but I
am _very_ interested in more eyes going through the thing and more people
giving it a beating.
Al




[PATCH] (part 7) fs/super.c cleanups

2001-05-25 Thread Alexander Viro

Handling of refcounts for FS_SINGLE filesystems moved to
add_vfsmnt(). That's the first half of real fix for FS_SINGLE mess -
we should make it "read_super() if we hadn't done it yet, otherwise
return what we have". That will make kern_mount() uses simpler and
remove all special-casing with refcounts. In hindsight, the trick
I used in the 2.4.0-test2 merge was ugly - kern_mount() should be used
only when the kernel explicitly asks for a vfsmount of its own, not
as part of init for FS_SINGLE filesystems. Fix is easy, but that chunk
touches several files besides fs/super.c and requires sane locking
to be safe. Patch below is the preliminary part local to fs/super.c.

Please, apply.

diff -urN S5-pre6-kern_mount/fs/super.c S5-pre6-single1/fs/super.c
--- S5-pre6-kern_mount/fs/super.c   Fri May 25 15:07:19 2001
+++ S5-pre6-single1/fs/super.c  Fri May 25 15:12:36 2001
@@ -367,6 +367,8 @@
list_add(&mnt->mnt_instances, &sb->s_mounts);
list_add(&mnt->mnt_list, vfsmntlist.prev);
spin_unlock(&dcache_lock);
+   if (sb->s_type->fs_flags & FS_SINGLE)
+   get_filesystem(sb->s_type);
 out:
return mnt;
 fail:
@@ -852,7 +854,6 @@
sb = fs_type->kern_mnt->mnt_sb;
if (!sb)
BUG();
-   get_filesystem(fs_type);
do_remount_sb(sb, flags, data);
return sb;
 }
@@ -1165,8 +1166,6 @@
goto out2;
 
err = -ENOMEM;
-   if (old_nd.mnt->mnt_sb->s_type->fs_flags & FS_SINGLE)
-   get_filesystem(old_nd.mnt->mnt_sb->s_type);

down(&mount_sem);
/* there we go */
@@ -1177,8 +1176,6 @@
err = 0;
up(&new_nd.dentry->d_inode->i_zombie);
up(&mount_sem);
-   if (err && old_nd.mnt->mnt_sb->s_type->fs_flags & FS_SINGLE)
-   put_filesystem(old_nd.mnt->mnt_sb->s_type);
 out2:
path_release(&new_nd);
 out1:
@@ -1369,8 +1366,6 @@
return retval;
 
 fail:
-   if (fstype->fs_flags & FS_SINGLE)
-   put_filesystem(fstype);
kill_super(sb);
goto unlock_out;
 }




[PATCH] (part 6) fs/super.c cleanups

2001-05-25 Thread Alexander Viro

Expands the add_vfsmnt() call in kern_mount(), moves alloc_vfsmnt()
before reading the superblock and makes insertion into vfsmntlist (in
add_vfsmnt()) unconditional - kern_mount() was the only case where we
didn't want it to happen. Moreover, recovery in kern_mount() becomes simpler.

Please, apply.
diff -urN S5-pre6-alloc_vfsmnt/fs/super.c S5-pre6-kern_mount/fs/super.c
--- S5-pre6-alloc_vfsmnt/fs/super.c Fri May 25 04:13:30 2001
+++ S5-pre6-kern_mount/fs/super.c   Fri May 25 15:07:19 2001
@@ -365,8 +365,7 @@
mnt->mnt_parent = mnt;
}
list_add(&mnt->mnt_instances, &sb->s_mounts);
-   if (nd || dev_name)
-   list_add(&mnt->mnt_list, vfsmntlist.prev);
+   list_add(&mnt->mnt_list, vfsmntlist.prev);
spin_unlock(&dcache_lock);
 out:
return mnt;
@@ -945,21 +944,31 @@
 
 struct vfsmount *kern_mount(struct file_system_type *type)
 {
-   kdev_t dev = get_unnamed_dev();
struct super_block *sb;
-   struct vfsmount *mnt;
-   if (!dev)
+   struct vfsmount *mnt = alloc_vfsmnt();
+   kdev_t dev;
+
+   if (!mnt)
+   return ERR_PTR(-ENOMEM);
+
+   dev = get_unnamed_dev();
+   if (!dev) {
+   kfree(mnt);
return ERR_PTR(-EMFILE);
+   }
sb = read_super(dev, NULL, type, 0, NULL, 0);
if (!sb) {
put_unnamed_dev(dev);
+   kfree(mnt);
return ERR_PTR(-EINVAL);
}
-   mnt = add_vfsmnt(NULL, sb->s_root, NULL);
-   if (!mnt) {
-   kill_super(sb);
-   return ERR_PTR(-ENOMEM);
-   }
+   mnt->mnt_sb = sb;
+   mnt->mnt_root = dget(sb->s_root);
+   mnt->mnt_mountpoint = mnt->mnt_root;
+   mnt->mnt_parent = mnt;
+   spin_lock(&dcache_lock);
+   list_add(&mnt->mnt_instances, &sb->s_mounts);
+   spin_unlock(&dcache_lock);
type->kern_mnt = mnt;
return mnt;
 }




[PATCH] (part 5) fs/super.c cleanups

2001-05-25 Thread Alexander Viro

Moves allocation/initialization of vfsmounts into a separate function.
We will need this separation to deal with several places where we need
a non-blocking (and non-failing) equivalent of add_vfsmnt(). There, allocation
will be done outside of the critical area.

Please, apply.

diff -urN S5-pre6-MNT_VISIBLE/fs/super.c S5-pre6-alloc_vfsmnt/fs/super.c
--- S5-pre6-MNT_VISIBLE/fs/super.c  Thu May 24 23:57:23 2001
+++ S5-pre6-alloc_vfsmnt/fs/super.c Fri May 25 04:13:30 2001
@@ -282,6 +282,21 @@
 
 static LIST_HEAD(vfsmntlist);
 
+struct vfsmount *alloc_vfsmnt(void)
+{
+   struct vfsmount *mnt = kmalloc(sizeof(struct vfsmount), GFP_KERNEL); 
+   if (mnt) {
+   memset(mnt, 0, sizeof(struct vfsmount));
+   atomic_set(&mnt->mnt_count,1);
+   INIT_LIST_HEAD(&mnt->mnt_clash);
+   INIT_LIST_HEAD(&mnt->mnt_child);
+   INIT_LIST_HEAD(&mnt->mnt_mounts);
+   INIT_LIST_HEAD(&mnt->mnt_list);
+   mnt->mnt_owner = current->uid;
+   }
+   return mnt;
+}
+
 static void detach_mnt(struct vfsmount *mnt, struct nameidata *old_nd)
 {
old_nd->dentry = mnt->mnt_mountpoint;
@@ -324,10 +339,9 @@
struct super_block *sb = root->d_inode->i_sb;
char *name;
 
-   mnt = kmalloc(sizeof(struct vfsmount), GFP_KERNEL);
+   mnt = alloc_vfsmnt();
if (!mnt)
goto out;
-   memset(mnt, 0, sizeof(struct vfsmount));
 
/* It may be NULL, but who cares? */
if (dev_name) {
@@ -337,8 +351,6 @@
mnt->mnt_devname = name;
}
}
-   mnt->mnt_owner = current->uid;
-   atomic_set(&mnt->mnt_count,1);
mnt->mnt_sb = sb;
 
spin_lock(&dcache_lock);
@@ -351,10 +363,7 @@
} else {
mnt->mnt_mountpoint = mnt->mnt_root;
mnt->mnt_parent = mnt;
-   INIT_LIST_HEAD(&mnt->mnt_child);
-   INIT_LIST_HEAD(&mnt->mnt_clash);
}
-   INIT_LIST_HEAD(&mnt->mnt_mounts);
list_add(&mnt->mnt_instances, &sb->s_mounts);
if (nd || dev_name)
list_add(&mnt->mnt_list, vfsmntlist.prev);
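
For illustration only: a user-space sketch of why splitting allocation from
insertion helps. The step that can fail or block (memory allocation) runs
with no lock held, and the step done under the lock can neither fail nor
block. The toy spinlock built on C11 atomic_flag and all demo_* names are
assumptions for the example, not kernel interfaces.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical list node standing in for struct vfsmount. */
struct demo_mnt {
    int refcount;
    struct demo_mnt *next;
};

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;  /* toy spinlock */
static struct demo_mnt *demo_list;                /* protected by demo_lock */

/* May fail or be slow (it calls the allocator), so no lock is held here. */
struct demo_mnt *demo_alloc_mnt(void)
{
    struct demo_mnt *mnt = calloc(1, sizeof(*mnt));

    if (mnt)
        mnt->refcount = 1;
    return mnt;
}

/* Cannot fail and never allocates: safe inside the critical section. */
void demo_attach_mnt(struct demo_mnt *mnt)
{
    assert(mnt != NULL);
    while (atomic_flag_test_and_set(&demo_lock))
        ;                                   /* spin until the lock is ours */
    mnt->next = demo_list;
    demo_list = mnt;
    atomic_flag_clear(&demo_lock);
}

/* Count attached nodes; takes the lock while walking the list. */
int demo_count(void)
{
    int n = 0;

    while (atomic_flag_test_and_set(&demo_lock))
        ;
    for (struct demo_mnt *p = demo_list; p; p = p->next)
        n++;
    atomic_flag_clear(&demo_lock);
    return n;
}
```

A caller checks the result of demo_alloc_mnt() first and only then enters
the locked region, so the error path never has to unwind under the lock.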




[PATCH] (part 4) fs/super.c cleanup

2001-05-25 Thread Alexander Viro

* MNT_VISIBLE is gone. We simply do not insert vfsmounts we don't
want to see into the vfsmntlist. The only place where it is used is
get_filesystem_info(), so it's obviously correct.

Please, apply.

PS: I've done a different locking scheme for superblocks, so right
now I'm testing it on a complete patch. I.e. that part is postponed until
it gets some testing. So the next several pieces will be just a bunch
of trivial cleanups.

diff -urN S5-pre6/fs/super.c S5-pre6-MNT_VISIBLE/fs/super.c
--- S5-pre6/fs/super.c  Thu May 24 22:15:03 2001
+++ S5-pre6-MNT_VISIBLE/fs/super.c  Thu May 24 23:57:23 2001
@@ -314,13 +314,6 @@
  * Potential reason for failure (aside of trivial lack of memory) is a
  * deleted mountpoint. Caller must hold ->i_zombie on mountpoint
  * dentry (if any).
- *
- * Node is marked as MNT_VISIBLE (visible in /proc/mounts) unless both
- * @nd and @devname are %NULL. It works since we pass non-%NULL @devname
- * when we are mounting root and kern_mount() filesystems are deviceless.
- * If we will get a kern_mount() filesystem with nontrivial @devname we
- * will have to pass the visibility flag explicitly, so if we will add
- * support for such beasts we'll have to change prototype.
  */
 
 static struct vfsmount *add_vfsmnt(struct nameidata *nd,
@@ -336,9 +329,6 @@
goto out;
memset(mnt, 0, sizeof(struct vfsmount));
 
-   if (nd || dev_name)
-   mnt->mnt_flags = MNT_VISIBLE;
-
/* It may be NULL, but who cares? */
if (dev_name) {
name = kmalloc(strlen(dev_name)+1, GFP_KERNEL);
@@ -366,7 +356,8 @@
}
INIT_LIST_HEAD(&mnt->mnt_mounts);
list_add(&mnt->mnt_instances, &sb->s_mounts);
-   list_add(&mnt->mnt_list, vfsmntlist.prev);
+   if (nd || dev_name)
+   list_add(&mnt->mnt_list, vfsmntlist.prev);
spin_unlock(&dcache_lock);
 out:
return mnt;
@@ -500,8 +491,6 @@
 
for (p = vfsmntlist.next; p != &vfsmntlist; p = p->next) {
struct vfsmount *tmp = list_entry(p, struct vfsmount, mnt_list);
-   if (!(tmp->mnt_flags & MNT_VISIBLE))
-   continue;
path = d_path(tmp->mnt_root, tmp, buffer, PAGE_SIZE);
if (!path)
continue;
diff -urN S5-pre6/include/linux/mount.h S5-pre6-MNT_VISIBLE/include/linux/mount.h
--- S5-pre6/include/linux/mount.h   Thu May 24 22:15:06 2001
+++ S5-pre6-MNT_VISIBLE/include/linux/mount.h   Thu May 24 23:58:00 2001
@@ -12,8 +12,6 @@
 #define _LINUX_MOUNT_H
 #ifdef __KERNEL__
 
-#define MNT_VISIBLE1
-
 struct vfsmount
 {
struct dentry *mnt_mountpoint;  /* dentry of mountpoint */



