Re: [RFC 00/26] VFS based Union Mount (V2)

2007-08-01 Thread Bharata B Rao
On Mon, Jul 30, 2007 at 06:13:23PM +0200, Jan Blunck wrote:
> Here is another post of the VFS based union mount implementation. Unlike the
> traditional mount which hides the contents of the mount point, union mounts
> present the merged view of the mount point and the mounted filesystem.

Doesn't compile without CONFIG_DEBUG_UNION_MOUNT.

fs/namei.c: In function `hash_lookup_union':
fs/namei.c:1798: error: implicit declaration of function `UM_DEBUG_LOOKUP'
make[1]: *** [fs/namei.o] Error 1
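
A common way to avoid this class of build failure is to give the debug macro
an empty fallback when the config option is off. The following is only a
userspace sketch: the real definition and arguments of UM_DEBUG_LOOKUP are not
shown in this thread, so a printf-style macro is assumed, and lookup_example()
is a hypothetical stand-in for a caller such as hash_lookup_union().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Assumption: UM_DEBUG_LOOKUP takes printf-style arguments. In the kernel
 * the enabled branch would use printk rather than fprintf.
 */
#ifdef CONFIG_DEBUG_UNION_MOUNT
#define UM_DEBUG_LOOKUP(...) fprintf(stderr, __VA_ARGS__)
#else
/* Expands to an empty statement so callers still compile without debugging. */
#define UM_DEBUG_LOOKUP(...) do { } while (0)
#endif

/* Hypothetical caller; the name is illustrative only. */
static int lookup_example(const char *name)
{
	assert(name != NULL);
	UM_DEBUG_LOOKUP("union lookup of %s\n", name);
	return 0;
}
```

With this pattern the call site needs no #ifdef of its own, whichever way the
option is set.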

Regards,
Bharata.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 12/26] ext2 white-out support

2007-08-01 Thread Ph. Marek
On Wednesday, 1 August 2007, Josef Sipek wrote:
> Alright not the greatest of examples, there is something to be said about
> symmetry, so...let me try again :)
...
> Oops! There's a whiteout in /b that hides the directory in /c -- rename(2)
> shouldn't make directory subtrees disappear.
>
> There are two ways to solve this:
>
> 1) "cp -r" the entire subtree ...
>
> 2) Don't store whiteouts within branches ...
Sorry for making uninformed guesses, but if there are already special nodes
(whiteouts), why not extend them to a more general format, specifying a
(source, destination) pair at the topmost level?
- A delete is a (source, NULL) pair
- A rename is a (source, destination) pair, which causes lookups on source to
  use the string destination in the lower branches.


Would that work?
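
To make the proposal concrete, here is a minimal sketch of such a lookup. It
is purely illustrative: struct um_redirect and um_lower_name() are hypothetical
names, and nothing like this exists in the posted patches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record kept at the topmost level; dst == NULL means "deleted". */
struct um_redirect {
	const char *src;
	const char *dst;
};

/*
 * Return the name to use in the lower branches for a lookup of `name`:
 * NULL if a (source, NULL) pair deleted it, the destination string if a
 * (source, destination) pair matches, or `name` itself otherwise.
 */
static const char *um_lower_name(const struct um_redirect *tab, size_t n,
				 const char *name)
{
	assert(tab != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tab[i].src, name) == 0)
			return tab[i].dst;
	}
	return name;
}
```

A linear table keeps the sketch short; a real implementation would need a
persistent, per-directory structure.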


Regards,

Phil



Re: [RFC 11/26] tmpfs white-out support

2007-08-01 Thread Matt Mackall
On Wed, Aug 01, 2007 at 04:13:46PM +0100, Hugh Dickins wrote:
> On Mon, 30 Jul 2007, Jan Blunck wrote:
> 
> > Introduce white-out support to tmpfs.
> > 
> > Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
> > ---
> >  include/linux/shmem_fs.h |    1
> >  mm/shmem.c               |   54 +++
> >  2 files changed, 55 insertions(+)
> 
> I see there's debate about whether this (and its fellows) give the
> right semantic to whiteouts; and I've not begun to think about that.
> 
> But as a patch to tmpfs for what you're trying to do, it looks just
> about fine.  I say "just about" because the reference counting looks
> right, but I wouldn't dare say that it _is_ right without testing.
> 
> And I'd probably want to add a minor adjustment, so that a mount with
> nr_inodes=1000 could still support exactly 1000 inodes, despite your
> allocating one for the whiteout (usually never used) at mount time.
> But that can follow along later, no problem.

Also, you might want to make sure whiteouts work with ramfs, which
replaces tmpfs when tmpfs is disabled.

-- 
Mathematics is the supreme nostalgia of our time.


Re: kupdate weirdness

2007-08-01 Thread David Chinner
On Wed, Aug 01, 2007 at 10:45:16PM +0200, Miklos Szeredi wrote:
> The following strange behavior can be observed:
> 
> 1. large file is written
> 2. after 30 seconds, nr_dirty goes down by 1024
> 3. then for some time (< 30 sec) nothing happens (disk idle)
> 4. then nr_dirty again goes down by 1024
> 5. repeat from 3. until whole file is written
> 
> So basically a 4Mbyte chunk of the file is written every 30 seconds.
> I'm quite sure this is not the intended behavior.
> 
> The reason seems to be that __sync_single_inode() will move the
> partially written inode from s_io onto s_dirty, and sync_sb_inode()
> will not splice it back onto s_io until the rest of the inodes on s_io
> has been processed.

It's been doing this for a long time.

http://marc.info/?l=linux-kernel&m=113919849421679&w=2

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [RFC 12/26] ext2 white-out support

2007-08-01 Thread Erez Zadok
In message <[EMAIL PROTECTED]>, Dave Kleikamp writes:
> On Wed, 2007-08-01 at 15:33 -0400, Josef Sipek wrote:
> > On Wed, Aug 01, 2007 at 02:10:31PM -0500, Dave Kleikamp wrote:
> > > On Wed, 2007-08-01 at 14:44 -0400, Josef Sipek wrote:
> > > > > Now what? How do you rename? Do you rename in the same branch (assuming it
> > > > > is rw)?
> > > 
> > > Er, no.  According to Documentation/filesystems/union-mounts.txt, "only
> > > the topmost layer of the mount stack can be altered".
> > 
> > > This brings up a very interesting (but painful) question...which makes more
> > sense? Allowing the modifications in only the top-most branch, or any branch
> > (given the user allows it at mount-time)?
> 
> Your examples point out the complexity of trying to allow modifications
> at lower levels.  It seems to me to be simpler (even if recursive copies
> are needed) to leave it as proposed.
[...]

There are three other reasons why Unionfs and our users like to have
multiple writable branches:

1. If only the topmost layer is writable, then every little change tends to
   cause a copyup, which tends to clutter the top layer more quickly.  Some
   of our users didn't like that idea, while others explicitly wanted it --
   so we give them a choice to decide, on a per-layer/branch basis, whether
   it should be writable or readonly.

2. Some users unify different packages together.  Imagine you union under
   /union, several installed packages: /X11R6/{bin,man,lib,conf},
   /apache/{bin,man,lib,etc}, and /mysql/{bin,man,lib,etc}, and so on.  If a
   user modifies /union/apache/etc/apache.conf, they sometimes want
   apache.conf to remain in the writable branch it came from, not copied up.
   That way all apache related files are logically left where they came
   from, which makes administration easier.  Again, some users like to have
   multiple writable branches, and some don't -- so in Unionfs we give them
   the choice.  And yes, it does make our implementation more complex.

3. Some people use Unionfs in the scenario described in point #2 above, as a
   poor man's space- and load- distribution system.  Some of our users like
   the idea of controlling how much storage space they give each branch, and
   how much it might grow, and even how much CPU or I/O load might be placed
   on each of the lower filesystems which serve a given branch.  That way
   they worry less about the top-layer's space filling up more quickly than
   expected.  Now Unionfs was never designed to be a load-balancing f/s (we
   have RAIF for that, see ),
   but users seem to always find creative ways to [ab]use one's software in
   ways one never thought of. :-)

BTW, does Union Mounts copyup on meta-data changes (e.g., chmod, chgrp,
etc.)?

Erez.


Re: kupdate weirdness

2007-08-01 Thread Andrew Morton
On Wed, 01 Aug 2007 22:45:16 +0200
Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> The following strange behavior can be observed:
> 
> 1. large file is written
> 2. after 30 seconds, nr_dirty goes down by 1024
> 3. then for some time (< 30 sec) nothing happens (disk idle)
> 4. then nr_dirty again goes down by 1024
> 5. repeat from 3. until whole file is written
> 
> So basically a 4Mbyte chunk of the file is written every 30 seconds.
> I'm quite sure this is not the intended behavior.
> 
> The reason seems to be that __sync_single_inode() will move the
> partially written inode from s_io onto s_dirty, and sync_sb_inode()
> will not splice it back onto s_io until the rest of the inodes on s_io
> has been processed.

It does all sorts of weird crap.

> Since there will probably be a recently dirtied inode on s_io, this
> will take some time, but always less than 30 sec.
> 
> I don't know what's the easiest solution.
> 
> Any ideas?

Try 2.6.23-rc1-mm2.


kupdate weirdness

2007-08-01 Thread Miklos Szeredi
The following strange behavior can be observed:

1. large file is written
2. after 30 seconds, nr_dirty goes down by 1024
3. then for some time (< 30 sec) nothing happens (disk idle)
4. then nr_dirty again goes down by 1024
5. repeat from 3. until whole file is written

So basically a 4Mbyte chunk of the file is written every 30 seconds.
I'm quite sure this is not the intended behavior.
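
The 4 Mbyte figure follows from 1024 pages per pass, assuming 4 KiB pages (the
page size is an assumption here, not stated above). A small sketch of the
arithmetic, with hypothetical helper names:

```c
#include <assert.h>

/* Bytes cleaned in one pass: pages written back times the page size. */
static unsigned long long chunk_bytes(unsigned long long pages,
				      unsigned long long page_size)
{
	return pages * page_size;
}

/* Number of passes needed to flush a file of file_bytes, rounding up. */
static unsigned long long passes_needed(unsigned long long file_bytes,
					unsigned long long chunk)
{
	assert(chunk > 0);
	return (file_bytes + chunk - 1) / chunk;
}
```

Under these assumptions a 1 GiB file needs 256 passes, so with up to 30
seconds between passes it could take as long as 7680 seconds to reach disk.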

The reason seems to be that __sync_single_inode() will move the
partially written inode from s_io onto s_dirty, and sync_sb_inode()
will not splice it back onto s_io until the rest of the inodes on s_io
has been processed.

Since there will probably be a recently dirtied inode on s_io, this
will take some time, but always less than 30 sec.

I don't know what's the easiest solution.

Any ideas?

Miklos


Re: [RFC 12/26] ext2 white-out support

2007-08-01 Thread Dave Kleikamp
On Wed, 2007-08-01 at 15:33 -0400, Josef Sipek wrote:
> On Wed, Aug 01, 2007 at 02:10:31PM -0500, Dave Kleikamp wrote:
> > On Wed, 2007-08-01 at 14:44 -0400, Josef Sipek wrote:
> > > Now what? How do you rename? Do you rename in the same branch (assuming it
> > > is rw)?
> > 
> > Er, no.  According to Documentation/filesystems/union-mounts.txt, "only
> > the topmost layer of the mount stack can be altered".
> 
> This brings up a very interesting (but painful) question...which makes more
> sense? Allowing the modifications in only the top-most branch, or any branch
> (given the user allows it at mount-time)?

Your examples point out the complexity of trying to allow modifications
at lower levels.  It seems to me to be simpler (even if recursive copies
are needed) to leave it as proposed.

> This is really a question to the community at large, not just you, Dave :)

I agree, but I have to add my $.02.

> > > 1) "cp -r" the entire subtree being renamed to highest-priority branch, 
> > > and
> > > rename there (you might have to recreate a series of directories to have a
> > > place to "cp" to...so you got "cp -r" _AND_ "mkdir -p"-like code in the 
> > > VFS!
> > > 1/2 a :) )
> > 
> > I think this is the only alternative, given the design.
> 
> Right. Doing something like this at the filesystem level (as we do in
> unionfs) seems less painful - filesystems are places full of all sorts of
> nefarious activities to begin with. Having it in the VFS seems...even
> uglier.

I haven't looked at either implementation close enough to offer an
opinion here that I would be able to defend.  I'm sure others have their
opinions.

> Josef 'Jeff' Sipek.
> 

Thanks,
Shaggy
-- 
David Kleikamp
IBM Linux Technology Center



Re: [RFC 12/26] ext2 white-out support

2007-08-01 Thread Josef Sipek
On Wed, Aug 01, 2007 at 02:10:31PM -0500, Dave Kleikamp wrote:
> On Wed, 2007-08-01 at 14:44 -0400, Josef Sipek wrote:
> > Alright not the greatest of examples, there is something to be said about
> > symmetry, so...let me try again :)
> > 
> > /a/
> > /b/bar  (whiteout for bar)
> > /c/foo/qwerty
> > 
> > Now, let's mount a union of {a,b,c}, and we'll see:
> > 
> > $ find /u
> > /u
> > /u/foo
> > /u/foo/qwerty
> > $ mv /u/foo /u/bar
> > 
> > Now what? How do you rename? Do you rename in the same branch (assuming it
> > is rw)?
> 
> Er, no.  According to Documentation/filesystems/union-mounts.txt, "only
> the topmost layer of the mount stack can be altered".
 
This brings up a very interesting (but painful) question...which makes more
sense? Allowing the modifications in only the top-most branch, or any branch
(given the user allows it at mount-time)?

This is really a question to the community at large, not just you, Dave :)

> > 1) "cp -r" the entire subtree being renamed to highest-priority branch, and
> > rename there (you might have to recreate a series of directories to have a
> > place to "cp" to...so you got "cp -r" _AND_ "mkdir -p"-like code in the VFS!
> > 1/2 a :) )
> 
> I think this is the only alternative, given the design.
 
Right. Doing something like this at the filesystem level (as we do in
unionfs) seems less painful - filesystems are places full of all sorts of
nefarious activities to begin with. Having it in the VFS seems...even
uglier.

Josef 'Jeff' Sipek.

-- 
*NOTE: This message is ROT-13 encrypted twice for extra protection*


Re: [RFC 12/26] ext2 white-out support

2007-08-01 Thread Dave Kleikamp
On Wed, 2007-08-01 at 14:44 -0400, Josef Sipek wrote:
> Alright not the greatest of examples, there is something to be said about
> symmetry, so...let me try again :)
> 
> /a/
> /b/bar  (whiteout for bar)
> /c/foo/qwerty
> 
> Now, let's mount a union of {a,b,c}, and we'll see:
> 
> $ find /u
> /u
> /u/foo
> /u/foo/qwerty
> $ mv /u/foo /u/bar
> 
> Now what? How do you rename? Do you rename in the same branch (assuming it
> is rw)?

Er, no.  According to Documentation/filesystems/union-mounts.txt, "only
the topmost layer of the mount stack can be altered".

> If you do, you'll get:
> 
> $ find /u
> /u
> 
> Oops! There's a whiteout in /b that hides the directory in /c -- rename(2)
> shouldn't make directory subtrees disappear.
> 
> There are two ways to solve this:
> 
> 1) "cp -r" the entire subtree being renamed to highest-priority branch, and
> rename there (you might have to recreate a series of directories to have a
> place to "cp" to...so you got "cp -r" _AND_ "mkdir -p"-like code in the VFS!
> 1/2 a :) )

I think this is the only alternative, given the design.

> 2) Don't store whiteouts within branches. This makes it really easy to
> rename and remove the whiteout.
> 
> Sure, you could try to rename in-place and remove the whiteout, but what if
> you have:
> 
> /a/
> /b/bar  (whiteout)
> /c/bar/blah
> /d/foo/qwerty
> 
> $ mv /u/foo /u/bar
> 
> You can't just remove the whiteout, because that'd uncover the whited-out
> directory bar in /c.
> 
> Josef 'Jeff' Sipek.
> 
-- 
David Kleikamp
IBM Linux Technology Center



Re: [RFC 12/26] ext2 white-out support

2007-08-01 Thread Josef Sipek
On Wed, Aug 01, 2007 at 10:23:29AM -0500, Dave Kleikamp wrote:
> On Tue, 2007-07-31 at 13:11 -0400, Josef Sipek wrote:
> > On Tue, Jul 31, 2007 at 07:00:12PM +0200, Jan Blunck wrote:
> > > On Tue, Jul 31, Josef Sipek wrote:
> > > 
> > > > On Mon, Jul 30, 2007 at 06:13:35PM +0200, Jan Blunck wrote:
> > > > > Introduce white-out support to ext2.
> > > > 
> > > > > I think storing whiteouts on the branches is wrong. It creates all sorts of
> > > > > nasty cases when people actually try to use unioning. Imagine a (not-so
> > > > > unlikely) scenario where you have 2 unions, and they share a branch. If you
> > > > > create a whiteout in one union on that shared branch, the whiteout magically
> > > > > affects the other union as well! Whiteouts are a union-level construct, and
> > > > > therefore storing them at the branch level is wrong.
> > > 
> > > > So you think that just because you mounted the filesystem somewhere else it
> > > > should look different? This is what sharing is all about. If you share a
> > > > filesystem you also share the removal of objects.
> > 
> > The removal happens at the union level, not the branch level. Say you have:
> > 
> > /a/
> > /b/foo
> > /c/foo
> > 
> > And you mount /u1 as a union of {a,b}, and /u2 as union of {a,c}.
> 
> Who does this?  I'm assuming that a is the "top" layer.  Aren't union
> mounts typically about sharing lower layers and having a separate rw
> layer for each union mount?
 
Alright not the greatest of examples, there is something to be said about
symmetry, so...let me try again :)

/a/
/b/bar  (whiteout for bar)
/c/foo/qwerty

Now, let's mount a union of {a,b,c}, and we'll see:

$ find /u
/u
/u/foo
/u/foo/qwerty
$ mv /u/foo /u/bar

Now what? How do you rename? Do you rename in the same branch (assuming it
is rw)? If you do, you'll get:

$ find /u
/u

Oops! There's a whiteout in /b that hides the directory in /c -- rename(2)
shouldn't make directory subtrees disappear.
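
The effect can be reproduced with a toy model of a top-down union lookup. This
is only a sketch of the semantics discussed here, with hypothetical types and
names (um_branch, um_lookup); it is not code from the union mount patches or
from Unionfs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum um_kind { UM_ENTRY, UM_WHITEOUT };

struct um_dirent {
	const char *name;
	enum um_kind kind;
};

struct um_branch {
	const struct um_dirent *ents;
	size_t nents;
};

/*
 * Walk the branches from highest priority (index 0) downwards. Return the
 * index of the branch that provides `name`, or -1 if the name is absent or
 * the first match is a whiteout.
 */
static int um_lookup(const struct um_branch *br, size_t nbr, const char *name)
{
	assert(br != NULL || nbr == 0);
	for (size_t i = 0; i < nbr; i++) {
		for (size_t j = 0; j < br[i].nents; j++) {
			if (strcmp(br[i].ents[j].name, name) != 0)
				continue;
			return br[i].ents[j].kind == UM_WHITEOUT ? -1 : (int)i;
		}
	}
	return -1;
}
```

With branches {a, b, c} as above, "foo" resolves to branch index 2 (/c). If
the rename is done in place in /c, a lookup of "bar" stops at the whiteout in
/b and returns -1, so the subtree is no longer reachable under either name.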

There are two ways to solve this:

1) "cp -r" the entire subtree being renamed to highest-priority branch, and
rename there (you might have to recreate a series of directories to have a
place to "cp" to...so you got "cp -r" _AND_ "mkdir -p"-like code in the VFS!
1/2 a :) )

2) Don't store whiteouts within branches. This makes it really easy to
rename and remove the whiteout.

Sure, you could try to rename in-place and remove the whiteout, but what if
you have:

/a/
/b/bar  (whiteout)
/c/bar/blah
/d/foo/qwerty

$ mv /u/foo /u/bar

You can't just remove the whiteout, because that'd uncover the whited-out
directory bar in /c.

Josef 'Jeff' Sipek.

-- 
Bad pun of the week: The formula 1 control computer suffered from a race
condition


Re: [RFC 12/26] ext2 white-out support

2007-08-01 Thread Josef Sipek
On Wed, Aug 01, 2007 at 07:58:49PM +0200, Jan Engelhardt wrote:
> 
> On Jul 31 2007 12:36, Josef Sipek wrote:
> >[2] http://www.filesystems.org/unionfs-odf.txt
> 
> >Instead, the new ODF code stores whiteouts as hardlinks to a special
> >(regular) zero-length file in odf (/odf/whiteout), and it stores opaqueness
> >information for directories in the inode GID bits in an ODF file system
> >(e.g., ext2, XFS, etc.) on the local machine.  This avoids the name-space
> >pollution and avoids races with network file systems, while minimizing inode
> >consumption in /odf.
> 
> Inode GID bits - are you reducing my 32 bits of gid_t to 31 bits?
> That does not work out either.

No. The ODF code just uses the GID bits to store extra info. The GID is
_NOT_ used to store the GID of the file. The GID of the file is still coming
from the branches.
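
As for the hardlink scheme in the quoted text, the idea can be illustrated from
userspace with plain POSIX calls. This is a sketch under assumptions, not the
ODF kernel code: the helper names are hypothetical, and the check relies only
on the fact that hard links to one file share a device and inode number.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create `name` as a hard link to the shared zero-length whiteout file. */
static int odf_make_whiteout(const char *whiteout_path, const char *name)
{
	assert(whiteout_path != NULL && name != NULL);
	return link(whiteout_path, name);
}

/*
 * Return 1 if `name` is a hard link to the whiteout file (same device and
 * inode number), 0 if it is some other object, -1 if either lstat() fails.
 */
static int odf_is_whiteout(const char *whiteout_path, const char *name)
{
	struct stat wh, st;

	if (lstat(whiteout_path, &wh) != 0 || lstat(name, &st) != 0)
		return -1;
	return st.st_dev == wh.st_dev && st.st_ino == wh.st_ino;
}
```

Because every whiteout is just another link to one inode, creating a whiteout
consumes a directory entry but no new inode.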

Josef 'Jeff' Sipek.

-- 
I abhor a system designed for the "user", if that word is a coded pejorative
meaning "stupid and unsophisticated."
- Ken Thompson


Re: [RFC 12/26] ext2 white-out support

2007-08-01 Thread Jan Engelhardt

On Aug 1 2007 12:00, Hans-Peter Jansen wrote:
>
>*) The amount of administration work of any (necessary, unfortunately)
>VMware XP instance running on top of those diskless clients exceeds that of
>all diskless clients by an order of magnitude.

Hardly :)
Install XP, snapshot it when done. Copy .vmdk to 'all' machines.
On security upgrades, revert to snapshot (well - if the workflow allows it),
install, snapshot again. Etc.
Work: 1 1/2.


Jan
-- 


Re: [RFC 12/26] ext2 white-out support

2007-08-01 Thread Jan Engelhardt

On Jul 31 2007 12:36, Josef Sipek wrote:
>[2] http://www.filesystems.org/unionfs-odf.txt

>Instead, the new ODF code stores whiteouts as hardlinks to a special
>(regular) zero-length file in odf (/odf/whiteout), and it stores opaqueness
>information for directories in the inode GID bits in an ODF file system
>(e.g., ext2, XFS, etc.) on the local machine.  This avoids the name-space
>pollution and avoids races with network file systems, while minimizing inode
>consumption in /odf.

Inode GID bits - are you reducing my 32 bits of gid_t to 31 bits?
That does not work out either.



Jan
-- 


Re: [RFC 12/26] ext2 white-out support

2007-08-01 Thread Dave Kleikamp
On Tue, 2007-07-31 at 13:11 -0400, Josef Sipek wrote:
> On Tue, Jul 31, 2007 at 07:00:12PM +0200, Jan Blunck wrote:
> > On Tue, Jul 31, Josef Sipek wrote:
> > 
> > > On Mon, Jul 30, 2007 at 06:13:35PM +0200, Jan Blunck wrote:
> > > > Introduce white-out support to ext2.
> > > 
> > > > I think storing whiteouts on the branches is wrong. It creates all sorts of
> > > > nasty cases when people actually try to use unioning. Imagine a (not-so
> > > > unlikely) scenario where you have 2 unions, and they share a branch. If you
> > > > create a whiteout in one union on that shared branch, the whiteout magically
> > > > affects the other union as well! Whiteouts are a union-level construct, and
> > > > therefore storing them at the branch level is wrong.
> > 
> > So you think that just because you mounted the filesystem somewhere else it
> > should look different? This is what sharing is all about. If you share a
> > filesystem you also share the removal of objects.
> 
> The removal happens at the union level, not the branch level. Say you have:
> 
> /a/
> /b/foo
> /c/foo
> 
> And you mount /u1 as a union of {a,b}, and /u2 as union of {a,c}.

Who does this?  I'm assuming that a is the "top" layer.  Aren't union
mounts typically about sharing lower layers and having a separate rw
layer for each union mount?

> $ find /u*
> /u1
> /u1/foo
> /u2
> /u2/foo
> $ rm /u1/foo # this creates whiteout for "foo" in /a
> $ find /u*
> /u1
> /u2
> 
> Is that what you'd expect as a user? I don't think so.

That's exactly what I would expect.

If I were to:
$ echo "this is new" > /u1/foo

I would expect:
$ cat /u2/foo
this is new

So why should rm behave differently?

I haven't really been tuned into union mounts, so maybe I'm missing out
on something basic here.

Thanks,
Shaggy
-- 
David Kleikamp
IBM Linux Technology Center



Re: [RFC 11/26] tmpfs white-out support

2007-08-01 Thread Hugh Dickins
On Mon, 30 Jul 2007, Jan Blunck wrote:

> Introduce white-out support to tmpfs.
> 
> Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
> ---
>  include/linux/shmem_fs.h |    1
>  mm/shmem.c               |   54 +++
>  2 files changed, 55 insertions(+)

I see there's debate about whether this (and its fellows) give the
right semantic to whiteouts; and I've not begun to think about that.

But as a patch to tmpfs for what you're trying to do, it looks just
about fine.  I say "just about" because the reference counting looks
right, but I wouldn't dare say that it _is_ right without testing.

And I'd probably want to add a minor adjustment, so that a mount with
nr_inodes=1000 could still support exactly 1000 inodes, despite your
allocating one for the whiteout (usually never used) at mount time.
But that can follow along later, no problem.

Hugh


Re: [RFC 12/26] ext2 white-out support

2007-08-01 Thread Josef Sipek
On Wed, Aug 01, 2007 at 12:00:42PM +0200, Hans-Peter Jansen wrote:
> On Tuesday, 31 July 2007 19:00, Jan Blunck wrote:
> > On Tue, Jul 31, Josef Sipek wrote:
> > > On Mon, Jul 30, 2007 at 06:13:35PM +0200, Jan Blunck wrote:
> > > > Introduce white-out support to ext2.
> > >
> > > > I think storing whiteouts on the branches is wrong. It creates all sorts
> > > > of nasty cases when people actually try to use unioning. Imagine a
> > > > (not-so unlikely) scenario where you have 2 unions, and they share a
> > > branch. If you create a whiteout in one union on that shared branch,
> > > the whiteout magically affects the other union as well! Whiteouts are a
> > > union-level construct, and therefore storing them at the branch level
> > > is wrong.
> >
> > So you think that just because you mounted the filesystem somewhere else
> > it should look different? This is what sharing is all about. If you share
> > a filesystem you also share the removal of objects.
> 
> No. At least I don't. 
> 
> Use case: I depend heavily on union mounts in diskless NFS setups, since
> they cut the administration effort for many systems to _nearly_ that of one. It
> boils down to installing the distribution of your choice in a directory,
> union mounting it ro, overlaid with a node-private one (doing this in the initrd
> on the client for several reasons),

You're not sharing the rw layer so it's a different scenario, and will not
have the problem I'm talking about. See my other post [1] for exact scenario
where storing whiteouts on a branch would cause problems.

> adding a little boot and automatic setup
> machinery, and being done. Since all changes are persistent, any system can be
> set up individually, and still mostly only one tree needs to be kept up to
> date. This has been in production in an office environment for two years without
> major hassle (*).

Unionfs is used by many people in this way.

Josef 'Jeff' Sipek.

[1] http://lkml.org/lkml/2007/7/31/365

-- 
Intellectuals solve problems; geniuses prevent them
- Albert Einstein


Re: [RFC 12/26] ext2 white-out support

2007-08-01 Thread Hans-Peter Jansen
On Tuesday, 31 July 2007 19:00, Jan Blunck wrote:
> On Tue, Jul 31, Josef Sipek wrote:
> > On Mon, Jul 30, 2007 at 06:13:35PM +0200, Jan Blunck wrote:
> > > Introduce white-out support to ext2.
> >
> > > I think storing whiteouts on the branches is wrong. It creates all sorts
> > > of nasty cases when people actually try to use unioning. Imagine a
> > > (not-so unlikely) scenario where you have 2 unions, and they share a
> > branch. If you create a whiteout in one union on that shared branch,
> > the whiteout magically affects the other union as well! Whiteouts are a
> > union-level construct, and therefore storing them at the branch level
> > is wrong.
>
> So you think that just because you mounted the filesystem somewhere else
> it should look different? This is what sharing is all about. If you share
> a filesystem you also share the removal of objects.

No. At least I don't. 

Use case: I depend heavily on union mounts in diskless NFS setups, since
they cut the administration effort for many systems to _nearly_ that of one. It
boils down to installing the distribution of your choice in a directory,
union mounting it ro, overlaid with a node-private one (doing this in the initrd
on the client for several reasons), adding a little boot and automatic setup
machinery, and being done. Since all changes are persistent, any system can be
set up individually, and still mostly only one tree needs to be kept up to
date. This has been in production in an office environment for two years without
major hassle (*).

This setup is likely to be useful for virtualization needs, too, but side 
effects via the base directory from one node to another would render this 
setup void.

Cheers,
  Pete

*) The amount of administration work of any (necessary, unfortunately)
VMware XP instance running on top of those diskless clients exceeds that of
all diskless clients by an order of magnitude.


Re: [PATCH 02/14] FS-Cache: Recruit a couple of page flags for cache management

2007-08-01 Thread David Howells
Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> Not sure it's a good idea to overload page_has_private() with an
> overloadable page-flag. What if some future FS wants to use
> PG_owner_priv_2 for other purposes?

All that it means is that releasepage() and co will get called if a page is to
be released or invalidated that has that bit set.  I think that's something a
future FS could probably live with.

However, I do have to trigger a call to releasepage() and co *somehow*.

> Obviously filesystems cannot use these two page-flags if they want to be
> compatible with FS-cache, but need all filesystems be?

What do you mean?  That's why I went for the PG_owner_priv_* approach rather
than just naming the bits unto FS-Cache directly.

> (also, ouch! - 2 pageflags)

Yeah.  The consequence of having things asynchronous is that you have to find
signalling mechanisms to synchronise around the asynchronicity:-/

Furthermore, it occurs to me that I can't use PG_private or page->private to
store this information because I want to make isofs use caching, and those two
pieces of information are owned by the buffering code.

David


Re: [EXT4 set 8][PATCH 1/1]Add journal checksums

2007-08-01 Thread Girish Shilamkar
On Wed, 2007-07-11 at 17:16 +0530, Girish Shilamkar wrote:

> I will make the changes and send an incremental patch.
> 
Hi,
I have made the changes and attached the incremental patch as per the
review.

This is the actual changelog which was missing in the original patch.

--
The journal checksum feature adds two new flags, i.e.
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM.

The JBD2_FEATURE_COMPAT_CHECKSUM flag indicates that the commit block contains
the checksum for the blocks described by the descriptor blocks.
Because of the checksums, writing the commit record no longer needs to be
synchronous. The commit record can now be sent to disk without waiting for
the descriptor blocks to be written to disk. This behavior is controlled
by the JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT flag. Older kernels/e2fsck should
not be able to recover a journal with _ASYNC_COMMIT set, hence it is made
incompat.
The commit header has been extended to hold the checksum along with the
type of the checksum.

During recovery, in the scan pass, checksums are verified to ensure the sanity
and completeness (in case of _ASYNC_COMMIT) of every transaction.
-
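
The verification step described in the changelog can be sketched in userspace
as follows. This is an illustration under assumptions, not the patch's code:
struct commit_sketch and the helper names are hypothetical, and the bitwise
CRC-32 below is a stand-in for the kernel's CRC32 library, whose variant and
seeding may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); running value, no final XOR. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Hypothetical, simplified stand-in for the checksum field of a commit block. */
struct commit_sketch {
	uint32_t chksum;
};

/* Checksum all blocks of one transaction, in log order. */
static uint32_t txn_checksum(const unsigned char *const *blocks,
			     size_t nblocks, size_t blocksize)
{
	uint32_t crc = 0xFFFFFFFFu;

	assert(blocks != NULL || nblocks == 0);
	for (size_t i = 0; i < nblocks; i++)
		crc = crc32_update(crc, blocks[i], blocksize);
	return crc ^ 0xFFFFFFFFu;
}

/* Return 1 if the blocks found in the log match the commit record, else 0. */
static int txn_is_valid(const struct commit_sketch *commit,
			const unsigned char *const *blocks,
			size_t nblocks, size_t blocksize)
{
	return txn_checksum(blocks, nblocks, blocksize) == commit->chksum;
}
```

The point of the design is that a commit record which reaches disk before its
data blocks is harmless: the recomputed checksum will not match, and recovery
can treat the transaction as incomplete.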

Thanks & Regards,
Girish. 

Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]>
Signed-off-by: Girish Shilamkar <[EMAIL PROTECTED]>

Index: linux-2.6.22/Documentation/filesystems/ext4.txt
===
--- linux-2.6.22.orig/Documentation/filesystems/ext4.txt
+++ linux-2.6.22/Documentation/filesystems/ext4.txt
@@ -89,6 +89,16 @@ When mounting an ext4 filesystem, the fo
 extents			ext4 will use extents to address file data.  The
 			file system will no longer be mountable by ext3.
 
+journal_checksum	Enable checksumming of the journal transactions.
+			This will allow the recovery code in e2fsck and the
+			kernel to detect corruption in the kernel.  It is a 
+			compatible change and will be ignored by older kernels.
+
+journal_async_commit	Commit block can be written to disk without waiting 
+			for descriptor blocks. If enabled older kernels cannot
+			mount the device. This will enable 'journal_checksum'
+			internally.
+
 journal=update		Update the ext4 file system's journal to the current
 			format.
 
Index: linux-2.6.22/fs/Kconfig
===
--- linux-2.6.22.orig/fs/Kconfig
+++ linux-2.6.22/fs/Kconfig
@@ -235,6 +235,7 @@ config JBD_DEBUG
 
 config JBD2
 	tristate
+	select CRC32
 	help
 	  This is a generic journaling layer for block devices that support
 	  both 32-bit and 64-bit block numbers.  It is currently used by
Index: linux-2.6.22/fs/jbd2/commit.c
===
--- linux-2.6.22.orig/fs/jbd2/commit.c
+++ linux-2.6.22/fs/jbd2/commit.c
@@ -108,8 +108,9 @@ static int journal_submit_commit_record(
 	__u32 crc32_sum)
 {
 	struct journal_head *descriptor;
+	struct commit_header *tmp;
 	struct buffer_head *bh;
-	int i, ret;
+	int ret;
 	int barrier_done = 0;
 
 	if (is_journal_aborted(journal))
@@ -121,19 +122,16 @@ static int journal_submit_commit_record(
 
 	bh = jh2bh(descriptor);
 
-	for (i = 0; i < bh->b_size; i += 512) {
-		struct commit_header *tmp =
-			(struct commit_header *)(bh->b_data + i);
-		tmp->h_magic = cpu_to_be32(JBD2_MAGIC_NUMBER);
-		tmp->h_blocktype = cpu_to_be32(JBD2_COMMIT_BLOCK);
-		tmp->h_sequence = cpu_to_be32(commit_transaction->t_tid);
-
-		if (JBD2_HAS_COMPAT_FEATURE(journal,
-	JBD2_FEATURE_COMPAT_CHECKSUM)) {
-			tmp->h_chksum_type 	= JBD2_CRC32_CHKSUM;
-			tmp->h_chksum_size 	= JBD2_CRC32_CHKSUM_SIZE;
-			tmp->h_chksum[0] 	= cpu_to_be32(crc32_sum);
-		}
+	tmp = (struct commit_header *)bh->b_data;
+	tmp->h_magic = cpu_to_be32(JBD2_MAGIC_NUMBER);
+	tmp->h_blocktype = cpu_to_be32(JBD2_COMMIT_BLOCK);
+	tmp->h_sequence = cpu_to_be32(commit_transaction->t_tid);
+
+	if (JBD2_HAS_COMPAT_FEATURE(journal,
+JBD2_FEATURE_COMPAT_CHECKSUM)) {
+		tmp->h_chksum_type 	= JBD2_CRC32_CHKSUM;
+		tmp->h_chksum_size 	= JBD2_CRC32_CHKSUM_SIZE;
+		tmp->h_chksum[0] 	= cpu_to_be32(crc32_sum);
 	}
 
 	JBUFFER_TRACE(descriptor, "submit commit block");
@@ -185,8 +183,8 @@ static int journal_wait_on_commit_record
 {
 	int ret = 0;
 
-	if (buffer_locked(bh))
-		wait_on_buffer(bh);
+	clear_buffer_dirty(bh);
+	wait_on_buffer(bh);
 
 	if (unlikely(!buffer_uptodate(bh)))
 		ret = -EIO;
Index: linux-2.6.22/fs/jbd2/recovery.c
===
--- linux-2.6.22.orig/fs/jbd2/recovery.c
+++ linux-2.6.22/fs/jbd2/recovery.c
@@ -318,14 +318,14 @@ static inline unsigned long long read_ta
 }
 
 /*
- * cal_chksums calculates the checksums for the blocks described in the
+ * calc_chksums calculates the checksums for the blocks described in the
  * descriptor block.
  */
-static int cal_chksums(journal_t *journal, struct buffer_head *bh,
-		   unsigned long *next_log_block, __u32 *crc32_sum)
+static int calc_chksums(journal_t *journal, struct buffer_head *bh,
+			unsign