Re: [RFC PATCH 1/4] Union mount documentation.

2007-06-21 Thread Erez Zadok
In message <[EMAIL PROTECTED]>, Josef Sipek writes:
> On Thu, Jun 21, 2007 at 10:55:45AM +0530, Bharata B Rao wrote:
> ... 
> > Talking about copyup and whiteout at VFS layer, we have already
> > demonstrated what complexity it takes to have these within VFS. Please
> > take a look at the copyup and whiteout patches in our previous
> > releases at:
> > 
> > http://lkml.org/lkml/2007/4/17/150
> > http://lkml.org/lkml/2007/5/14/69
> > 
> > Or maybe wait till I clean all those up to work with the new union
> > stack infrastructure which I have posted here.
> 
> Really, the problem for both union mounts and unionfs is that the concept
> of unioning spans the two layers. You have the unification part - which is
> a very VFS-level concept, but at the same time, you've got whiteouts, copyup,
> (semi-?)persistent inode numbers, and a bunch of other details that just
> don't belong in the VFS at all.
> 
> Josef "Jeff" Sipek.

Yup, which is why I feel that the eventual solution may be a hybrid: a file
system "driver" plus ample VFS support.  The question will always be how
much should go into the VFS and how much should go into the f/s driver.
Our approach w/ unionfs has been to keep as much of it as possible in the
f/s driver, and slowly offer VFS-level support, so as not to perturb the VFS
too much all at once.  That way we can offer users something that works now,
and internally change the implementation with minimal user-visible changes.

Erez.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 1/4] Union mount documentation.

2007-06-21 Thread Josef Sipek
On Thu, Jun 21, 2007 at 10:55:45AM +0530, Bharata B Rao wrote:
... 
> Talking about copyup and whiteout at VFS layer, we have already
> demonstrated what complexity it takes to have these within VFS. Please
> take a look at the copyup and whiteout patches in our previous
> releases at:
> 
> http://lkml.org/lkml/2007/4/17/150
> http://lkml.org/lkml/2007/5/14/69
> 
> Or maybe wait till I clean all those up to work with the new union
> stack infrastructure which I have posted here.

Really, the problem for both union mounts and unionfs is that the concept
of unioning spans the two layers. You have the unification part - which is
a very VFS-level concept, but at the same time, you've got whiteouts, copyup,
(semi-?)persistent inode numbers, and a bunch of other details that just
don't belong in the VFS at all.

Josef "Jeff" Sipek.

-- 
Evolution, n.:
  A hypothetical process whereby infinitely improbable events occur with
  alarming frequency, order arises from chaos, and no one is given credit.


Re: [RFC PATCH 1/4] Union mount documentation.

2007-06-20 Thread Bharata B Rao

On 6/20/07, Erez Zadok <[EMAIL PROTECTED]> wrote:
> In message <[EMAIL PROTECTED]>, Jan Blunck writes:
> > On Tue, 19 Jun 2007 22:59:51 -0700, Arjan van de Ven wrote:
> >
> > > first of all I'm happy to see that people are still working on unionfs;
> > > I'd love to have functionality like this show up in Linux.
> >
> > This has nothing to do with unionfs. This is about doing a VFS based
> > approach to union mounts. Unification is a name-based construct so it
> > belongs in the VFS and not in a separate file system.
>
> Jan, while I agree with you in principle that unification is a VFS-level
> namespace construct, I disagree with you that unioning doesn't belong in a
> separate f/s.
>
> As someone whose group developed three generations of the stackable file
> system Unionfs (see http://unionfs.filesystems.org/), I can tell you from my
> experience and the experience of numerous users, that the devil is in the
> details -- or the so-called orthogonal issues.  To get a fully working
> unioning implementation, one that the many current users of Unionfs could
> use, you'll have to deal with many issues and corner cases: cache coherency,
> inode persistency (and network f/s exports), copyups, whiteouts and opaque
> dirs, how to deal with "odd" file systems which don't support native
> whiteouts and such, directory reading (seekdir), and more.  Our third
> generation Unionfs, the one with On-Disk Format (ODF), handles all of these.
> Rather than reproduce all that discussion here, I'll point people to read
> more info here: 


Erez, thanks, I will definitely have to look at that to understand how
unionfs is addressing all these corner cases.

Though I don't understand all the issues involved with cache coherency
atm, one of the things you said during unionfs 2.0 release is that it
is now possible to make modifications to the lower layer directly and
they will be visible from the union. Note that since we do unioning at
VFS layer, we don't explicitly address this. Direct
modifications/additions to the lower layer will automatically get
reflected in the union. Anyway before commenting anything more on
this, let me get back and study the coherency issues more closely :)



> So, to have a fully usable union mounts implementation, you're going to have
> to support a lot of existing features; but if you were to support them all
> at the VFS level, you will have bloated the VFS considerably with stuff that
> many would argue does not belong in the VFS.


Talking about copyup and whiteout at VFS layer, we have already
demonstrated what complexity it takes to have these within VFS. Please
take a look at the copyup and whiteout patches in our previous
releases at:

http://lkml.org/lkml/2007/4/17/150
http://lkml.org/lkml/2007/5/14/69

Or maybe wait till I clean all those up to work with the new union
stack infrastructure which I have posted here.

Regards,
Bharata.
--
"Men come and go but mountains remain" -- Ruskin Bond.


Re: [RFC PATCH 1/4] Union mount documentation.

2007-06-20 Thread Erez Zadok
In message <[EMAIL PROTECTED]>, Jan Blunck writes:
> On Tue, 19 Jun 2007 22:59:51 -0700, Arjan van de Ven wrote:
> 
> > first of all I'm happy to see that people are still working on unionfs;
> > I'd love to have functionality like this show up in Linux.
> 
> This has nothing to do with unionfs. This is about doing a VFS based
> approach to union mounts. Unification is a name-based construct so it
> belongs in the VFS and not in a separate file system.

Jan, while I agree with you in principle that unification is a VFS-level
namespace construct, I disagree with you that unioning doesn't belong in a
separate f/s.

As someone whose group developed three generations of the stackable file
system Unionfs (see http://unionfs.filesystems.org/), I can tell you from my
experience and the experience of numerous users, that the devil is in the
details -- or the so-called orthogonal issues.  To get a fully working
unioning implementation, one that the many current users of Unionfs could
use, you'll have to deal with many issues and corner cases: cache coherency,
inode persistency (and network f/s exports), copyups, whiteouts and opaque
dirs, how to deal with "odd" file systems which don't support native
whiteouts and such, directory reading (seekdir), and more.  Our third
generation Unionfs, the one with On-Disk Format (ODF), handles all of these.
Rather than reproduce all that discussion here, I'll point people to read
more info here: 

So, to have a fully usable union mounts implementation, you're going to have
to support a lot of existing features; but if you were to support them all
at the VFS level, you will have bloated the VFS considerably with stuff that
many would argue does not belong in the VFS.

Sincerely,
Erez Zadok.


Re: [RFC PATCH 1/4] Union mount documentation.

2007-06-20 Thread Christoph Hellwig
On Wed, Jun 20, 2007 at 12:43:56PM +, Jan Blunck wrote:
> On Wed, 20 Jun 2007 13:32:23 +0100, Christoph Hellwig wrote:
> 
> > On Wed, Jun 20, 2007 at 07:29:55AM +, Jan Blunck wrote:
> >> Mounting a file system twice is bad in the first place. This should be
> >> done by using bind mounts and bind a mounted file system into a union.
> >> After that the normal locking rules apply (and hopefully work ;).
> > 
> > From the kernel POV mounting a filesystem twice is the same as doing
> > a bind mount.
> 
> Somehow I thought about doing this:
> 
>  mount /dev/dasda1 /mnt/A
>  mount /dev/dasda1 /mnt/B
> 
> ... which doesn't result in a bind mount.

But the kernel internal effect is exactly the same.  One superblock instance,
two vfsmounts referring to it. 
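
To make the "one superblock, two vfsmounts" point concrete, here is a
minimal user-space sketch in C. The toy_* types and functions are invented
for this illustration and are far simpler than the kernel's struct
super_block and struct vfsmount; only the shape of the relationship is meant
to carry over.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel objects; not the real definitions. */
struct toy_super_block {
	const char *s_dev;	/* backing device, e.g. "/dev/dasda1" */
	int s_active;		/* how many mounts currently use this instance */
};

struct toy_vfsmount {
	struct toy_super_block *mnt_sb;	/* filesystem instance behind this mount */
	const char *mnt_point;		/* where the instance is attached */
};

/* Attach an already existing superblock at another mountpoint. */
struct toy_vfsmount toy_attach(struct toy_super_block *sb, const char *where)
{
	struct toy_vfsmount mnt = { sb, where };

	sb->s_active++;
	return mnt;
}

/* Two mounts are views of the same filesystem if they share a superblock. */
int toy_same_fs(const struct toy_vfsmount *a, const struct toy_vfsmount *b)
{
	return a->mnt_sb == b->mnt_sb;
}
```

In this model, mounting the device at /mnt/A and again at /mnt/B, or
bind-mounting /mnt/A onto /mnt/B, both end with two toy_vfsmount values
whose mnt_sb pointers are equal.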


Re: [RFC PATCH 1/4] Union mount documentation.

2007-06-20 Thread Jan Blunck
On Tue, 19 Jun 2007 22:59:51 -0700, Arjan van de Ven wrote:

> user does on FS A: 
> mkdir  /mnt/A/somedir
> touch /mnt/A/somedir/somefile
> 
> and then 2 things happen in parallel
> 1) touch /mnt/B/somefile
> 2) mv /mnt/union/somedir /mnt/union/somefile
> 
> since the underlying FS for 2) is FS A... how will this work out locking
> wise? Will the VFS lock the union directory only? Or will this operate
> only on the underlying FS? How is dcache consistency guaranteed for
> scenarios like this?

Ok, with Christoph's help I guess I know now what the question is :)

touch /mnt/B/somefile is doing a lookup in "B" for "somefile". Therefore it
locks B->i_mutex for that. When it gets a negative dentry it creates the
file.

mv /mnt/union/somedir /mnt/union/somefile is doing a lookup in "union" for
"somefile". Therefore it first locks the i_mutex of the topmost directory
in the union of "/mnt/union" (which happens to be "B"). When it gets a
negative dentry it then follows the union down to the next layer (with the
topmost directory still locked). Lookup is repeated until a filled dentry
is found or the topmost negative dentry is used as a target for the
move. That's it.
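
The lookup walk described above can be sketched in user space roughly as
follows. This is only a model under assumptions: struct toy_dir, its
"locked" flag (standing in for i_mutex) and toy_union_lookup() are
hypothetical names, and the lock is dropped at the end of the lookup just to
keep the sketch short, whereas the real operation would keep it for the
whole rename.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_NAMES 8

/* One directory in one layer of the union (hypothetical model). */
struct toy_dir {
	const char *names[TOY_MAX_NAMES];	/* entries present in this layer */
	size_t nr_names;
	int locked;				/* stands in for i_mutex */
	struct toy_dir *lower;			/* next layer down, or NULL */
};

int toy_has_name(const struct toy_dir *dir, const char *name)
{
	for (size_t i = 0; i < dir->nr_names; i++)
		if (strcmp(dir->names[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Look up "name" starting at the topmost layer. Only the topmost directory
 * is locked, and it stays locked while the lower layers are searched.
 * Returns the layer that holds the name, or NULL if every layer was
 * negative (the caller would then use the topmost negative entry as the
 * target).
 */
struct toy_dir *toy_union_lookup(struct toy_dir *top, const char *name)
{
	struct toy_dir *found = NULL;

	top->locked = 1;
	for (struct toy_dir *d = top; d != NULL; d = d->lower) {
		if (toy_has_name(d, name)) {
			found = d;
			break;
		}
	}
	top->locked = 0;
	return found;
}
```

With "B" stacked on top of "A" and only "somedir" present in "A", looking
up "somefile" comes back negative from both layers, which corresponds to the
case where the topmost negative dentry becomes the target of the move.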

Did that answer your question?

Cheers,
Jan



Re: [RFC PATCH 1/4] Union mount documentation.

2007-06-20 Thread Jan Blunck
On Wed, 20 Jun 2007 13:32:23 +0100, Christoph Hellwig wrote:

> On Wed, Jun 20, 2007 at 07:29:55AM +, Jan Blunck wrote:
>> Mounting a file system twice is bad in the first place. This should be
>> done by using bind mounts and bind a mounted file system into a union.
>> After that the normal locking rules apply (and hopefully work ;).
> 
> From the kernel POV mounting a filesystem twice is the same as doing
> a bind mount.

Somehow I thought about doing this:

 mount /dev/dasda1 /mnt/A
 mount /dev/dasda1 /mnt/B

... which doesn't result in a bind mount.



Re: [RFC PATCH 1/4] Union mount documentation.

2007-06-20 Thread Christoph Hellwig
On Wed, Jun 20, 2007 at 07:29:55AM +, Jan Blunck wrote:
> Mounting a file system twice is bad in the first place. This should be
> done by using bind mounts and bind a mounted file system into a union.
> After that the normal locking rules apply (and hopefully work ;).

From the kernel POV mounting a filesystem twice is the same as doing
a bind mount.


Re: [RFC PATCH 1/4] Union mount documentation.

2007-06-20 Thread Bharata B Rao

On 6/20/07, Jan Blunck <[EMAIL PROTECTED]> wrote:
> On Wed, 20 Jun 2007 11:21:57 +0530, Bharata B Rao wrote:
>
> Well done. I like your approach much more than the simple chaining of
> dentries. When I told you about the idea of maintaining a list of
>  objects I always thought about one big structure for all
> the layers of a union. Smaller objects that only point to the next layer
> seem to be better but make the search for the topmost layer impossible.
> You should maintain a reference to the topmost struct union_mount though.


Even in our last version I didn't understand clearly why you had
pointers from the bottom layers to the topmost layer. Could you please
explain under what circumstances there needs to be a bottom to top
traversal ?



> > +5. Union stack: destroying
> > +--------------------------
> > +In addition to storing the union_mounts in a hash table for quick lookups,
> > +they are also stored as a list, headed at vfsmount->mnt_union. So, all
> > +union_mounts that occur under a vfsmount (starting from the mountpoint
> > +followed by the subdir unions) are stored within the vfsmount. During
> > +umount (specifically, during the last mntput()), this list is traversed
> > +to destroy all union stacks under this vfsmount.
> > +
> > +Hence, all union stacks under a vfsmount continue to exist until the
> > +vfsmount is unmounted. It may be noted that the union_mount structure
> > +holds a reference to the current dentry also. Because of this, for
> > +subdir unions, both the top and bottom level dentries become pinned
> > +till the upper layer filesystem is unmounted. Is this behaviour
> > +acceptable ? Would this lead to a lot of pinned dentries over a period
> > +of time ? (CHECK) If we don't do this, the top layer dentry might go
> > +out of cache, during which time we have no means to release the
> > +corresponding union_mount and the union_mount becomes stale. Would it
> > +be necessary and worthwhile to add intelligence to prune_dcache() to
> > +prune unused union_mounts thereby releasing the dentries ?
> > +
> > +As noted above, we hold the reference to current dentry from union_mount
> > +but don't get a reference to the corresponding vfsmount. We depend on
> > +the user of the union stack to hold the reference to the topmost vfsmount
> > +until he is done with the stack traversal. Not holding a reference to the
> > +top vfsmount from within union_mount allows us to free all the union_mounts
> > +from last mntput of the top vfsmount. Is this approach acceptable ?
> > +
> > +NOTE: union_mount structures are part of two lists: the hash list for
> > +quick lookups and a linked list to aid the freeing of these structures
> > +during unmount.

> This must change. This is the only reason why the dentry chaining
> approach was so complex. You need a way to get rid of all unused dentries
> in a union.


The second list headed at mnt->mnt_union was added precisely to get
rid of all the union_mounts under a vfsmount at umount time. So umount
is the time to destroy the union stack.
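
As a rough user-space model of that umount-time teardown (a sketch only:
toy_mount, toy_union_mount and both functions are made-up names, a plain
singly linked list stands in for the list_head at mnt->mnt_union, and the
dentry/vfsmount reference dropping that real code would need is reduced to
a comment):

```c
#include <assert.h>
#include <stdlib.h>

struct toy_union_mount {
	int src_id;			/* placeholder for (src_mnt, src_dentry) */
	struct toy_union_mount *next;	/* stands in for the "list" list_head */
};

struct toy_mount {
	struct toy_union_mount *mnt_union;	/* head of this mount's union_mounts */
};

/* Record a new union_mount under mnt; returns 0 on success, -1 if out of memory. */
int toy_add_union(struct toy_mount *mnt, int src_id)
{
	struct toy_union_mount *um = malloc(sizeof(*um));

	if (!um)
		return -1;
	um->src_id = src_id;
	um->next = mnt->mnt_union;
	mnt->mnt_union = um;
	return 0;
}

/*
 * Models the last put of the mount: walk the per-mount list and free every
 * union_mount. Returns how many entries were destroyed.
 */
int toy_destroy_unions(struct toy_mount *mnt)
{
	int freed = 0;

	while (mnt->mnt_union) {
		struct toy_union_mount *um = mnt->mnt_union;

		mnt->mnt_union = um->next;
		free(um);	/* real code would also drop the dentry references here */
		freed++;
	}
	return freed;
}
```

Nothing in this model frees a union_mount before the mount itself goes
away, which is exactly the pinning behaviour being questioned above.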


> Besides that, I wonder why you left out the rest of my code? The readdir,
> whiteout and copyup parts are orthogonal to the code for maintaining the
> union structure itself. I just rewrote most of it myself to use functions
> like follow_union_down() etc to get rid of the dentry chaining in the long
> run.


The idea was to start simple, get some feedback and consensus and add
features after that. Some of the feedback I got from our last two
posts was that the code was too complex and big to review and we had
so many patches. So this time I have started with the bare minimum so
that it becomes easier for the reviewers. I plan to add copyup and
whiteout only when there is an agreement that this approach of
unioning is acceptable.

And about readdir, I digressed from your approach a bit and made the
readdir cache persistent across readdir()/getdents() calls. I also made
readdir on union mounted directories filesystem independent, unlike our
earlier approach. But again this breaks lseek as I have noted, which
needs to be fixed.

Regards,
Bharata.
--
"Men come and go but mountains remain" -- Ruskin Bond.


Re: [RFC PATCH 1/4] Union mount documentation.

2007-06-20 Thread Jan Blunck
On Wed, 20 Jun 2007 11:21:57 +0530, Bharata B Rao wrote:

> +4. Union stack: building and traversal
> +--------------------------------------
> +Union stack needs to be built from two places: during an explicit union
> +mount (or mount propagation) and during the lookup of a directory that
> +appears in more than one layer of the union.
> +
> +The link between two layers of union stack is maintained using the
> +union_mount structure:
> +
> +struct union_mount {
> +	/* vfsmount and dentry of this layer */
> +	struct vfsmount *src_mnt;
> +	struct dentry *src_dentry;
> +
> +	/* vfsmount and dentry of the next lower layer */
> +	struct vfsmount *dst_mnt;
> +	struct dentry *dst_dentry;
> +
> +	/*
> +	 * This list_head hashes this union_mount based on this layer's
> +	 * vfsmount and dentry. This is used to get to the next layer of
> +	 * the stack (dst_mnt, dst_dentry) given the (src_mnt, src_dentry)
> +	 * and is used for stack traversal.
> +	 */
> +	struct list_head hash;
> +
> +	/*
> +	 * All union_mounts under a vfsmount(src_mnt) are linked together
> +	 * at mnt->mnt_union using this list_head. This is needed to destroy
> +	 * all the union_mounts when the mnt goes away.
> +	 */
> +	struct list_head list;
> +};
> +
> +These union mount structures are stored in a hash table(union_mount_hashtable)
> +which uses the same hash as used for mount_hashtable since both of them use
> +(vfsmount, dentry) pairs to calculate the hash.
> +
> +During a new mount (or mount propagation), a new union_mount structure is
> +created. A reference to the mountpoint's vfsmount and dentry is taken and
> +stored in the union_mount (as dst_mnt, dst_dentry). And this union_mount
> +is inserted in the union_mount_hashtable based on the hash generated by
> +the mount root's vfsmount and dentry.
> +
> +Similar method is employed to create a union stack during first time lookup
> +of a common named directory within a union mount point. But here, the top
> +level directory's vfsmount and dentry are hashed to get to the lower level
> +directory's vfsmount and dentry.
> +
> +The insertion, deletion and lookup of union_mounts in the
> +union_mount_hashtable is protected by vfsmount_lock. While traversing the
> +stack, we hold this spinlock only briefly during lookup time and release
> +it as soon as we get the next union stack member. The top level of the
> +stack holds a reference to the next level (via union_mount structure) and
> +so on. Therefore, as long as we hold a reference to a union stack member,
> +its lower layers can't go away. And since we don't do the complete
> +traversal under any lock, it is possible for the stack to change over the
> +level from where we started traversing. For eg. when traversing the stack
> +downwards, a new filesystem can be mounted on top of it. When this happens,
> +the user who had a reference to the old top wouldn't have visibility to
> +the new top and would continue as if the new top didn't exist for him.
> +I believe this is fine as long as members of the stack don't go away from
> +under us(CHECK). And to be sure of this, we need to hold a reference to the
> +level from where we start the traversal and should continue to hold it
> +till we are done with the traversal.

Well done. I like your approach much more than the simple chaining of
dentries. When I told you about the idea of maintaining a list of
 objects I always thought about one big structure for all
the layers of a union. Smaller objects that only point to the next layer
seem to be better but make the search for the topmost layer impossible.
You should maintain a reference to the topmost struct union_mount though.
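
The traversal from the quoted section 4 can be sketched in user space as
follows. This is an illustration under assumptions, not the patch's code:
toy_um mirrors the quoted struct union_mount but uses a plain singly linked
bucket chain instead of list_head, toy_hash() is a made-up hash, and the
locking (vfsmount_lock in the quoted text) and reference counting are left
out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SIZE 16

struct toy_um {
	const void *src_mnt, *src_dentry;	/* this layer */
	const void *dst_mnt, *dst_dentry;	/* next lower layer */
	struct toy_um *hash_next;		/* bucket chain */
};

static struct toy_um *toy_table[TOY_HASH_SIZE];

/* Hypothetical hash over the (mnt, dentry) pair; the kernel uses its own. */
static size_t toy_hash(const void *mnt, const void *dentry)
{
	uintptr_t h = (uintptr_t)mnt / 16 + (uintptr_t)dentry / 16;

	return (size_t)(h % TOY_HASH_SIZE);
}

/* Insert a link, keyed by the upper layer's (src_mnt, src_dentry). */
void toy_insert(struct toy_um *um)
{
	size_t b = toy_hash(um->src_mnt, um->src_dentry);

	um->hash_next = toy_table[b];
	toy_table[b] = um;
}

/*
 * Step one layer down the stack. On success, replaces the pair pointed to
 * by mnt and dentry with the next lower layer and returns 1. Returns 0 when
 * the given pair is already the bottom of the stack.
 */
int toy_follow_down(const void **mnt, const void **dentry)
{
	struct toy_um *um = toy_table[toy_hash(*mnt, *dentry)];

	for (; um != NULL; um = um->hash_next) {
		if (um->src_mnt == *mnt && um->src_dentry == *dentry) {
			*mnt = um->dst_mnt;
			*dentry = um->dst_dentry;
			return 1;
		}
	}
	return 0;
}
```

Because each entry only knows its next lower layer, walking down is a
sequence of hash lookups, while finding the topmost layer from a lower one
has no equivalent operation in this model.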

> +5. Union stack: destroying
> +--------------------------
> +In addition to storing the union_mounts in a hash table for quick lookups,
> +they are also stored as a list, headed at vfsmount->mnt_union. So, all
> +union_mounts that occur under a vfsmount (starting from the mountpoint
> +followed by the subdir unions) are stored within the vfsmount. During
> +umount (specifically, during the last mntput()), this list is traversed
> +to destroy all union stacks under this vfsmount.
> +
> +Hence, all union stacks under a vfsmount continue to exist until the
> +vfsmount is unmounted. It may be noted that the union_mount structure
> +holds a reference to the current dentry also. Because of this, for
> +subdir unions, both the top and bottom level dentries become pinned
> +till the upper layer filesystem is unmounted. Is this behaviour
> +acceptable ? Would this lead to a lot of pinned dentries over a period
> +of time ? (CHECK) If we don't do this, the top layer dentry might go
> +out of cache, during which time we have no means to release the
> +corresponding union_mount and the union_mount becomes stale. Would it
> +be necessary and worthwhile to add intelligence to prune_dcache() to
> +prune unused union_mounts thereby releasing the dentries ?
> +
> +As n

Re: [RFC PATCH 1/4] Union mount documentation.

2007-06-20 Thread Jan Blunck
On Tue, 19 Jun 2007 22:59:51 -0700, Arjan van de Ven wrote:

> first of all I'm happy to see that people are still working on unionfs;
> I'd love to have functionality like this show up in Linux.

This has nothing to do with unionfs. This is about doing a VFS based
approach to union mounts. Unification is a name-based construct so it
belongs in the VFS and not in a separate file system.

> I'll not claim to have any VFS knowledge whatsoever, but I was just
> wondering what happens in the following scenario:
> 
> FS A is mounted twice, in /mnt/A and /mnt/union
> 
> FS B is mounted twice, in /mnt/B and as topmost union mount
> on /mnt/union
> 
> lets for simplicity say both filesystems are entirely empty
> 
> user does on FS A: 
> mkdir  /mnt/A/somedir
> touch /mnt/A/somedir/somefile
> 
> and then 2 things happen in parallel
> 1) touch /mnt/B/somefile
> 2) mv /mnt/union/somedir /mnt/union/somefile
> 
> since the underlying FS for 2) is FS A... how will this work out locking
> wise? Will the VFS lock the union directory only? Or will this operate
> only on the underlying FS? How is dcache consistency guaranteed for
> scenarios like this?

Mounting a file system twice is bad in the first place. This should be
done by using bind mounts and bind a mounted file system into a union.
After that the normal locking rules apply (and hopefully work ;).

Jan



Re: [RFC PATCH 1/4] Union mount documentation.

2007-06-19 Thread Arjan van de Ven
On Wed, 2007-06-20 at 11:21 +0530, Bharata B Rao wrote:
> From: Bharata B Rao <[EMAIL PROTECTED]>
> Subject: Union mount documentation.
Hi,

first of all I'm happy to see that people are still working on unionfs;
I'd love to have functionality like this show up in Linux.

I'll not claim to have any VFS knowledge whatsoever, but I was just
wondering what happens in the following scenario:

FS A is mounted twice, in /mnt/A and /mnt/union

FS B is mounted twice, in /mnt/B and as topmost union mount
on /mnt/union

let's for simplicity say both filesystems are entirely empty

user does on FS A: 
mkdir  /mnt/A/somedir
touch /mnt/A/somedir/somefile

and then 2 things happen in parallel
1) touch /mnt/B/somefile
2) mv /mnt/union/somedir /mnt/union/somefile

since the underlying FS for 2) is FS A... how will this work out locking
wise? Will the VFS lock the union directory only? Or will this operate
only on the underlying FS? How is dcache consistency guaranteed for
scenarios like this?


Greetings,
   Arjan van de Ven

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org
