Re: [PATCH] ipv6: no addrconf for slave devices

2015-10-16 Thread Jan Blunck
On Fri, Oct 16, 2015 at 6:14 PM, David Ahern  wrote:
> On 10/16/15 10:12 AM, Jan Blunck wrote:
>>
>> On Fri, Oct 16, 2015 at 6:02 PM, David Ahern 
>> wrote:
>>>
>>> On 10/16/15 9:57 AM, Jan Blunck wrote:
>>>>
>>>>
>>>>
>>>> I don't think that enslaved ports should get network layer addresses.
>>>> This is one example with a team device:
>>>
>>>
>>>
>>> for VRF devices we do want the enslaved links to have link local
>>> addresses.
>>>
>>
>> That is interesting. As far as I can see you are setting IFF_SLAVE in
>> do_vrf_add_slave() and therefore already stopping IPv6 addrconf.
>>
>
> Check net-next. That had to be removed to get IPv6 working.
>

Thanks for the pointer.

So it would be better to differentiate between L2 and L3 ports and
only start addrconf on the latter? I don't think there is a flag that
allows for that though.
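
A minimal userspace sketch of that distinction, not kernel code. The
port_kind enum, struct netdev_model and addrconf_wanted() are hypothetical
names made up for illustration; as said above, the kernel has no flag that
expresses this today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification; the kernel has no such field or flag. */
enum port_kind {
	PORT_NONE,	/* not enslaved */
	PORT_L2,	/* e.g. team, bridge or openvswitch port */
	PORT_L3,	/* e.g. link enslaved to a VRF device */
};

/* Userspace stand-in for the few struct net_device fields used here. */
struct netdev_model {
	const char *name;
	const struct netdev_model *master;	/* NULL if no master is set */
	enum port_kind kind;
};

/* Return true if IPv6 addrconf should run on this device. */
bool addrconf_wanted(const struct netdev_model *dev)
{
	assert(dev != NULL);

	if (!dev->master)
		return true;	/* standalone or master device */

	/* Enslaved: only L3 ports keep their own link-local address. */
	return dev->kind == PORT_L3;
}
```

With this model a team port (PORT_L2) is skipped while a link enslaved to a
VRF device (PORT_L3) still gets its link-local address.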
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ipv6: no addrconf for slave devices

2015-10-16 Thread Jan Blunck
On Fri, Oct 16, 2015 at 6:02 PM, David Ahern  wrote:
> On 10/16/15 9:57 AM, Jan Blunck wrote:
>>
>>
>> I don't think that enslaved ports should get network layer addresses.
>> This is one example with a team device:
>
>
> for VRF devices we do want the enslaved links to have link local addresses.
>

That is interesting. As far as I can see you are setting IFF_SLAVE in
do_vrf_add_slave() and therefore already stopping IPv6 addrconf.


Re: [PATCH] ipv6: no addrconf for slave devices

2015-10-16 Thread Jan Blunck
On Fri, Oct 16, 2015 at 1:54 PM, Jiri Pirko  wrote:
> Fri, Oct 16, 2015 at 12:21:51PM CEST, jblu...@infradead.org wrote:
>>If a device without the IFF_SLAVE flag set (e.g. team, bridge, openvswitch
>>vport, batman) is enslaved and IPv6 is active then addrconf will be
>>initiated and a link-local address is added to the slave interface.
>>
>>This patch alters the behavior so that addrconf will only run on the master
>>device itself. This is achieved by checking the device tree instead of
>>checking for a specific flag.
>>
>>Signed-off-by: Jan Blunck 
>>---
>> net/ipv6/addrconf.c | 6 +-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>>diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>>index 9001133..26d61f0 100644
>>--- a/net/ipv6/addrconf.c
>>+++ b/net/ipv6/addrconf.c
>>@@ -3141,8 +3141,12 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
>>
>>   case NETDEV_UP:
>>   case NETDEV_CHANGE:
>>-  if (dev->flags & IFF_SLAVE)
>>+  /* If a master is set stop IPv6 on this interface */
>>+  if (netdev_master_upper_dev_get(dev)) {
>>+  if (idev)
>>+  addrconf_ifdown(dev, 1);
>
> This breaks teamd if it's using NS/NA ping link-watch on link-local addresses.
>
> What is the reason for this patch? Does it resolve any issue you are
> having?

I don't think that enslaved ports should get network layer addresses.
This is one example with a team device:

3: eth1:  mtu 1500 qdisc pfifo_fast master team0 state UP group default qlen 1000
    link/ether 52:54:00:ef:5f:a1 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:feef:5fa1/64 scope link
       valid_lft forever preferred_lft forever
4: eth2:  mtu 1500 qdisc pfifo_fast master team0 state UP group default qlen 1000
    link/ether 52:54:00:ef:5f:a1 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:feef:5fa1/64 scope link
       valid_lft forever preferred_lft forever
6: team0:  mtu 1500 qdisc noqueue state UP group default
    link/ether 52:54:00:ef:5f:a1 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:feef:5fa1/64 scope link
       valid_lft forever preferred_lft forever

All link-layer addresses are identical because the link aggregation
group syncs the MAC addresses. Having the IPv6 link-local address set
in this case is pretty useless: the partner device cannot tell whether
the port or the team device is being addressed. Even if addrconf
started before the device was enslaved (so that at least one port got
a different IPv6 link-local address than the link aggregation group),
the partner device usually learns the address for the aggregated link.
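
To show why identical MAC addresses lead to identical link-local
addresses, here is a small userspace sketch. It assumes the address is
derived from the MAC with modified EUI-64 (RFC 4291, Appendix A), which
matches the output above; mac_to_link_local() and link_local_str() are
illustrative helper names, not kernel functions.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build the fe80::/64 link-local address that modified EUI-64
 * derives from a 48-bit MAC address.
 */
void mac_to_link_local(const uint8_t mac[6], uint8_t addr[16])
{
	assert(mac != NULL && addr != NULL);

	memset(addr, 0, 16);
	addr[0] = 0xfe;			/* fe80::/64 prefix */
	addr[1] = 0x80;
	addr[8] = mac[0] ^ 0x02;	/* flip the universal/local bit */
	addr[9] = mac[1];
	addr[10] = mac[2];
	addr[11] = 0xff;		/* ff:fe is inserted in the middle */
	addr[12] = 0xfe;
	addr[13] = mac[3];
	addr[14] = mac[4];
	addr[15] = mac[5];
}

/* Format the address as text; buf must hold INET6_ADDRSTRLEN bytes. */
const char *link_local_str(const uint8_t mac[6], char *buf)
{
	uint8_t addr[16];

	mac_to_link_local(mac, addr);
	return inet_ntop(AF_INET6, addr, buf, INET6_ADDRSTRLEN);
}
```

Feeding in 52:54:00:ef:5f:a1 yields fe80::5054:ff:feef:5fa1, so eth1, eth2
and team0 necessarily end up with the same address once the MACs are synced.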

For LACP the standard states that one port should only bind to at most
one aggregator. The additional IPv6 link-local address allows the port
to be used by another stack besides the aggregator. Besides that, the
distribution of any user traffic (e.g. ICMPv6) is forbidden in LACP
before the partner aggregator signals being ready. So having the
link-local traffic on the wire is clearly a violation of that.

In other cases like openvswitch the link-local address is added to the
system but it is not usable since the bridge port stays in state
UNKNOWN.

Regards,
Jan


[PATCH] ipv6: no addrconf for slave devices

2015-10-16 Thread Jan Blunck
If a device without the IFF_SLAVE flag set (e.g. team, bridge, openvswitch
vport, batman) is enslaved and IPv6 is active then addrconf will be
initiated and a link-local address is added to the slave interface.

This patch alters the behavior so that addrconf will only run on the master
device itself. This is achieved by checking the device tree instead of
checking for a specific flag.

Signed-off-by: Jan Blunck 
---
 net/ipv6/addrconf.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 9001133..26d61f0 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3141,8 +3141,12 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 
 	case NETDEV_UP:
 	case NETDEV_CHANGE:
-		if (dev->flags & IFF_SLAVE)
+		/* If a master is set stop IPv6 on this interface */
+		if (netdev_master_upper_dev_get(dev)) {
+			if (idev)
+				addrconf_ifdown(dev, 1);
 			break;
+		}
 
 		if (idev && idev->cnf.disable_ipv6)
 			break;
-- 
2.1.4



Re: [PATCH] team: set IFF_SLAVE on team ports

2015-07-09 Thread Jan Blunck
On Thu, Jul 9, 2015 at 12:07 PM, Jiri Pirko  wrote:
> Thu, Jul 09, 2015 at 11:58:34AM CEST, jblu...@infradead.org wrote:
>>The code in net/ipv6/addrconf.c:addrconf_notify() tests for IFF_SLAVE to
>>decide if it should start the address configuration. Since team ports
>>shouldn't get link-local addresses assigned let's set IFF_SLAVE when linking
>>a port to the team master.
>
> I don't want to use IFF_SLAVE in team. Other master-slave devices are
> not using that as well, for example bridge, ovs, etc.
>

Maybe they need to get fixed too. I've used that flag because it is
documented as a "slave of a load balancer", which describes what a team
port is.


> I think that this should be fixed in addrconf_notify. It should lookup
> if there is a master on top and bail out in that case.

There are other virtual interfaces that have a master assigned and want to
participate in IPv6 address configuration.

Unless we want to have a cascade of conditionals testing the priv_flags in
addrconf_notify() this is asking for a new net_device_flags flag. Maybe
something generic like IFF_L2PORT?

Thanks,
Jan

[ Jiri, sorry for getting that mail twice ]


[PATCH] team: set IFF_SLAVE on team ports

2015-07-09 Thread Jan Blunck
The code in net/ipv6/addrconf.c:addrconf_notify() tests for IFF_SLAVE to
decide if it should start the address configuration. Since team ports
shouldn't get link-local addresses assigned let's set IFF_SLAVE when linking
a port to the team master.

Signed-off-by: Jan Blunck 
---
 drivers/net/team/team.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index daa054b..4cd02c8 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1086,6 +1086,7 @@ static int team_upper_dev_link(struct net_device *dev,
 	err = netdev_master_upper_dev_link(port_dev, dev);
 	if (err)
 		return err;
+	port_dev->flags |= IFF_SLAVE;
 	port_dev->priv_flags |= IFF_TEAM_PORT;
 	return 0;
 }
@@ -1094,6 +1095,7 @@ static void team_upper_dev_unlink(struct net_device *dev,
 				  struct net_device *port_dev)
 {
 	netdev_upper_dev_unlink(port_dev, dev);
+	port_dev->flags &= ~IFF_SLAVE;
 	port_dev->priv_flags &= ~IFF_TEAM_PORT;
 }
 
-- 
2.1.4



Re: [PATCH -mmotm] fs/sysfs/file.c d_path fix

2008-02-18 Thread Jan Blunck
On Sun, Feb 17, Christoph Hellwig wrote:

> On Sat, Feb 16, 2008 at 02:12:05PM -0500, Erez Zadok wrote:
> > diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
> > index 02223e2..a57b024 100644
> > --- a/fs/sysfs/file.c
> > +++ b/fs/sysfs/file.c
> > @@ -329,9 +329,11 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
> > struct sysfs_ops *ops;
> > int error = -EACCES;
> > char *p;
> > +   struct path sysfs_path;
> >  
> > -   p = d_path(file->f_dentry, sysfs_mount, last_sysfs_file,
> > -  sizeof(last_sysfs_file));
> > +   sysfs_path.dentry = file->f_dentry;
> > +   sysfs_path.mnt = sysfs_mount;
> > +   p = d_path(&sysfs_path, last_sysfs_file, sizeof(last_sysfs_file));
> 
> A d_path(file->f_path, ..); should do it, but I'd really like to know
> what sysfs crowd was smoking when adding a d_path in ->open.  Guys,
> please explain what's going on here.
> 

This is from gregkh-driver-sysfs-crash-debugging.patch which is only in -mm I
guess.


Re: [RFC PATCH 0/5] Union Mount: A Directory listing approach with lseek support

2007-12-06 Thread Jan Blunck
On Wed, Dec 05, Dave Hansen wrote:

> I think the key here is what kind of consistency we're trying to
> provide.  If a directory is being changed underneath a reader, what
> kinds of guarantees do they get about the contents of their directory
> read?  When do those guarantees start?  Are there any at open() time?

But we still want to be compliant with what POSIX defines. The problem isn't the
consistency of the readdir result but the seekdir/telldir interface. IMHO that
interface is totally broken: you need to be able to find every offset given by
telldir since the last open. The problem is that seekdir isn't able to return
errors. Otherwise you could just forbid seeking on union directories.
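
For reference, a short userspace sketch of the interface in question,
using only the POSIX calls opendir(), telldir(), readdir(), seekdir() and
closedir(). The helper name seekdir_roundtrip() is made up for this
example. Note that seekdir() returns void, so a filesystem that cannot
honor an old cookie has no way to tell the caller.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Remember the position of the first entry with telldir(), read the
 * whole directory, seek back and read again.
 * Returns 1 if the same entry comes back, 0 if not, -1 on error.
 */
int seekdir_roundtrip(const char *dirname)
{
	DIR *dir;
	struct dirent *ent;
	char first[1024];
	long pos;
	int same;

	assert(dirname != NULL);

	dir = opendir(dirname);
	if (!dir)
		return -1;

	pos = telldir(dir);
	ent = readdir(dir);
	if (!ent) {
		closedir(dir);
		return -1;
	}
	snprintf(first, sizeof(first), "%s", ent->d_name);

	while (readdir(dir))
		;	/* consume the remaining entries */

	seekdir(dir, pos);	/* returns void: failure cannot be reported */
	ent = readdir(dir);
	same = ent && strcmp(ent->d_name, first) == 0;

	closedir(dir);
	return same;
}
```

On a regular directory that is not modified in between, the round trip is
expected to return 1; a union directory has to give the same guarantee for
every cookie handed out since the directory was opened.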

> Rather than give each _dirent_ an offset, could we give each sub-mount
> an offset?  Let's say we have three members comprising a union mount
> directory.  The first has 100 dirents, the second 200, and the third
> 10,000.  When the first readdir is done, we populate the table like
> this:
> 
>   mount_offset[0] = 0;
>   mount_offset[1] = 100;
>   mount_offset[2] = 300;
> 
> If someone seeks back to 150, then we subtract mount[1]'s offset
> (100), and realize that we want the 50th dirent from mount[1].

Yes, that is a nice idea and it is exactly what I have implemented in my patch
series. But you forgot one thing: directories are not flat files. The dentry
offset in a directory is a random cookie. Therefore it is not possible to have
a linear mapping without allocating memory.
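
The table lookup itself is simple. Here is a userspace sketch of it,
under the assumption that entries inside each mount can be addressed by a
dense index, which is exactly what does not hold for real directory
cookies. struct union_pos and union_lookup() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Result of translating a union offset: which mount, and where in it. */
struct union_pos {
	size_t mount;
	long index;
};

/*
 * mount_offset[i] is the union offset at which mount i starts. The
 * table is sorted in ascending order and mount_offset[0] is 0.
 */
struct union_pos union_lookup(const long *mount_offset, size_t nr_mounts,
			      long pos)
{
	struct union_pos res = { 0, 0 };
	size_t i;

	assert(mount_offset != NULL && nr_mounts > 0 && pos >= 0);

	/* Find the last mount whose start offset is not beyond pos. */
	for (i = 1; i < nr_mounts; i++) {
		if (pos < mount_offset[i])
			break;
		res.mount = i;
	}

	res.index = pos - mount_offset[res.mount];
	return res;
}
```

With the table { 0, 100, 300 } from the example, offset 150 maps to index
50 in mount 1. Turning that index back into a real directory position is
the part that needs the per-entry cookies, and therefore the memory.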

> I don't know whether we're bound to this:
> 
> http://www.opengroup.org/onlinepubs/007908775/xsh/readdir.html
> 
> "If a file is removed from or added to the directory after the
> most recent call to opendir() or rewinddir(), whether a
> subsequent call to readdir() returns an entry for that file is
> unspecified."
> 
> But that would seem to tell me that once you populate a table such as
> the one I've described and create it at open(dir) time, you don't
> actually ever need to update it.

Yes, I'm using such a patch on our S390 buildservers to work around some
readdir/seek/rm problem with old glibc versions. It seems to work, but on the
other hand these are really huge systems and I haven't run out of memory while
doing a readdir yet ;)

The proper way to implement this would be to cache the offsets on a per-inode
basis. Otherwise the user could easily DoS this by opening a number of
directories and never closing them.

Regards,
Jan

-- 
Jan Blunck <[EMAIL PROTECTED]>


Re: 2.6.24-rc2-mm1

2007-11-16 Thread Jan Blunck
On Thu, Nov 15, Torsten Kaiser wrote:

> While the next bisect proved that these patches are innocent, I'm
> still blaming you for my problems. ;)

 :(

> The only thing that looks suspicious to me in that patch is the
> following change in nfs4_atomic_open(), nfs4_open_revalidate() and
> nfs4_proc_create()
> 
> - struct path path = {
> - .mnt = nd->mnt,
> - .dentry = dentry,
> - };
> + struct path path = nd->path;
> 
> This changes the path.dentry from the explicit parameter 'dentry' to
> the embedded dentry from the parameter 'nd'.

Ouch! You are totally right. This really looks wrong and I don't even remember
how that went into the patch. Can you test whether the following patch fixes
the problem? (BTW: thanks for the detailed analysis)
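
A small userspace model of the difference, assuming (as the symptoms
suggest) that nd->path describes the directory being searched rather than
the entry being opened. The struct layouts are simplified stand-ins, and
open_path_buggy() and open_path_fixed() are hypothetical helper names, not
functions from fs/nfs/nfs4proc.c.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct vfsmount { const char *devname; };
struct dentry { const char *name; };

struct path {
	struct vfsmount *mnt;
	struct dentry *dentry;
};

struct nameidata {
	struct path path;
};

/* What the broken cleanup did: the dentry argument is silently ignored. */
struct path open_path_buggy(const struct nameidata *nd, struct dentry *dentry)
{
	assert(nd != NULL);
	(void)dentry;
	return nd->path;
}

/* What the fix restores: mount from nd, dentry from the explicit argument. */
struct path open_path_fixed(const struct nameidata *nd, struct dentry *dentry)
{
	struct path path;

	assert(nd != NULL);
	path.mnt = nd->path.mnt;
	path.dentry = dentry;
	return path;
}
```

In the model the buggy variant hands back the dentry stored in nd, so an
open of "bla" would operate on the wrong dentry, while the fixed variant
keeps the mount from nd and the dentry that was passed in.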

Thanks,
Jan

---

Subject: Embed a struct path into struct nameidata breaks NFSv4

I accidentally broke NFSv4. Here is the original report by Torsten Kaiser:

 > > Breaks nfsv4 in a rather funny way:
 > >
 > > treogen ~ # cd /usr/portage/x
 > > treogen x # touch bla
 > > touch: cannot touch `bla': File exists
 > > treogen x # mkdir bla
 > > treogen x # touch bla/bla
 > > touch: cannot touch `bla/bla': File exists
 > > treogen x # ls -lad *
 > > drwxr-xr-x 2 root root 6 Nov 14 20:03 bla
 > > treogen x # ls -la *
 > > total 0
 > > drwxr-xr-x 2 root root  6 Nov 14 20:03 .
 > > drwxr-xr-x 3 root root 16 Nov 14 20:03 ..
 > > treogen x #
  
Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/nfs/nfs4proc.c |   15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

Index: b/fs/nfs/nfs4proc.c
===
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1372,7 +1372,10 @@ out_close:
 struct dentry *
 nfs4_atomic_open(struct inode *dir, struct dentry *dentry, struct nameidata *nd)
 {
-	struct path path = nd->path;
+	struct path path = {
+		.mnt = nd->path.mnt,
+		.dentry = dentry,
+	};
 	struct iattr attr;
 	struct rpc_cred *cred;
 	struct nfs4_state *state;
@@ -1411,7 +1414,10 @@ nfs4_atomic_open(struct inode *dir, stru
 int
 nfs4_open_revalidate(struct inode *dir, struct dentry *dentry, int openflags, struct nameidata *nd)
 {
-	struct path path = nd->path;
+	struct path path = {
+		.mnt = nd->path.mnt,
+		.dentry = dentry,
+	};
 	struct rpc_cred *cred;
 	struct nfs4_state *state;
 
@@ -1860,7 +1866,10 @@ static int
 nfs4_proc_create(struct inode *dir, struct dentry *dentry, struct iattr *sattr,
                  int flags, struct nameidata *nd)
 {
-	struct path path = nd->path;
+	struct path path = {
+		.mnt = nd->path.mnt,
+		.dentry = dentry,
+	};
 	struct nfs4_state *state;
 	struct rpc_cred *cred;
 	int status = 0;


Re: 2.6.24-rc2-mm1

2007-11-15 Thread Jan Blunck
On Wed, Nov 14, Torsten Kaiser wrote:

> > > So I can create new directories, but not new files. Reading files
> > > works normally.
> > >>
> > > The client is 2.6.24-rc2-mm1, the server 2.6.22-gentoo-r9.
> 
> I added Jan Blunck to the recipients, as he wrote
> use-struct-path-in-struct-svc_expkey and
> use-struct-path-in-struct-svc_export

These patches only change the server code. Hard to imagine how this could
break the client. The other patches are pure cleanups only.

Regards,
Jan

-- 
Jan Blunck <[EMAIL PROTECTED]>


Re: oops in oprofile/dump_trace/X86 with 2.6.24-rcX

2007-11-09 Thread Jan Blunck
On Thu, Nov 08, Robert Fitzsimons wrote:

> A couple of days ago I tried to use oprofile with a recent build of
> 2.6.24-rc1, this resulted in a oops 'BUG: unable to handle kernel paging
> request at virtual address'.

Sorry, this only happens on 32-bit. Somehow I broke this when I introduced
stack_pointer(). Here is a patch that fixes the problem.

Thanks,
Jan

--
Subject: oprofile: Fix oops on x86 32-bit

x86 32-bit isn't saving the stack pointer to pt_regs->esp when an
interrupt occurs.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 include/asm-x86/ptrace.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: b/include/asm-x86/ptrace.h
===
--- a/include/asm-x86/ptrace.h
+++ b/include/asm-x86/ptrace.h
@@ -60,7 +60,7 @@ static inline int v8086_mode(struct pt_r
 
 #define instruction_pointer(regs) ((regs)->eip)
 #define frame_pointer(regs) ((regs)->ebp)
-#define stack_pointer(regs) ((regs)->esp)
+#define stack_pointer(regs) ((unsigned long)(regs))
 #define regs_return_value(regs) ((regs)->eax)
 
 extern unsigned long profile_pc(struct pt_regs *regs);


[PATCH 4/9] d_path: Make proc_get_link() use a struct path argument

2007-11-05 Thread Jan Blunck
proc_get_link() is always called with a dentry and a vfsmount from a struct
path. Make proc_get_link() take it directly as an argument.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
---
 fs/proc/base.c  |   60 
 fs/proc/internal.h  |2 -
 fs/proc/task_mmu.c  |6 ++--
 fs/proc/task_nommu.c|6 ++--
 include/linux/proc_fs.h |2 -
 5 files changed, 34 insertions(+), 42 deletions(-)

Index: b/fs/proc/base.c
===
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -153,7 +153,7 @@ static int get_nr_threads(struct task_st
return count;
 }
 
-static int proc_cwd_link(struct inode *inode, struct dentry **dentry, struct vfsmount **mnt)
+static int proc_cwd_link(struct inode *inode, struct path *path)
 {
struct task_struct *task = get_proc_task(inode);
struct fs_struct *fs = NULL;
@@ -165,8 +165,8 @@ static int proc_cwd_link(struct inode *i
}
if (fs) {
read_lock(&fs->lock);
-   *mnt = mntget(fs->pwd.mnt);
-   *dentry = dget(fs->pwd.dentry);
+   *path = fs->pwd;
+   path_get(&fs->pwd);
read_unlock(&fs->lock);
result = 0;
put_fs_struct(fs);
@@ -174,7 +174,7 @@ static int proc_cwd_link(struct inode *i
return result;
 }
 
-static int proc_root_link(struct inode *inode, struct dentry **dentry, struct vfsmount **mnt)
+static int proc_root_link(struct inode *inode, struct path *path)
 {
struct task_struct *task = get_proc_task(inode);
struct fs_struct *fs = NULL;
@@ -186,8 +186,8 @@ static int proc_root_link(struct inode *
}
if (fs) {
read_lock(&fs->lock);
-   *mnt = mntget(fs->root.mnt);
-   *dentry = dget(fs->root.dentry);
+   *path = fs->root;
+   path_get(&fs->root);
read_unlock(&fs->lock);
result = 0;
put_fs_struct(fs);
@@ -1039,34 +1039,30 @@ static void *proc_pid_follow_link(struct
if (!proc_fd_access_allowed(inode))
goto out;
 
-   error = PROC_I(inode)->op.proc_get_link(inode, &nd->path.dentry,
-   &nd->path.mnt);
+   error = PROC_I(inode)->op.proc_get_link(inode, &nd->path);
nd->last_type = LAST_BIND;
 out:
return ERR_PTR(error);
 }
 
-static int do_proc_readlink(struct dentry *dentry, struct vfsmount *mnt,
-   char __user *buffer, int buflen)
+static int do_proc_readlink(struct path *path, char __user *buffer, int buflen)
 {
-   struct inode * inode;
char *tmp = (char*)__get_free_page(GFP_TEMPORARY);
-   char *path;
+   char *pathname;
int len;
 
if (!tmp)
return -ENOMEM;
 
-   inode = dentry->d_inode;
-   path = d_path(dentry, mnt, tmp, PAGE_SIZE);
-   len = PTR_ERR(path);
-   if (IS_ERR(path))
+   pathname = d_path(path->dentry, path->mnt, tmp, PAGE_SIZE);
+   len = PTR_ERR(pathname);
+   if (IS_ERR(pathname))
goto out;
-   len = tmp + PAGE_SIZE - 1 - path;
+   len = tmp + PAGE_SIZE - 1 - pathname;
 
if (len > buflen)
len = buflen;
-   if (copy_to_user(buffer, path, len))
+   if (copy_to_user(buffer, pathname, len))
len = -EFAULT;
  out:
free_page((unsigned long)tmp);
@@ -1077,20 +1073,18 @@ static int proc_pid_readlink(struct dent
 {
int error = -EACCES;
struct inode *inode = dentry->d_inode;
-   struct dentry *de;
-   struct vfsmount *mnt = NULL;
+   struct path path;
 
/* Are we allowed to snoop on the tasks file descriptors? */
if (!proc_fd_access_allowed(inode))
goto out;
 
-   error = PROC_I(inode)->op.proc_get_link(inode, &de, &mnt);
+   error = PROC_I(inode)->op.proc_get_link(inode, &path);
if (error)
goto out;
 
-   error = do_proc_readlink(de, mnt, buffer, buflen);
-   dput(de);
-   mntput(mnt);
+   error = do_proc_readlink(&path, buffer, buflen);
+   path_put(&path);
 out:
return error;
 }
@@ -1317,8 +1311,7 @@ out:
 
 #define PROC_FDINFO_MAX 64
 
-static int proc_fd_info(struct inode *inode, struct dentry **dentry,
-   struct vfsmount **mnt, char *info)
+static int proc_fd_info(struct inode *inode, struct path *path, char *info)
 {
struct task_struct *task = get_proc_task(inode);
struct files_struct *files = NULL;
@@ -1337,10 +1330,10 @@ static int proc_fd_info(struct inode *in
spin_lock(&files->file_lock);
file 

[PATCH 6/9] Use struct path in struct svc_export

2007-11-05 Thread Jan Blunck
I'm embedding struct path into struct svc_export.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Acked-by: J. Bruce Fields <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
---
 fs/nfsd/export.c|   68 +---
 fs/nfsd/nfs3proc.c  |2 -
 fs/nfsd/nfs3xdr.c   |4 +-
 fs/nfsd/nfs4proc.c  |4 +-
 fs/nfsd/nfs4xdr.c   |   12 +++
 fs/nfsd/nfsfh.c |   26 
 fs/nfsd/nfsproc.c   |6 +--
 fs/nfsd/nfsxdr.c|2 -
 fs/nfsd/vfs.c   |   22 +++---
 include/linux/nfsd/export.h |5 +--
 10 files changed, 74 insertions(+), 77 deletions(-)

Index: b/fs/nfsd/export.c
===
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -332,10 +332,9 @@ static void nfsd4_fslocs_free(struct nfs
 static void svc_export_put(struct kref *ref)
 {
struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
-   dput(exp->ex_dentry);
-   mntput(exp->ex_mnt);
+   path_put(&exp->ex_path);
auth_domain_put(exp->ex_client);
-   kfree(exp->ex_path);
+   kfree(exp->ex_pathname);
nfsd4_fslocs_free(&exp->ex_fslocs);
kfree(exp);
 }
@@ -349,7 +348,7 @@ static void svc_export_request(struct ca
char *pth;
 
qword_add(bpp, blen, exp->ex_client->name);
-   pth = d_path(exp->ex_dentry, exp->ex_mnt, *bpp, *blen);
+   pth = d_path(exp->ex_path.dentry, exp->ex_path.mnt, *bpp, *blen);
if (IS_ERR(pth)) {
/* is this correct? */
(*bpp)[0] = '\n';
@@ -508,7 +507,7 @@ static int svc_export_parse(struct cache
int an_int;
 
nd.path.dentry = NULL;
-   exp.ex_path = NULL;
+   exp.ex_pathname = NULL;
 
/* fs locations */
exp.ex_fslocs.locations = NULL;
@@ -547,11 +546,11 @@ static int svc_export_parse(struct cache
 
exp.h.flags = 0;
exp.ex_client = dom;
-   exp.ex_mnt = nd.path.mnt;
-   exp.ex_dentry = nd.path.dentry;
-   exp.ex_path = kstrdup(buf, GFP_KERNEL);
+   exp.ex_path.mnt = nd.path.mnt;
+   exp.ex_path.dentry = nd.path.dentry;
+   exp.ex_pathname = kstrdup(buf, GFP_KERNEL);
err = -ENOMEM;
-   if (!exp.ex_path)
+   if (!exp.ex_pathname)
goto out;
 
/* expiry */
@@ -628,7 +627,7 @@ static int svc_export_parse(struct cache
  out:
nfsd4_fslocs_free(&exp.ex_fslocs);
kfree(exp.ex_uuid);
-   kfree(exp.ex_path);
+   kfree(exp.ex_pathname);
if (nd.path.dentry)
path_put(&nd.path);
  out_no_path:
@@ -653,7 +652,7 @@ static int svc_export_show(struct seq_fi
return 0;
}
exp = container_of(h, struct svc_export, h);
-   seq_path(m, exp->ex_mnt, exp->ex_dentry, " \t\n\\");
+   seq_path(m, exp->ex_path.mnt, exp->ex_path.dentry, " \t\n\\");
seq_putc(m, '\t');
seq_escape(m, exp->ex_client->name, " \t\n\\");
seq_putc(m, '(');
@@ -680,8 +679,8 @@ static int svc_export_match(struct cache
struct svc_export *orig = container_of(a, struct svc_export, h);
struct svc_export *new = container_of(b, struct svc_export, h);
return orig->ex_client == new->ex_client &&
-   orig->ex_dentry == new->ex_dentry &&
-   orig->ex_mnt == new->ex_mnt;
+   orig->ex_path.dentry == new->ex_path.dentry &&
+   orig->ex_path.mnt == new->ex_path.mnt;
 }
 
 static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
@@ -691,9 +690,9 @@ static void svc_export_init(struct cache
 
kref_get(&item->ex_client->ref);
new->ex_client = item->ex_client;
-   new->ex_dentry = dget(item->ex_dentry);
-   new->ex_mnt = mntget(item->ex_mnt);
-   new->ex_path = NULL;
+   new->ex_path.dentry = dget(item->ex_path.dentry);
+   new->ex_path.mnt = mntget(item->ex_path.mnt);
+   new->ex_pathname = NULL;
new->ex_fslocs.locations = NULL;
new->ex_fslocs.locations_count = 0;
new->ex_fslocs.migrated = 0;
@@ -711,8 +710,8 @@ static void export_update(struct cache_h
new->ex_fsid = item->ex_fsid;
new->ex_uuid = item->ex_uuid;
item->ex_uuid = NULL;
-   new->ex_path = item->ex_path;
-   item->ex_path = NULL;
+   new->ex_pathname = item->ex_pathname;
+   item->ex_pathname = NULL;
new->ex_fslocs.locations = item->ex_fslocs.locations;
item->ex_fslocs.locations = NULL;
new->ex_fslocs.locations_count = item->ex_fslocs.locations_count

[PATCH 2/9] d_path: kerneldoc cleanup

2007-11-05 Thread Jan Blunck
Move and update d_path() kernel API documentation.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
---
 fs/dcache.c |   35 ---
 1 file changed, 16 insertions(+), 19 deletions(-)

Index: b/fs/dcache.c
===
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1762,22 +1762,6 @@ shouldnt_be_hashed:
goto shouldnt_be_hashed;
 }
 
-/**
- * d_path - return the path of a dentry
- * @dentry: dentry to report
- * @vfsmnt: vfsmnt to which the dentry belongs
- * @root: root dentry
- * @rootmnt: vfsmnt to which the root dentry belongs
- * @buffer: buffer to return value in
- * @buflen: buffer length
- *
- * Convert a dentry into an ASCII path name. If the entry has been deleted
- * the string " (deleted)" is appended. Note that this is ambiguous.
- *
- * Returns the buffer or an error code if the path was too long.
- *
- * "buflen" should be positive. Caller holds the dcache_lock.
- */
 static char * __d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
   struct path *root, char *buffer, int buflen)
 {
@@ -1845,9 +1829,22 @@ Elong:
return ERR_PTR(-ENAMETOOLONG);
 }
 
-/* write full pathname into buffer and return start of pathname */
-char * d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
-   char *buf, int buflen)
+/**
+ * d_path - return the path of a dentry
+ * @dentry: dentry to report
+ * @vfsmnt: vfsmnt to which the dentry belongs
+ * @buf: buffer to return value in
+ * @buflen: buffer length
+ *
+ * Convert a dentry into an ASCII path name. If the entry has been deleted
+ * the string " (deleted)" is appended. Note that this is ambiguous.
+ *
+ * Returns the buffer or an error code if the path was too long.
+ *
+ * "buflen" should be positive. Caller holds the dcache_lock.
+ */
+char *d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
+char *buf, int buflen)
 {
char *res;
struct path root;




[PATCH 8/9] d_path: Make seq_path() use a struct path argument

2007-11-05 Thread Jan Blunck
seq_path() is always called with a dentry and a vfsmount from a struct
path. Make seq_path() take it directly as an argument.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 drivers/md/md.c  |3 +--
 fs/namespace.c   |6 --
 fs/nfsd/export.c |4 ++--
 fs/proc/nommu.c  |2 +-
 fs/proc/task_mmu.c   |2 +-
 fs/seq_file.c|7 +++
 include/linux/seq_file.h |5 ++---
 mm/mempolicy.c   |2 +-
 mm/swapfile.c|2 +-
 9 files changed, 16 insertions(+), 17 deletions(-)

Index: b/drivers/md/md.c
===
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5011,8 +5011,7 @@ static int md_seq_show(struct seq_file *
chunk_kb ? "KB" : "B");
if (bitmap->file) {
seq_printf(seq, ", file: ");
-   seq_path(seq, bitmap->file->f_path.mnt,
-bitmap->file->f_path.dentry," \t\n");
+   seq_path(seq, &bitmap->file->f_path, " \t\n");
}
 
seq_printf(seq, "\n");
Index: b/fs/namespace.c
===
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -684,10 +684,11 @@ static int show_vfsmnt(struct seq_file *
{ 0, NULL }
};
struct proc_fs_info *fs_infop;
+   struct path mnt_path = { .dentry = mnt->mnt_root, .mnt = mnt };
 
mangle(m, mnt->mnt_devname ? mnt->mnt_devname : "none");
seq_putc(m, ' ');
-   seq_path(m, mnt, mnt->mnt_root, " \t\n\\");
+   seq_path(m, &mnt_path, " \t\n\\");
seq_putc(m, ' ');
mangle(m, mnt->mnt_sb->s_type->name);
if (mnt->mnt_sb->s_subtype && mnt->mnt_sb->s_subtype[0]) {
@@ -721,6 +722,7 @@ struct seq_operations mounts_op = {
 static int show_vfsstat(struct seq_file *m, void *v)
 {
struct vfsmount *mnt = list_entry(v, struct vfsmount, mnt_list);
+   struct path mnt_path = { .dentry = mnt->mnt_root, .mnt = mnt };
int err = 0;
 
/* device */
@@ -732,7 +734,7 @@ static int show_vfsstat(struct seq_file 
 
/* mount point */
seq_puts(m, " mounted on ");
-   seq_path(m, mnt, mnt->mnt_root, " \t\n\\");
+   seq_path(m, &mnt_path, " \t\n\\");
seq_putc(m, ' ');
 
/* file system type */
Index: b/fs/nfsd/export.c
===
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -203,7 +203,7 @@ static int expkey_show(struct seq_file *
if (test_bit(CACHE_VALID, &h->flags) && 
!test_bit(CACHE_NEGATIVE, &h->flags)) {
seq_printf(m, " ");
-   seq_path(m, ek->ek_path.mnt, ek->ek_path.dentry, "\\ \t\n");
+   seq_path(m, &ek->ek_path, "\\ \t\n");
}
seq_printf(m, "\n");
return 0;
@@ -649,7 +649,7 @@ static int svc_export_show(struct seq_fi
return 0;
}
exp = container_of(h, struct svc_export, h);
-   seq_path(m, exp->ex_path.mnt, exp->ex_path.dentry, " \t\n\\");
+   seq_path(m, &exp->ex_path, " \t\n\\");
seq_putc(m, '\t');
seq_escape(m, exp->ex_client->name, " \t\n\\");
seq_putc(m, '(');
Index: b/fs/proc/nommu.c
===
--- a/fs/proc/nommu.c
+++ b/fs/proc/nommu.c
@@ -67,7 +67,7 @@ int nommu_vma_show(struct seq_file *m, s
if (len < 1)
len = 1;
seq_printf(m, "%*c", len, ' ');
-   seq_path(m, file->f_path.mnt, file->f_path.dentry, "");
+   seq_path(m, &file->f_path, "");
}
 
seq_putc(m, '\n');
Index: b/fs/proc/task_mmu.c
===
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -268,7 +268,7 @@ static int show_map(struct seq_file *m, 
 */
if (file) {
pad_len_spaces(m, len);
-   seq_path(m, file->f_path.mnt, file->f_path.dentry, "\n");
+   seq_path(m, &file->f_path, "\n");
} else {
const char *name = arch_vma_name(vma);
if (!name) {
Index: b/fs/seq_file.c
===
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -349,

[PATCH 0/9] struct path related cleanups of d_path() code (V2)

2007-11-05 Thread Jan Blunck

Andrew,
please apply this series to -mm. I just added the seq_file patch, another nfsd
cleanup and changed the patch order.

This patch series changes d_path() to take a struct path argument. The
existing users are changed to pass the struct path deeper into the call
chain. In some cases I replaced existing <dentry,vfsmount> pairs with an
embedded struct path.

Thanks,
Jan


>>one-less-parameter-to-__d_path.patch<<
One less parameter to __d_path

>>d_path-kerneldoc_cleanup.diff<<
d_path: kerneldoc cleanup

>>d_path-Use_struct_path_in_struct_avc_audit_data.diff<<
d_path: Use struct path in struct avc_audit_data

>>d_path-Make_proc_get_link_use_a_struct_path_argument.diff<<
d_path: Make proc_get_link() use a struct path argument

>>d_path_Make_get_dcookie_use_a_struct_path_argument.diff<<
d_path: Make get_dcookie() use a struct path argument

>>nfsd-svc_export_use_struct_path.diff<<
Use struct path in struct svc_export

>>nfsd-svc_expkey_use_struct_path.diff<<
Use struct path in struct svc_expkey

>>d_path-Make_seq_path_use_a_struct_path_argument.diff<<
d_path: Make seq_path() use a struct path argument

>>d_path-use_struct_path.diff<<
d_path: Make d_path() use a struct path
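
The idea behind the series can be sketched in user space. This is only an
illustration under stated assumptions: the struct layouts are simplified
stand-ins, and mock_d_path() is a hypothetical helper, not the kernel's
d_path(). It shows a <dentry,vfsmount> pair travelling as one struct path
argument, with the name built at the tail of the caller's buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins; the real kernel types are far richer. */
struct vfsmount { const char *mountpoint; };
struct dentry   { const char *name; };

/* The pair that the series passes around as a single object. */
struct path {
	struct vfsmount *mnt;
	struct dentry *dentry;
};

/*
 * Mock of the new calling convention: one struct path instead of a
 * separate dentry and vfsmount. The string is placed at the end of buf
 * and a pointer to its start is returned. Returns NULL if buf is too
 * small (a simplification; this mock has no ERR_PTR encoding).
 */
char *mock_d_path(const struct path *path, char *buf, int buflen)
{
	size_t mlen, nlen, need;
	char *p;

	assert(path && path->mnt && path->dentry);
	mlen = strlen(path->mnt->mountpoint);
	nlen = strlen(path->dentry->name);
	need = mlen + 1 + nlen + 1;	/* mountpoint, '/', name, '\0' */
	if (buflen <= 0 || (size_t)buflen < need)
		return NULL;

	p = buf + buflen - need;
	memcpy(p, path->mnt->mountpoint, mlen);
	p[mlen] = '/';
	memcpy(p + mlen + 1, path->dentry->name, nlen + 1);
	return p;
}
```

A caller that already holds a struct path (a file's f_path, for example)
passes its address and no longer has to unpack the two members by hand.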


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/9] d_path: Use struct path in struct avc_audit_data

2007-11-05 Thread Jan Blunck
audit_log_d_path() is a d_path() wrapper that is used by the audit code. To
use a struct path in audit_log_d_path() I need to embed it into struct
avc_audit_data.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
---
 include/linux/audit.h  |5 ++---
 kernel/audit.c |   12 ++--
 kernel/auditsc.c   |   28 +++-
 security/selinux/avc.c |   13 -
 security/selinux/hooks.c   |   28 
 security/selinux/include/avc.h |6 ++
 6 files changed, 41 insertions(+), 51 deletions(-)

Index: b/include/linux/audit.h
===
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -527,8 +527,7 @@ extern const char * audit_log_n_untr
const char *string);
 extern void audit_log_d_path(struct audit_buffer *ab,
 const char *prefix,
-struct dentry *dentry,
-struct vfsmount *vfsmnt);
+struct path *path);
 extern void audit_log_lost(const char *message);
/* Private API (for audit.c only) */
 extern int audit_filter_user(struct netlink_skb_parms *cb, int type);
@@ -545,7 +544,7 @@ extern int audit_enabled;
 #define audit_log_hex(a,b,l) do { ; } while (0)
 #define audit_log_untrustedstring(a,s) do { ; } while (0)
 #define audit_log_n_untrustedstring(a,n,s) do { ; } while (0)
-#define audit_log_d_path(b,p,d,v) do { ; } while (0)
+#define audit_log_d_path(b,p,d) do { ; } while (0)
 #define audit_enabled 0
 #endif
 #endif
Index: b/kernel/audit.c
===
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1394,26 +1394,26 @@ const char *audit_log_untrustedstring(st
 
 /* This is a helper-function to print the escaped d_path */
 void audit_log_d_path(struct audit_buffer *ab, const char *prefix,
- struct dentry *dentry, struct vfsmount *vfsmnt)
+ struct path *path)
 {
-   char *p, *path;
+   char *p, *pathname;
 
if (prefix)
audit_log_format(ab, " %s", prefix);
 
/* We will allow 11 spaces for ' (deleted)' to be appended */
-   path = kmalloc(PATH_MAX+11, ab->gfp_mask);
-   if (!path) {
+   pathname = kmalloc(PATH_MAX+11, ab->gfp_mask);
+   if (!pathname) {
audit_log_format(ab, "<no memory>");
return;
}
-   p = d_path(dentry, vfsmnt, path, PATH_MAX+11);
+   p = d_path(path->dentry, path->mnt, pathname, PATH_MAX+11);
if (IS_ERR(p)) { /* Should never happen since we send PATH_MAX */
/* FIXME: can we save some information here? */
audit_log_format(ab, "<too long>");
} else
audit_log_untrustedstring(ab, p);
-   kfree(path);
+   kfree(pathname);
 }
 
 /**
Index: b/kernel/auditsc.c
===
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -201,8 +201,7 @@ struct audit_context {
int name_count;
struct audit_names  names[AUDIT_NAMES];
char *  filterkey;  /* key for rule that triggered record */
-   struct dentry * pwd;
-   struct vfsmount *   pwdmnt;
+   struct path pwd;
struct audit_context *previous; /* For nested syscalls */
struct audit_aux_data *aux;
struct audit_aux_data *aux_pids;
@@ -758,12 +757,9 @@ static inline void audit_free_names(stru
__putname(context->names[i].name);
}
context->name_count = 0;
-   if (context->pwd)
-   dput(context->pwd);
-   if (context->pwdmnt)
-   mntput(context->pwdmnt);
-   context->pwd = NULL;
-   context->pwdmnt = NULL;
+   path_put(&context->pwd);
+   context->pwd.dentry = NULL;
+   context->pwd.mnt = NULL;
 }
 
 static inline void audit_free_aux(struct audit_context *context)
@@ -910,8 +906,7 @@ static void audit_log_task_info(struct a
if ((vma->vm_flags & VM_EXECUTABLE) &&
vma->vm_file) {
audit_log_d_path(ab, "exe=",
-vma->vm_file->f_path.dentry,
-vma->vm_file->f_path.mnt);
+&vma->vm_file->f_path);
break;
}
vma = vma->vm_next;
@@ -1

[PATCH 9/9] d_path: Make d_path() use a struct path

2007-11-05 Thread Jan Blunck
d_path() is used on a <dentry,vfsmount> pair. Let's use a struct path to
reflect this.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Acked-by: Bryan Wu <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
---
 arch/blackfin/kernel/traps.c  |   12 +---
 drivers/md/bitmap.c   |8 +---
 drivers/usb/gadget/file_storage.c |8 +++-
 fs/compat_ioctl.c |2 +-
 fs/dcache.c   |   12 +---
 fs/dcookies.c |2 +-
 fs/ecryptfs/super.c   |5 ++---
 fs/nfsd/export.c  |3 ++-
 fs/proc/base.c|2 +-
 fs/seq_file.c |4 +++-
 fs/sysfs/file.c   |5 ++---
 fs/unionfs/super.c|3 +--
 include/linux/dcache.h|5 +++--
 kernel/audit.c|2 +-
 14 files changed, 31 insertions(+), 42 deletions(-)

Index: b/arch/blackfin/kernel/traps.c
===
--- a/arch/blackfin/kernel/traps.c
+++ b/arch/blackfin/kernel/traps.c
@@ -98,15 +98,13 @@ static int printk_address(unsigned long 
struct vm_area_struct *vma = vml->vma;
 
if (address >= vma->vm_start && address < vma->vm_end) {
+   char _tmpbuf[256];
char *name = p->comm;
struct file *file = vma->vm_file;
-   if (file) {
-   char _tmpbuf[256];
-   name = d_path(file->f_dentry,
- file->f_vfsmnt,
- _tmpbuf,
- sizeof(_tmpbuf));
-   }
+
+   if (file)
+   name = d_path(&file->f_path, _tmpbuf,
+ sizeof(_tmpbuf));
 
/* FLAT does not have its text aligned to the start of
 * the map while FDPIC ELF does ...
Index: b/drivers/md/bitmap.c
===
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -206,16 +206,10 @@ static void bitmap_checkfree(struct bitm
 /* copy the pathname of a file to a buffer */
 char *file_path(struct file *file, char *buf, int count)
 {
-   struct dentry *d;
-   struct vfsmount *v;
-
if (!buf)
return NULL;
 
-   d = file->f_path.dentry;
-   v = file->f_path.mnt;
-
-   buf = d_path(d, v, buf, count);
+   buf = d_path(&file->f_path, buf, count);
 
return IS_ERR(buf) ? NULL : buf;
 }
Index: b/drivers/usb/gadget/file_storage.c
===
--- a/drivers/usb/gadget/file_storage.c
+++ b/drivers/usb/gadget/file_storage.c
@@ -3567,8 +3567,7 @@ static ssize_t show_file(struct device *
 
down_read(&fsg->filesem);
if (backing_file_is_open(curlun)) { // Get the complete pathname
-   p = d_path(curlun->filp->f_path.dentry,
-   curlun->filp->f_path.mnt, buf, PAGE_SIZE - 1);
+   p = d_path(&curlun->filp->f_path, buf, PAGE_SIZE - 1);
if (IS_ERR(p))
rc = PTR_ERR(p);
else {
@@ -3985,9 +3984,8 @@ static int __init fsg_bind(struct usb_ga
if (backing_file_is_open(curlun)) {
p = NULL;
if (pathbuf) {
-   p = d_path(curlun->filp->f_path.dentry,
-   curlun->filp->f_path.mnt,
-   pathbuf, PATH_MAX);
+   p = d_path(&curlun->filp->f_path,
+  pathbuf, PATH_MAX);
if (IS_ERR(p))
p = NULL;
}
Index: b/fs/compat_ioctl.c
===
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -3544,7 +3544,7 @@ static void compat_ioctl_error(struct fi
/* find the name of the device. */
path = (char *)__get_free_page(GFP_KERNEL);
if (path) {
-   fn = d_path(filp->f_path.dentry, filp->f_path.mnt, path, PAGE_SIZE);
+   fn = d_path(&filp->f_path, path, PAGE_SIZE);
if (IS_ERR(fn))
fn = "?";
}
Index: b/fs/dcache.c
===
--- a/fs/dcache.c
+++ b/fs/d

[PATCH 5/9] d_path: Make get_dcookie() use a struct path argument

2007-11-05 Thread Jan Blunck
get_dcookie() is always called with a dentry and a vfsmount from a struct
path. Make get_dcookie() take it directly as an argument.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
---
 arch/powerpc/oprofile/cell/spu_task_sync.c |   15 +-
 drivers/oprofile/buffer_sync.c |   21 ---
 fs/dcookies.c  |   31 -
 include/linux/dcookies.h   |   15 ++
 4 files changed, 35 insertions(+), 47 deletions(-)

Index: b/arch/powerpc/oprofile/cell/spu_task_sync.c
===
--- a/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ b/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -198,14 +198,13 @@ out:
  * dcookie user still being registered (namely, the reader
  * of the event buffer).
  */
-static inline unsigned long fast_get_dcookie(struct dentry *dentry,
-struct vfsmount *vfsmnt)
+static inline unsigned long fast_get_dcookie(struct path *path)
 {
unsigned long cookie;
 
-   if (dentry->d_cookie)
-   return (unsigned long)dentry;
-   get_dcookie(dentry, vfsmnt, &cookie);
+   if (path->dentry->d_cookie)
+   return (unsigned long)path->dentry;
+   get_dcookie(path, &cookie);
return cookie;
 }
 
@@ -240,8 +239,7 @@ get_exec_dcookie_and_offset(struct spu *
continue;
if (!(vma->vm_flags & VM_EXECUTABLE))
continue;
-   app_cookie = fast_get_dcookie(vma->vm_file->f_dentry,
- vma->vm_file->f_vfsmnt);
+   app_cookie = fast_get_dcookie(&vma->vm_file->f_path);
pr_debug("got dcookie for %s\n",
 vma->vm_file->f_dentry->d_name.name);
app = vma->vm_file;
@@ -262,8 +260,7 @@ get_exec_dcookie_and_offset(struct spu *
break;
}
 
-   *spu_bin_dcookie = fast_get_dcookie(vma->vm_file->f_dentry,
-vma->vm_file->f_vfsmnt);
+   *spu_bin_dcookie = fast_get_dcookie(&vma->vm_file->f_path);
pr_debug("got dcookie for %s\n", vma->vm_file->f_dentry->d_name.name);
 
up_read(&mm->mmap_sem);
Index: b/drivers/oprofile/buffer_sync.c
===
--- a/drivers/oprofile/buffer_sync.c
+++ b/drivers/oprofile/buffer_sync.c
@@ -187,23 +187,22 @@ void sync_stop(void)
end_sync();
 }
 
- 
+
 /* Optimisation. We can manage without taking the dcookie sem
  * because we cannot reach this code without at least one
  * dcookie user still being registered (namely, the reader
  * of the event buffer). */
-static inline unsigned long fast_get_dcookie(struct dentry * dentry,
-   struct vfsmount * vfsmnt)
+static inline unsigned long fast_get_dcookie(struct path *path)
 {
unsigned long cookie;
- 
-   if (dentry->d_cookie)
-   return (unsigned long)dentry;
-   get_dcookie(dentry, vfsmnt, &cookie);
+
+   if (path->dentry->d_cookie)
+   return (unsigned long)path->dentry;
+   get_dcookie(path, &cookie);
return cookie;
 }
 
- 
+
 /* Look up the dcookie for the task's first VM_EXECUTABLE mapping,
  * which corresponds loosely to "application name". This is
  * not strictly necessary but allows oprofile to associate
@@ -222,8 +221,7 @@ static unsigned long get_exec_dcookie(st
continue;
if (!(vma->vm_flags & VM_EXECUTABLE))
continue;
-   cookie = fast_get_dcookie(vma->vm_file->f_path.dentry,
-   vma->vm_file->f_path.mnt);
+   cookie = fast_get_dcookie(&vma->vm_file->f_path);
break;
}
 
@@ -248,8 +246,7 @@ static unsigned long lookup_dcookie(stru
continue;
 
if (vma->vm_file) {
-   cookie = fast_get_dcookie(vma->vm_file->f_path.dentry,
-   vma->vm_file->f_path.mnt);
+   cookie = fast_get_dcookie(&vma->vm_file->f_path);
*offset = (vma->vm_pgoff << PAGE_SHIFT) + addr -
vma->vm_start;
} else {
Index: b/fs/dcookies.c
===
--- a/fs/dcookies.c
+++ b/fs/dcookies.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* The dcookies are allocated from a kmem_cache and
@@ -31,8 +32,7 @@
  * code here is particularly performance critical
  */
 struct dcookie_struct

[PATCH 1/9] One less parameter to __d_path

2007-11-05 Thread Jan Blunck
All callers to __d_path pass the dentry and vfsmount of a struct
path to __d_path. Pass the struct path directly, instead.

Signed-off-by: Andreas Gruenbacher <[EMAIL PROTECTED]>
Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
---
 fs/dcache.c |   12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

Index: b/fs/dcache.c
===
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1778,9 +1778,8 @@ shouldnt_be_hashed:
  *
  * "buflen" should be positive. Caller holds the dcache_lock.
  */
-static char * __d_path( struct dentry *dentry, struct vfsmount *vfsmnt,
-   struct dentry *root, struct vfsmount *rootmnt,
-   char *buffer, int buflen)
+static char * __d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
+  struct path *root, char *buffer, int buflen)
 {
char * end = buffer+buflen;
char * retval;
@@ -1805,7 +1804,7 @@ static char * __d_path( struct dentry *d
for (;;) {
struct dentry * parent;
 
-   if (dentry == root && vfsmnt == rootmnt)
+   if (dentry == root->dentry && vfsmnt == root->mnt)
break;
if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
/* Global root? */
@@ -1868,7 +1867,7 @@ char * d_path(struct dentry *dentry, str
path_get(&current->fs->root);
read_unlock(&current->fs->lock);
spin_lock(&dcache_lock);
-   res = __d_path(dentry, vfsmnt, root.dentry, root.mnt, buf, buflen);
+   res = __d_path(dentry, vfsmnt, &root, buf, buflen);
spin_unlock(&dcache_lock);
path_put(&root);
return res;
@@ -1936,8 +1935,7 @@ asmlinkage long sys_getcwd(char __user *
unsigned long len;
char * cwd;
 
-   cwd = __d_path(pwd.dentry, pwd.mnt, root.dentry, root.mnt,
-  page, PAGE_SIZE);
+   cwd = __d_path(pwd.dentry, pwd.mnt, &root, page, PAGE_SIZE);
spin_unlock(&dcache_lock);
 
error = PTR_ERR(cwd);
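
The root comparison changed in this patch can be sketched in user space.
This is a rough illustration, not the kernel's __d_path(): the types are
simplified stand-ins, mock_walk() and prepend() are hypothetical names, and
mount crossing, locking and the "(deleted)" suffix are left out. It only
shows the walk from a dentry up to a root that is handed in as a single
struct path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types. */
struct vfsmount { int id; };
struct dentry {
	const char *name;
	struct dentry *parent;	/* points to itself at the top of the tree */
};
struct path {
	struct vfsmount *mnt;
	struct dentry *dentry;
};

/* Put "/<name>" in front of *end; returns 0 on success, -1 if out of room. */
static int prepend(char **end, int *buflen, const char *name)
{
	int len = (int)strlen(name);

	if (*buflen < len + 1)
		return -1;
	*buflen -= len + 1;
	*end -= len;
	memcpy(*end, name, (size_t)len);
	*--(*end) = '/';
	return 0;
}

/*
 * Build the name of (dentry, mnt) relative to root, working backwards
 * from the end of buffer. Returns a pointer into buffer, or NULL when
 * the buffer is too small.
 */
char *mock_walk(struct dentry *dentry, struct vfsmount *mnt,
		const struct path *root, char *buffer, int buflen)
{
	char *end = buffer + buflen;

	if (buflen < 2)
		return NULL;
	*--end = '\0';
	buflen--;

	for (;;) {
		/* Same test as in the patch: both halves must match the root. */
		if (dentry == root->dentry && mnt == root->mnt)
			break;
		if (dentry->parent == dentry)	/* top reached, root not seen */
			break;
		if (prepend(&end, &buflen, dentry->name))
			return NULL;
		dentry = dentry->parent;
	}
	if (*end == '\0')	/* the starting point was the root itself */
		*--end = '/';
	return end;
}
```

Passing a different root, as a chroot would, shortens the reported name
without any change to the walk itself.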




[PATCH 7/9] Use struct path in struct svc_expkey

2007-11-05 Thread Jan Blunck
I'm embedding struct path into struct svc_expkey.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/nfsd/export.c|   30 +-
 include/linux/nfsd/export.h |3 +--
 2 files changed, 14 insertions(+), 19 deletions(-)

Index: b/fs/nfsd/export.c
===
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -63,10 +63,8 @@ static void expkey_put(struct kref *ref)
struct svc_expkey *key = container_of(ref, struct svc_expkey, h.ref);
 
if (test_bit(CACHE_VALID, &key->h.flags) &&
-   !test_bit(CACHE_NEGATIVE, &key->h.flags)) {
-   dput(key->ek_dentry);
-   mntput(key->ek_mnt);
-   }
+   !test_bit(CACHE_NEGATIVE, &key->h.flags))
+   path_put(&key->ek_path);
auth_domain_put(key->ek_client);
kfree(key);
 }
@@ -169,9 +167,8 @@ static int expkey_parse(struct cache_det
goto out;
 
dprintk("Found the path %s\n", buf);
-   key.ek_mnt = nd.path.mnt;
-   key.ek_dentry = nd.path.dentry;
-   
+   key.ek_path = nd.path;
+
ek = svc_expkey_update(&key, ek);
if (ek)
cache_put(&ek->h, &svc_expkey_cache);
@@ -206,7 +203,7 @@ static int expkey_show(struct seq_file *
if (test_bit(CACHE_VALID, &h->flags) && 
!test_bit(CACHE_NEGATIVE, &h->flags)) {
seq_printf(m, " ");
-   seq_path(m, ek->ek_mnt, ek->ek_dentry, "\\ \t\n");
+   seq_path(m, ek->ek_path.mnt, ek->ek_path.dentry, "\\ \t\n");
}
seq_printf(m, "\n");
return 0;
@@ -243,8 +240,8 @@ static inline void expkey_update(struct 
struct svc_expkey *new = container_of(cnew, struct svc_expkey, h);
struct svc_expkey *item = container_of(citem, struct svc_expkey, h);
 
-   new->ek_mnt = mntget(item->ek_mnt);
-   new->ek_dentry = dget(item->ek_dentry);
+   new->ek_path = item->ek_path;
+   path_get(&item->ek_path);
 }
 
 static struct cache_head *expkey_alloc(void)
@@ -814,8 +811,7 @@ static int exp_set_key(svc_client *clp, 
key.ek_client = clp;
key.ek_fsidtype = fsid_type;
memcpy(key.ek_fsid, fsidv, key_len(fsid_type));
-   key.ek_mnt = exp->ex_path.mnt;
-   key.ek_dentry = exp->ex_path.dentry;
+   key.ek_path = exp->ex_path;
key.h.expiry_time = NEVER;
key.h.flags = 0;
 
@@ -864,7 +860,7 @@ static svc_export *exp_get_by_name(svc_c
 {
struct svc_export *exp, key;
int err;
-   
+
if (!clp)
return ERR_PTR(-ENOENT);
 
@@ -1036,9 +1032,9 @@ exp_export(struct nfsctl_export *nxp)
/* must make sure there won't be an ex_fsid clash */
if ((nxp->ex_flags & NFSEXP_FSID) &&
(!IS_ERR(fsid_key = exp_get_fsid_key(clp, nxp->ex_dev))) &&
-   fsid_key->ek_mnt &&
-   (fsid_key->ek_mnt != nd.path.mnt ||
-fsid_key->ek_dentry != nd.path.dentry))
+   fsid_key->ek_path.mnt &&
+   (fsid_key->ek_path.mnt != nd.path.mnt ||
+fsid_key->ek_path.dentry != nd.path.dentry))
goto finish;
 
if (!IS_ERR(exp)) {
@@ -1219,7 +1215,7 @@ static struct svc_export *exp_find(struc
if (IS_ERR(ek))
return ERR_PTR(PTR_ERR(ek));
 
-   exp = exp_get_by_name(clp, ek->ek_mnt, ek->ek_dentry, reqp);
+   exp = exp_get_by_name(clp, ek->ek_path.mnt, ek->ek_path.dentry, reqp);
cache_put(&ek->h, &svc_expkey_cache);
 
if (IS_ERR(exp))
Index: b/include/linux/nfsd/export.h
===
--- a/include/linux/nfsd/export.h
+++ b/include/linux/nfsd/export.h
@@ -106,8 +106,7 @@ struct svc_expkey {
int ek_fsidtype;
u32 ek_fsid[6];
 
-   struct vfsmount *   ek_mnt;
-   struct dentry * ek_dentry;
+   struct path ek_path;
 };
 
 #define EX_SECURE(exp) (!((exp)->ex_flags & NFSEXP_INSECURE_PORT))
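
The reference counting pattern used in expkey_update() and expkey_put()
above can be sketched in user space. This is a simplified model with plain
integer counters; mock_path_get(), mock_path_put(), struct key and
key_copy() are hypothetical names for illustration only, and the real
path_get()/path_put() operate on kernel refcounts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins carrying nothing but a reference count. */
struct vfsmount { int count; };
struct dentry   { int count; };

struct path {
	struct vfsmount *mnt;
	struct dentry *dentry;
};

/* One call takes a reference on both halves of the pair. */
void mock_path_get(struct path *path)
{
	path->mnt->count++;
	path->dentry->count++;
}

/* One call drops both references, replacing a dput()/mntput() pair. */
void mock_path_put(struct path *path)
{
	path->dentry->count--;
	path->mnt->count--;
}

/* Stand-in for a structure that embeds a struct path, like svc_expkey. */
struct key {
	struct path ek_path;
};

/* Copy the pair by struct assignment, then pin what the copy points to. */
void key_copy(struct key *new, const struct key *item)
{
	new->ek_path = item->ek_path;
	mock_path_get(&new->ek_path);
}
```

Because both pointers live in one member, a single assignment copies them
and a single get or put keeps the two counts in step.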




[PATCH 6/7] d_path: Make d_path() use a struct path (2nd try)

2007-11-02 Thread Jan Blunck
d_path() is used on a <dentry,vfsmount> pair. Let's use a struct path to
reflect this.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 arch/blackfin/kernel/traps.c  |   12 +---
 drivers/md/bitmap.c   |8 +---
 drivers/usb/gadget/file_storage.c |8 +++-
 fs/compat_ioctl.c |2 +-
 fs/dcache.c   |   12 +---
 fs/dcookies.c |2 +-
 fs/ecryptfs/super.c   |5 ++---
 fs/nfsd/export.c  |3 ++-
 fs/proc/base.c|2 +-
 fs/seq_file.c |4 +++-
 fs/sysfs/file.c   |5 ++---
 fs/unionfs/super.c|3 +--
 include/linux/dcache.h|5 +++--
 kernel/audit.c|2 +-
 14 files changed, 31 insertions(+), 42 deletions(-)

Index: b/arch/blackfin/kernel/traps.c
===
--- a/arch/blackfin/kernel/traps.c
+++ b/arch/blackfin/kernel/traps.c
@@ -98,15 +98,13 @@ static int printk_address(unsigned long 
struct vm_area_struct *vma = vml->vma;
 
if (address >= vma->vm_start && address < vma->vm_end) {
+   char _tmpbuf[256];
char *name = p->comm;
struct file *file = vma->vm_file;
-   if (file) {
-   char _tmpbuf[256];
-   name = d_path(file->f_dentry,
- file->f_vfsmnt,
- _tmpbuf,
- sizeof(_tmpbuf));
-   }
+
+   if (file)
+   name = d_path(&file->f_path, _tmpbuf,
+ sizeof(_tmpbuf));
 
/* FLAT does not have its text aligned to the start of
 * the map while FDPIC ELF does ...
Index: b/drivers/md/bitmap.c
===
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -206,16 +206,10 @@ static void bitmap_checkfree(struct bitm
 /* copy the pathname of a file to a buffer */
 char *file_path(struct file *file, char *buf, int count)
 {
-   struct dentry *d;
-   struct vfsmount *v;
-
if (!buf)
return NULL;
 
-   d = file->f_path.dentry;
-   v = file->f_path.mnt;
-
-   buf = d_path(d, v, buf, count);
+   buf = d_path(&file->f_path, buf, count);
 
return IS_ERR(buf) ? NULL : buf;
 }
Index: b/drivers/usb/gadget/file_storage.c
===
--- a/drivers/usb/gadget/file_storage.c
+++ b/drivers/usb/gadget/file_storage.c
@@ -3567,8 +3567,7 @@ static ssize_t show_file(struct device *
 
down_read(&fsg->filesem);
if (backing_file_is_open(curlun)) { // Get the complete pathname
-   p = d_path(curlun->filp->f_path.dentry,
-   curlun->filp->f_path.mnt, buf, PAGE_SIZE - 1);
+   p = d_path(&curlun->filp->f_path, buf, PAGE_SIZE - 1);
if (IS_ERR(p))
rc = PTR_ERR(p);
else {
@@ -3985,9 +3984,8 @@ static int __init fsg_bind(struct usb_ga
if (backing_file_is_open(curlun)) {
p = NULL;
if (pathbuf) {
-   p = d_path(curlun->filp->f_path.dentry,
-   curlun->filp->f_path.mnt,
-   pathbuf, PATH_MAX);
+   p = d_path(&curlun->filp->f_path,
+  pathbuf, PATH_MAX);
if (IS_ERR(p))
p = NULL;
}
Index: b/fs/compat_ioctl.c
===
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -3544,7 +3544,7 @@ static void compat_ioctl_error(struct fi
/* find the name of the device. */
path = (char *)__get_free_page(GFP_KERNEL);
if (path) {
-   fn = d_path(filp->f_path.dentry, filp->f_path.mnt, path, PAGE_SIZE);
+   fn = d_path(&filp->f_path, path, PAGE_SIZE);
if (IS_ERR(fn))
fn = "?";
}
Index: b/fs/dcache.c
===
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1831,8 +1831,7 @@ Elong:
 
 /**
  * d_path - return the path of a dentry
- * @dentry: den

Re: [PATCH 6/7] d_path: Make d_path() use a struct path

2007-11-02 Thread Jan Blunck
On Fri, Nov 02, Bharata B Rao wrote:

> 
> Did you miss the d_path() caller 
> arch/blackfin/kernel/traps.c:printk_address() ?
> 

Sorry, yes I missed that one.


[PATCH 3/7] d_path: Use struct path in struct avc_audit_data

2007-11-01 Thread Jan Blunck
audit_log_d_path() is a d_path() wrapper that is used by the audit code. To
use a struct path in audit_log_d_path() I need to embed it into struct
avc_audit_data.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 include/linux/audit.h  |5 ++---
 kernel/audit.c |   12 ++--
 kernel/auditsc.c   |   28 +++-
 security/selinux/avc.c |   13 -
 security/selinux/hooks.c   |   28 
 security/selinux/include/avc.h |6 ++
 6 files changed, 41 insertions(+), 51 deletions(-)

Index: b/include/linux/audit.h
===
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -527,8 +527,7 @@ extern const char * audit_log_n_untr
const char *string);
 extern void audit_log_d_path(struct audit_buffer *ab,
 const char *prefix,
-struct dentry *dentry,
-struct vfsmount *vfsmnt);
+struct path *path);
 extern void audit_log_lost(const char *message);
/* Private API (for audit.c only) */
 extern int audit_filter_user(struct netlink_skb_parms *cb, int type);
@@ -545,7 +544,7 @@ extern int audit_enabled;
 #define audit_log_hex(a,b,l) do { ; } while (0)
 #define audit_log_untrustedstring(a,s) do { ; } while (0)
 #define audit_log_n_untrustedstring(a,n,s) do { ; } while (0)
-#define audit_log_d_path(b,p,d,v) do { ; } while (0)
+#define audit_log_d_path(b,p,d) do { ; } while (0)
 #define audit_enabled 0
 #endif
 #endif
Index: b/kernel/audit.c
===
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1394,26 +1394,26 @@ const char *audit_log_untrustedstring(st
 
 /* This is a helper-function to print the escaped d_path */
 void audit_log_d_path(struct audit_buffer *ab, const char *prefix,
- struct dentry *dentry, struct vfsmount *vfsmnt)
+ struct path *path)
 {
-   char *p, *path;
+   char *p, *pathname;
 
if (prefix)
audit_log_format(ab, " %s", prefix);
 
/* We will allow 11 spaces for ' (deleted)' to be appended */
-   path = kmalloc(PATH_MAX+11, ab->gfp_mask);
-   if (!path) {
+   pathname = kmalloc(PATH_MAX+11, ab->gfp_mask);
+   if (!pathname) {
audit_log_format(ab, "<no memory>");
return;
}
-   p = d_path(dentry, vfsmnt, path, PATH_MAX+11);
+   p = d_path(path->dentry, path->mnt, pathname, PATH_MAX+11);
if (IS_ERR(p)) { /* Should never happen since we send PATH_MAX */
/* FIXME: can we save some information here? */
audit_log_format(ab, "<too long>");
} else
audit_log_untrustedstring(ab, p);
-   kfree(path);
+   kfree(pathname);
 }
 
 /**
Index: b/kernel/auditsc.c
===
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -201,8 +201,7 @@ struct audit_context {
int name_count;
struct audit_names  names[AUDIT_NAMES];
char *  filterkey;  /* key for rule that triggered record */
-   struct dentry * pwd;
-   struct vfsmount *   pwdmnt;
+   struct path pwd;
struct audit_context *previous; /* For nested syscalls */
struct audit_aux_data *aux;
struct audit_aux_data *aux_pids;
@@ -758,12 +757,9 @@ static inline void audit_free_names(stru
__putname(context->names[i].name);
}
context->name_count = 0;
-   if (context->pwd)
-   dput(context->pwd);
-   if (context->pwdmnt)
-   mntput(context->pwdmnt);
-   context->pwd = NULL;
-   context->pwdmnt = NULL;
+   path_put(&context->pwd);
+   context->pwd.dentry = NULL;
+   context->pwd.mnt = NULL;
 }
 
 static inline void audit_free_aux(struct audit_context *context)
@@ -910,8 +906,7 @@ static void audit_log_task_info(struct a
if ((vma->vm_flags & VM_EXECUTABLE) &&
vma->vm_file) {
audit_log_d_path(ab, "exe=",
-vma->vm_file->f_path.dentry,
-vma->vm_file->f_path.mnt);
+&vma->vm_file->f_path);
break;
}
vma = vma->vm_next;
@@ -1177,10 +1172,10 @@ static void audit_log_exit(struct 

[PATCH 7/7] Use struct path in struct svc_export

2007-11-01 Thread Jan Blunck
I'm embedding struct path into struct svc_export.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/nfsd/export.c|   69 +---
 fs/nfsd/nfs3proc.c  |2 -
 fs/nfsd/nfs3xdr.c   |4 +-
 fs/nfsd/nfs4proc.c  |4 +-
 fs/nfsd/nfs4xdr.c   |   12 +++
 fs/nfsd/nfsfh.c |   26 
 fs/nfsd/nfsproc.c   |6 +--
 fs/nfsd/nfsxdr.c|2 -
 fs/nfsd/vfs.c   |   22 +++---
 include/linux/nfsd/export.h |5 +--
 10 files changed, 74 insertions(+), 78 deletions(-)

Index: b/fs/nfsd/export.c
===
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -332,10 +332,9 @@ static void nfsd4_fslocs_free(struct nfs
 static void svc_export_put(struct kref *ref)
 {
struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
-   dput(exp->ex_dentry);
-   mntput(exp->ex_mnt);
+   path_put(&exp->ex_path);
auth_domain_put(exp->ex_client);
-   kfree(exp->ex_path);
+   kfree(exp->ex_pathname);
nfsd4_fslocs_free(&exp->ex_fslocs);
kfree(exp);
 }
@@ -346,11 +345,10 @@ static void svc_export_request(struct ca
 {
/*  client path */
struct svc_export *exp = container_of(h, struct svc_export, h);
-   struct path path = { .dentry = exp->ex_dentry, .mnt = exp->ex_mnt };
char *pth;
 
qword_add(bpp, blen, exp->ex_client->name);
-   pth = d_path(&path, *bpp, *blen);
+   pth = d_path(&exp->ex_path, *bpp, *blen);
if (IS_ERR(pth)) {
/* is this correct? */
(*bpp)[0] = '\n';
@@ -509,7 +507,7 @@ static int svc_export_parse(struct cache
int an_int;
 
nd.path.dentry = NULL;
-   exp.ex_path = NULL;
+   exp.ex_pathname = NULL;
 
/* fs locations */
exp.ex_fslocs.locations = NULL;
@@ -548,11 +546,11 @@ static int svc_export_parse(struct cache
 
exp.h.flags = 0;
exp.ex_client = dom;
-   exp.ex_mnt = nd.path.mnt;
-   exp.ex_dentry = nd.path.dentry;
-   exp.ex_path = kstrdup(buf, GFP_KERNEL);
+   exp.ex_path.mnt = nd.path.mnt;
+   exp.ex_path.dentry = nd.path.dentry;
+   exp.ex_pathname = kstrdup(buf, GFP_KERNEL);
err = -ENOMEM;
-   if (!exp.ex_path)
+   if (!exp.ex_pathname)
goto out;
 
/* expiry */
@@ -629,7 +627,7 @@ static int svc_export_parse(struct cache
  out:
nfsd4_fslocs_free(&exp.ex_fslocs);
kfree(exp.ex_uuid);
-   kfree(exp.ex_path);
+   kfree(exp.ex_pathname);
if (nd.path.dentry)
path_put(&nd.path);
  out_no_path:
@@ -654,7 +652,7 @@ static int svc_export_show(struct seq_fi
return 0;
}
exp = container_of(h, struct svc_export, h);
-   seq_path(m, exp->ex_mnt, exp->ex_dentry, " \t\n\\");
+   seq_path(m, exp->ex_path.mnt, exp->ex_path.dentry, " \t\n\\");
seq_putc(m, '\t');
seq_escape(m, exp->ex_client->name, " \t\n\\");
seq_putc(m, '(');
@@ -681,8 +679,8 @@ static int svc_export_match(struct cache
struct svc_export *orig = container_of(a, struct svc_export, h);
struct svc_export *new = container_of(b, struct svc_export, h);
return orig->ex_client == new->ex_client &&
-   orig->ex_dentry == new->ex_dentry &&
-   orig->ex_mnt == new->ex_mnt;
+   orig->ex_path.dentry == new->ex_path.dentry &&
+   orig->ex_path.mnt == new->ex_path.mnt;
 }
 
 static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
@@ -692,9 +690,9 @@ static void svc_export_init(struct cache
 
kref_get(&item->ex_client->ref);
new->ex_client = item->ex_client;
-   new->ex_dentry = dget(item->ex_dentry);
-   new->ex_mnt = mntget(item->ex_mnt);
-   new->ex_path = NULL;
+   new->ex_path.dentry = dget(item->ex_path.dentry);
+   new->ex_path.mnt = mntget(item->ex_path.mnt);
+   new->ex_pathname = NULL;
new->ex_fslocs.locations = NULL;
new->ex_fslocs.locations_count = 0;
new->ex_fslocs.migrated = 0;
@@ -712,8 +710,8 @@ static void export_update(struct cache_h
new->ex_fsid = item->ex_fsid;
new->ex_uuid = item->ex_uuid;
item->ex_uuid = NULL;
-   new->ex_path = item->ex_path;
-   item->ex_path = NULL;
+   new->ex_pathname = item->ex_pathname;
+   item->ex_pathname = NULL;
new->ex_fslocs.locations = item->ex_fslocs.locations;
item->ex_fslocs.locations = NULL;
new->ex_fslocs.locations_c

[PATCH 6/7] d_path: Make d_path() use a struct path

2007-11-01 Thread Jan Blunck
d_path() is used on a <dentry,vfsmount> pair. Let's use a struct path to
reflect this.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 drivers/md/bitmap.c   |8 +---
 drivers/usb/gadget/file_storage.c |3 +--
 fs/compat_ioctl.c |2 +-
 fs/dcache.c   |   12 +---
 fs/dcookies.c |2 +-
 fs/ecryptfs/super.c   |5 ++---
 fs/nfsd/export.c  |3 ++-
 fs/proc/base.c|2 +-
 fs/seq_file.c |4 +++-
 fs/sysfs/file.c   |5 ++---
 fs/unionfs/super.c|3 +--
 include/linux/dcache.h|5 +++--
 kernel/audit.c|2 +-
 13 files changed, 24 insertions(+), 32 deletions(-)

Index: b/drivers/md/bitmap.c
===
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -206,16 +206,10 @@ static void bitmap_checkfree(struct bitm
 /* copy the pathname of a file to a buffer */
 char *file_path(struct file *file, char *buf, int count)
 {
-   struct dentry *d;
-   struct vfsmount *v;
-
if (!buf)
return NULL;
 
-   d = file->f_path.dentry;
-   v = file->f_path.mnt;
-
-   buf = d_path(d, v, buf, count);
+   buf = d_path(&file->f_path, buf, count);
 
return IS_ERR(buf) ? NULL : buf;
 }
Index: b/drivers/usb/gadget/file_storage.c
===
--- a/drivers/usb/gadget/file_storage.c
+++ b/drivers/usb/gadget/file_storage.c
@@ -3567,8 +3567,7 @@ static ssize_t show_file(struct device *
 
down_read(&fsg->filesem);
if (backing_file_is_open(curlun)) { // Get the complete pathname
-   p = d_path(curlun->filp->f_path.dentry,
-   curlun->filp->f_path.mnt, buf, PAGE_SIZE - 1);
+   p = d_path(&curlun->filp->f_path, buf, PAGE_SIZE - 1);
if (IS_ERR(p))
rc = PTR_ERR(p);
else {
Index: b/fs/compat_ioctl.c
===
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -3544,7 +3544,7 @@ static void compat_ioctl_error(struct fi
/* find the name of the device. */
path = (char *)__get_free_page(GFP_KERNEL);
if (path) {
-   fn = d_path(filp->f_path.dentry, filp->f_path.mnt, path, PAGE_SIZE);
+   fn = d_path(&filp->f_path, path, PAGE_SIZE);
if (IS_ERR(fn))
fn = "?";
}
Index: b/fs/dcache.c
===
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1831,8 +1831,7 @@ Elong:
 
 /**
  * d_path - return the path of a dentry
- * @dentry: dentry to report
- * @vfsmnt: vfsmnt to which the dentry belongs
+ * @path: path to report
  * @buf: buffer to return value in
  * @buflen: buffer length
  *
@@ -1843,8 +1842,7 @@ Elong:
  *
  * "buflen" should be positive. Caller holds the dcache_lock.
  */
-char *d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
-char *buf, int buflen)
+char *d_path(struct path *path, char *buf, int buflen)
 {
char *res;
struct path root;
@@ -1856,15 +1854,15 @@ char *d_path(struct dentry *dentry, stru
 * user wants to identify the object in /proc/pid/fd/.  The little hack
 * below allows us to generate a name for these objects on demand:
 */
-   if (dentry->d_op && dentry->d_op->d_dname)
-   return dentry->d_op->d_dname(dentry, buf, buflen);
+   if (path->dentry->d_op && path->dentry->d_op->d_dname)
+   return path->dentry->d_op->d_dname(path->dentry, buf, buflen);
 
read_lock(&current->fs->lock);
root = current->fs->root;
path_get(&current->fs->root);
read_unlock(&current->fs->lock);
spin_lock(&dcache_lock);
-   res = __d_path(dentry, vfsmnt, &root, buf, buflen);
+   res = __d_path(path->dentry, path->mnt, &root, buf, buflen);
spin_unlock(&dcache_lock);
path_put(&root);
return res;
Index: b/fs/dcookies.c
===
--- a/fs/dcookies.c
+++ b/fs/dcookies.c
@@ -170,7 +170,7 @@ asmlinkage long sys_lookup_dcookie(u64 c
goto out;
 
/* FIXME: (deleted) ? */
-   path = d_path(dcs->path.dentry, dcs->path.mnt, kbuf, PAGE_SIZE);
+   path = d_path(&dcs->path, kbuf, PAGE_SIZE);
 
if (IS_ERR(path)) {
err = PTR_ERR(path);
Index: b/fs/ecryptfs/super.c
===
--- a/fs/ecryptfs/super.c
+++ b/fs/ecryptfs/super.c
@@ -163,8 +163,7 @@ stat

[PATCH 5/7] d_path: Make get_dcookie() use a struct path argument

2007-11-01 Thread Jan Blunck
get_dcookie() is always called with a dentry and a vfsmount from a struct
path. Make get_dcookie() take it directly as an argument.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 arch/powerpc/oprofile/cell/spu_task_sync.c |   15 +-
 drivers/oprofile/buffer_sync.c |   21 ---
 fs/dcookies.c  |   31 -
 include/linux/dcookies.h   |   15 ++
 4 files changed, 35 insertions(+), 47 deletions(-)

Index: b/arch/powerpc/oprofile/cell/spu_task_sync.c
===
--- a/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ b/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -198,14 +198,13 @@ out:
  * dcookie user still being registered (namely, the reader
  * of the event buffer).
  */
-static inline unsigned long fast_get_dcookie(struct dentry *dentry,
-struct vfsmount *vfsmnt)
+static inline unsigned long fast_get_dcookie(struct path *path)
 {
unsigned long cookie;
 
-   if (dentry->d_cookie)
-   return (unsigned long)dentry;
-   get_dcookie(dentry, vfsmnt, &cookie);
+   if (path->dentry->d_cookie)
+   return (unsigned long)path->dentry;
+   get_dcookie(path, &cookie);
return cookie;
 }
 
@@ -240,8 +239,7 @@ get_exec_dcookie_and_offset(struct spu *
continue;
if (!(vma->vm_flags & VM_EXECUTABLE))
continue;
-   app_cookie = fast_get_dcookie(vma->vm_file->f_dentry,
- vma->vm_file->f_vfsmnt);
+   app_cookie = fast_get_dcookie(&vma->vm_file->f_path);
pr_debug("got dcookie for %s\n",
 vma->vm_file->f_dentry->d_name.name);
app = vma->vm_file;
@@ -262,8 +260,7 @@ get_exec_dcookie_and_offset(struct spu *
break;
}
 
-   *spu_bin_dcookie = fast_get_dcookie(vma->vm_file->f_dentry,
-vma->vm_file->f_vfsmnt);
+   *spu_bin_dcookie = fast_get_dcookie(&vma->vm_file->f_path);
pr_debug("got dcookie for %s\n", vma->vm_file->f_dentry->d_name.name);
 
up_read(&mm->mmap_sem);
Index: b/drivers/oprofile/buffer_sync.c
===
--- a/drivers/oprofile/buffer_sync.c
+++ b/drivers/oprofile/buffer_sync.c
@@ -187,23 +187,22 @@ void sync_stop(void)
end_sync();
 }
 
- 
+
 /* Optimisation. We can manage without taking the dcookie sem
  * because we cannot reach this code without at least one
  * dcookie user still being registered (namely, the reader
  * of the event buffer). */
-static inline unsigned long fast_get_dcookie(struct dentry * dentry,
-   struct vfsmount * vfsmnt)
+static inline unsigned long fast_get_dcookie(struct path *path)
 {
unsigned long cookie;
- 
-   if (dentry->d_cookie)
-   return (unsigned long)dentry;
-   get_dcookie(dentry, vfsmnt, &cookie);
+
+   if (path->dentry->d_cookie)
+   return (unsigned long)path->dentry;
+   get_dcookie(path, &cookie);
return cookie;
 }
 
- 
+
 /* Look up the dcookie for the task's first VM_EXECUTABLE mapping,
  * which corresponds loosely to "application name". This is
  * not strictly necessary but allows oprofile to associate
@@ -222,8 +221,7 @@ static unsigned long get_exec_dcookie(st
continue;
if (!(vma->vm_flags & VM_EXECUTABLE))
continue;
-   cookie = fast_get_dcookie(vma->vm_file->f_path.dentry,
-   vma->vm_file->f_path.mnt);
+   cookie = fast_get_dcookie(&vma->vm_file->f_path);
break;
}
 
@@ -248,8 +246,7 @@ static unsigned long lookup_dcookie(stru
continue;
 
if (vma->vm_file) {
-   cookie = fast_get_dcookie(vma->vm_file->f_path.dentry,
-   vma->vm_file->f_path.mnt);
+   cookie = fast_get_dcookie(&vma->vm_file->f_path);
*offset = (vma->vm_pgoff << PAGE_SHIFT) + addr -
vma->vm_start;
} else {
Index: b/fs/dcookies.c
===
--- a/fs/dcookies.c
+++ b/fs/dcookies.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* The dcookies are allocated from a kmem_cache and
@@ -31,8 +32,7 @@
  * code here is particularly performance critical
  */
 struct dcookie_struct {
-   struct dentry * dentry;
-   struct vfs

[PATCH 4/7] d_path: Make proc_get_link() use a struct path argument

2007-11-01 Thread Jan Blunck
proc_get_link() is always called with a dentry and a vfsmount from a struct
path. Make proc_get_link() take it directly as an argument.
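
To illustrate the calling convention after this change, here is a minimal userspace sketch. The types are stand-ins with plain counters instead of the kernel's reference counts, and get_link_sketch() is a hypothetical name that does not appear in the patch. It only shows the pattern: the getter copies a struct path into the caller's buffer and takes one reference on both halves, and the caller drops both with a single path_put().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins: plain counters instead of the kernel's refcounts. */
struct dentry { int d_count; };
struct vfsmount { int mnt_count; };

/* Same shape as the kernel's struct path: a (mount, dentry) pair. */
struct path {
	struct vfsmount *mnt;
	struct dentry *dentry;
};

/* Simplified model of path_get(): take a reference on both halves. */
static void path_get(struct path *path)
{
	path->mnt->mnt_count++;
	path->dentry->d_count++;
}

/* Simplified model of path_put(): drop both references again. */
static void path_put(struct path *path)
{
	assert(path->mnt->mnt_count > 0 && path->dentry->d_count > 0);
	path->dentry->d_count--;
	path->mnt->mnt_count--;
}

/*
 * Hypothetical getter in the new style: one struct path out-parameter
 * instead of separate dentry and vfsmount out-parameters.
 */
static int get_link_sketch(struct path *src, struct path *out)
{
	if (src == NULL)
		return -1;
	*out = *src;	/* copy both pointers at once */
	path_get(out);	/* the caller now owns one reference to each */
	return 0;
}
```

The caller side shrinks accordingly: one local struct path and one path_put() replace the dput()/mntput() pair.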

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/proc/base.c  |   60 
 fs/proc/internal.h  |2 -
 fs/proc/task_mmu.c  |6 ++--
 fs/proc/task_nommu.c|6 ++--
 include/linux/proc_fs.h |2 -
 5 files changed, 34 insertions(+), 42 deletions(-)

Index: b/fs/proc/base.c
===
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -153,7 +153,7 @@ static int get_nr_threads(struct task_st
return count;
 }
 
-static int proc_cwd_link(struct inode *inode, struct dentry **dentry, struct vfsmount **mnt)
+static int proc_cwd_link(struct inode *inode, struct path *path)
 {
struct task_struct *task = get_proc_task(inode);
struct fs_struct *fs = NULL;
@@ -165,8 +165,8 @@ static int proc_cwd_link(struct inode *i
}
if (fs) {
read_lock(&fs->lock);
-   *mnt = mntget(fs->pwd.mnt);
-   *dentry = dget(fs->pwd.dentry);
+   *path = fs->pwd;
+   path_get(&fs->pwd);
read_unlock(&fs->lock);
result = 0;
put_fs_struct(fs);
@@ -174,7 +174,7 @@ static int proc_cwd_link(struct inode *i
return result;
 }
 
-static int proc_root_link(struct inode *inode, struct dentry **dentry, struct vfsmount **mnt)
+static int proc_root_link(struct inode *inode, struct path *path)
 {
struct task_struct *task = get_proc_task(inode);
struct fs_struct *fs = NULL;
@@ -186,8 +186,8 @@ static int proc_root_link(struct inode *
}
if (fs) {
read_lock(&fs->lock);
-   *mnt = mntget(fs->root.mnt);
-   *dentry = dget(fs->root.dentry);
+   *path = fs->root;
+   path_get(&fs->root);
read_unlock(&fs->lock);
result = 0;
put_fs_struct(fs);
@@ -1039,34 +1039,30 @@ static void *proc_pid_follow_link(struct
if (!proc_fd_access_allowed(inode))
goto out;
 
-   error = PROC_I(inode)->op.proc_get_link(inode, &nd->path.dentry,
-   &nd->path.mnt);
+   error = PROC_I(inode)->op.proc_get_link(inode, &nd->path);
nd->last_type = LAST_BIND;
 out:
return ERR_PTR(error);
 }
 
-static int do_proc_readlink(struct dentry *dentry, struct vfsmount *mnt,
-   char __user *buffer, int buflen)
+static int do_proc_readlink(struct path *path, char __user *buffer, int buflen)
 {
-   struct inode * inode;
char *tmp = (char*)__get_free_page(GFP_TEMPORARY);
-   char *path;
+   char *pathname;
int len;
 
if (!tmp)
return -ENOMEM;
 
-   inode = dentry->d_inode;
-   path = d_path(dentry, mnt, tmp, PAGE_SIZE);
-   len = PTR_ERR(path);
-   if (IS_ERR(path))
+   pathname = d_path(path->dentry, path->mnt, tmp, PAGE_SIZE);
+   len = PTR_ERR(pathname);
+   if (IS_ERR(pathname))
goto out;
-   len = tmp + PAGE_SIZE - 1 - path;
+   len = tmp + PAGE_SIZE - 1 - pathname;
 
if (len > buflen)
len = buflen;
-   if (copy_to_user(buffer, path, len))
+   if (copy_to_user(buffer, pathname, len))
len = -EFAULT;
  out:
free_page((unsigned long)tmp);
@@ -1077,20 +1073,18 @@ static int proc_pid_readlink(struct dent
 {
int error = -EACCES;
struct inode *inode = dentry->d_inode;
-   struct dentry *de;
-   struct vfsmount *mnt = NULL;
+   struct path path;
 
/* Are we allowed to snoop on the tasks file descriptors? */
if (!proc_fd_access_allowed(inode))
goto out;
 
-   error = PROC_I(inode)->op.proc_get_link(inode, &de, &mnt);
+   error = PROC_I(inode)->op.proc_get_link(inode, &path);
if (error)
goto out;
 
-   error = do_proc_readlink(de, mnt, buffer, buflen);
-   dput(de);
-   mntput(mnt);
+   error = do_proc_readlink(&path, buffer, buflen);
+   path_put(&path);
 out:
return error;
 }
@@ -1317,8 +1311,7 @@ out:
 
 #define PROC_FDINFO_MAX 64
 
-static int proc_fd_info(struct inode *inode, struct dentry **dentry,
-   struct vfsmount **mnt, char *info)
+static int proc_fd_info(struct inode *inode, struct path *path, char *info)
 {
struct task_struct *task = get_proc_task(inode);
struct files_struct *files = NULL;
@@ -1337,10 +1330,10 @@ static int proc_fd_info(struct inode *in
spin_lock(&files->file_lock);
file = fcheck_files(files, fd);

[PATCH 0/7] struct path related cleanups of d_path() code

2007-11-01 Thread Jan Blunck

Here are some more struct path cleanups. This patch series changes d_path() to
take a struct path argument. The existing callers are changed to pass the
struct path further down the call chain. In some structures I need to replace
existing dentry/vfsmount pairs and embed a struct path instead.

Andreas,
since JJ just posted the AppArmor-related d_path() patches, I am not including
them here.

Comments?

Jan


>>one-less-parameter-to-__d_path.patch<<
One less parameter to __d_path

>>d_path-kerneldoc_cleanup.diff<<
d_path: kerneldoc cleanup

>>d_path-Use_struct_path_in_struct_avc_audit_data.diff<<
d_path: Use struct path in struct avc_audit_data

>>d_path-Make_proc_get_link_use_a_struct_path_argument.diff<<
d_path: Make proc_get_link() use a struct path argument

>>d_path_Make_get_dcookie_use_a_struct_path_argument.diff<<
d_path: Make get_dcookie() use a struct path argument

>>d_path-use_struct_path.diff<<
d_path: Make d_path() use a struct path

>>nfsd-svc_export_use_struct_path.diff<<
Use struct path in struct svc_export



[PATCH 2/7] d_path: kerneldoc cleanup

2007-11-01 Thread Jan Blunck
Move and update d_path() kernel API documentation.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/dcache.c |   35 ---
 1 file changed, 16 insertions(+), 19 deletions(-)

Index: b/fs/dcache.c
===
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1762,22 +1762,6 @@ shouldnt_be_hashed:
goto shouldnt_be_hashed;
 }
 
-/**
- * d_path - return the path of a dentry
- * @dentry: dentry to report
- * @vfsmnt: vfsmnt to which the dentry belongs
- * @root: root dentry
- * @rootmnt: vfsmnt to which the root dentry belongs
- * @buffer: buffer to return value in
- * @buflen: buffer length
- *
- * Convert a dentry into an ASCII path name. If the entry has been deleted
- * the string " (deleted)" is appended. Note that this is ambiguous.
- *
- * Returns the buffer or an error code if the path was too long.
- *
- * "buflen" should be positive. Caller holds the dcache_lock.
- */
 static char * __d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
   struct path *root, char *buffer, int buflen)
 {
@@ -1845,9 +1829,22 @@ Elong:
return ERR_PTR(-ENAMETOOLONG);
 }
 
-/* write full pathname into buffer and return start of pathname */
-char * d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
-   char *buf, int buflen)
+/**
+ * d_path - return the path of a dentry
+ * @dentry: dentry to report
+ * @vfsmnt: vfsmnt to which the dentry belongs
+ * @buf: buffer to return value in
+ * @buflen: buffer length
+ *
+ * Convert a dentry into an ASCII path name. If the entry has been deleted
+ * the string " (deleted)" is appended. Note that this is ambiguous.
+ *
+ * Returns the buffer or an error code if the path was too long.
+ *
+ * "buflen" should be positive. Caller holds the dcache_lock.
+ */
+char *d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
+char *buf, int buflen)
 {
char *res;
struct path root;



[PATCH 1/7] One less parameter to __d_path

2007-11-01 Thread Jan Blunck
All callers to __d_path pass the dentry and vfsmount of a struct
path to __d_path. Pass the struct path directly, instead.
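
As a rough illustration of what the parameter change means for the root check inside the walk, here is a userspace sketch. The types are stand-ins where only pointer identity matters, and at_root_old()/at_root_new() are hypothetical names used for the comparison, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only identity matters here. */
struct dentry { const char *name; };
struct vfsmount { const char *devname; };

/* Same shape as the kernel's struct path: a (mount, dentry) pair. */
struct path {
	struct vfsmount *mnt;
	struct dentry *dentry;
};

/* Old style: the root is passed as two separate pointers. */
static bool at_root_old(struct dentry *dentry, struct vfsmount *vfsmnt,
			struct dentry *root, struct vfsmount *rootmnt)
{
	return dentry == root && vfsmnt == rootmnt;
}

/* New style: the root travels as a single struct path. */
static bool at_root_new(struct dentry *dentry, struct vfsmount *vfsmnt,
			const struct path *root)
{
	assert(root != NULL);
	return dentry == root->dentry && vfsmnt == root->mnt;
}
```

Both variants answer the same question; the second simply keeps the two pointers that belong together in one argument.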

Signed-off-by: Andreas Gruenbacher <[EMAIL PROTECTED]>
Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/dcache.c |   12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

Index: b/fs/dcache.c
===
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1778,9 +1778,8 @@ shouldnt_be_hashed:
  *
  * "buflen" should be positive. Caller holds the dcache_lock.
  */
-static char * __d_path( struct dentry *dentry, struct vfsmount *vfsmnt,
-   struct dentry *root, struct vfsmount *rootmnt,
-   char *buffer, int buflen)
+static char * __d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
+  struct path *root, char *buffer, int buflen)
 {
char * end = buffer+buflen;
char * retval;
@@ -1805,7 +1804,7 @@ static char * __d_path( struct dentry *d
for (;;) {
struct dentry * parent;
 
-   if (dentry == root && vfsmnt == rootmnt)
+   if (dentry == root->dentry && vfsmnt == root->mnt)
break;
if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
/* Global root? */
@@ -1868,7 +1867,7 @@ char * d_path(struct dentry *dentry, str
path_get(&current->fs->root);
read_unlock(&current->fs->lock);
spin_lock(&dcache_lock);
-   res = __d_path(dentry, vfsmnt, root.dentry, root.mnt, buf, buflen);
+   res = __d_path(dentry, vfsmnt, &root, buf, buflen);
spin_unlock(&dcache_lock);
path_put(&root);
return res;
@@ -1936,8 +1935,7 @@ asmlinkage long sys_getcwd(char __user *
unsigned long len;
char * cwd;
 
-   cwd = __d_path(pwd.dentry, pwd.mnt, root.dentry, root.mnt,
-  page, PAGE_SIZE);
+   cwd = __d_path(pwd.dentry, pwd.mnt, &root, page, PAGE_SIZE);
spin_unlock(&dcache_lock);
 
error = PTR_ERR(cwd);



Re: [PATCH 00/13] Use struct path in struct nameidata

2007-10-23 Thread Jan Blunck
On Tue, Oct 23, Bharata B Rao wrote:

> On Mon, Oct 22, 2007 at 03:57:58PM +0200, Christoph Hellwig wrote:
> > 
> > Any reason we've got this patchset posted by three people now? :)
> 
> Two reasons actually !
> 
> - The set of patches posted by Jan last was on 2.6.23-rc8-mm1. So I
> thought let me help Andrew a bit by making them available on latest
> -mm :) And I didn't know that these were already under consideration
> by Andrew.

And they have been ported to 2.6.23-mm1 since the day it came out ...

> - The set of patches posted by Jan didn't even pass compile test for me.
> So I made sure that the patches compiled and worked on x86, x86_64 and 
> powerpc.

And I merged your feedback already in ...

The thing is: how do we keep going from here? Do you want to send my patches
in the future or are you going to ask me before sending things out? We don't
need to duplicate the work here. I already put my quilt stack into a public
place for you to work on them but I don't like the way this is going on at the
moment.


Re: [PATCH 00/13] Use struct path in struct nameidata

2007-10-23 Thread Jan Blunck
On Mon, Oct 22, Andrew Morton wrote:

> On Mon, 22 Oct 2007 15:57:58 +0200
> Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Any reason we've got this patchset posted by three people now? :)
> 
> presumably because I haven't been merging it.
> 
> I was in bugfix-only mode from a week prior to 2.6.24 release and during
> the merge window.  Partly caused by the already-idiotic amount of stuff we
> had queued for 2.6.24, partly because we needed to concentrate on
> stabilising the 2.6.25 patchpile rather than writing new stuff.
> 
> And partly to send the signal that rather than beavering away on new
> features all the time, we should also be spending some (more) time testing,
> reviewing and bugfixing the current and soon-to-be-current code.
> 
> Probably I should have been more explicit about it, but it wasn't really
> planned.  Next time I'll send more "thanks, I parked this for consideration
> at a more appropriate time" emails.

Oh, I got that one. I don't know why Bharata sent the patch series
again. Especially since I don't know which version of the patches he sent out :(

The original patches were rotting in my queue for quite some time until
Andreas got fed up with me having no time to work on them. Now that Andreas
is busy working on other things, I took over again. That's it.


[RFC 1/2] i_op->readdir: Change readdir() to be an inode operation

2007-10-20 Thread Jan Blunck
This patch adds a new readdir() inode operation. The purpose of this patch is
to enable the VFS to support directory reading on a stack of directories. The
new interface isn't passing the struct file to the filesystem implementation
anymore. Normally the filesystem implementation shouldn't depend on any
information in struct file except for the dentry, the cookie (f_pos) and the
user's credentials.

The new interface for the readdir inode operation is as follows:

int (*readdir) (struct dentry *dentry, loff_t *pos, void *private,
filldir_t filler, void *dirent);

@dentry: the dentry of the directory
@pos: pointer to the cookie
@private: the credentials (at the moment it is still filp->private_data)
@filler: the filldir to call
@dirent: the dirent buffer
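
The dispatch this interface implies can be sketched in userspace as follows. Everything here is a stand-in: the struct layouts are reduced to the fields used, filldir_fn and SKETCH_ENOTDIR are placeholder names, and readdir_dispatch() plus the two demo callbacks are hypothetical. The sketch prefers the inode operation when one is present, falls back to the file operation otherwise, and checks both operation tables for NULL before dereferencing them.

```c
#include <assert.h>
#include <stddef.h>

struct dentry;
struct file;

/* Placeholder for filldir_t; the real callback takes more arguments. */
typedef int (*filldir_fn)(void *buf, const char *name);

struct inode_operations {
	int (*readdir)(struct dentry *dentry, long long *pos,
		       void *private_data, filldir_fn filler, void *dirent);
};

struct file_operations {
	int (*readdir)(struct file *file, void *dirent, filldir_fn filler);
};

struct inode { const struct inode_operations *i_op; };
struct dentry { struct inode *d_inode; };

struct file {
	struct dentry *dentry;
	const struct file_operations *f_op;
	long long f_pos;
	void *private_data;
};

#define SKETCH_ENOTDIR 20	/* placeholder error value */

/* Prefer the inode operation, fall back to the file operation. */
static int readdir_dispatch(struct file *file, filldir_fn filler, void *buf)
{
	struct inode *inode;
	int has_iop, has_fop;

	assert(file && file->dentry && file->dentry->d_inode);
	inode = file->dentry->d_inode;
	has_iop = inode->i_op && inode->i_op->readdir;
	has_fop = file->f_op && file->f_op->readdir;

	if (!has_iop && !has_fop)
		return -SKETCH_ENOTDIR;
	if (has_iop)
		return inode->i_op->readdir(file->dentry, &file->f_pos,
					    file->private_data, filler, buf);
	return file->f_op->readdir(file, buf, filler);
}

/* Demo callback in the new style: sees only dentry, cookie and private. */
static int demo_inode_readdir(struct dentry *dentry, long long *pos,
			      void *private_data, filldir_fn filler,
			      void *dirent)
{
	(void)dentry; (void)private_data; (void)filler; (void)dirent;
	(*pos)++;	/* advance the cookie through the pointer */
	return 1;
}

/* Demo callback in the old style: receives the whole struct file. */
static int demo_file_readdir(struct file *file, void *dirent,
			     filldir_fn filler)
{
	(void)file; (void)dirent; (void)filler;
	return 2;
}
```

Note that the new-style callback never sees the struct file; it updates the position only through the pointer it is given.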

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/readdir.c   |   14 --
 include/linux/fs.h |2 ++
 2 files changed, 14 insertions(+), 2 deletions(-)

Index: b/fs/readdir.c
===
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -23,7 +24,8 @@ int vfs_readdir(struct file *file, filld
 {
struct inode *inode = file->f_path.dentry->d_inode;
int res = -ENOTDIR;
-   if (!file->f_op || !file->f_op->readdir)
+   if ((!file->f_op || !file->f_op->readdir) &&
+   (!inode->i_op || !inode->i_op->readdir))
goto out;
 
res = security_file_permission(file, MAY_READ);
@@ -33,7 +35,15 @@ int vfs_readdir(struct file *file, filld
mutex_lock(&inode->i_mutex);
res = -ENOENT;
if (!IS_DEADDIR(inode)) {
-   res = file->f_op->readdir(file, buf, filler);
+   if (inode->i_op->readdir) {
+   printk(KERN_DEBUG "i_op->readdir @ ");
+   print_ip_sym((unsigned long)inode->i_op);
+   res = inode->i_op->readdir(file->f_path.dentry,
+  &file->f_pos,
+  file->private_data,
+  filler, buf);
+   } else
+   res = file->f_op->readdir(file, buf, filler);
file_accessed(file);
}
mutex_unlock(&inode->i_mutex);
Index: b/include/linux/fs.h
===
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1214,6 +1214,8 @@ struct inode_operations {
int (*mkdir) (struct inode *,struct dentry *,int);
int (*rmdir) (struct inode *,struct dentry *);
int (*mknod) (struct inode *,struct dentry *,int,dev_t);
+   /* readdir(dentry, position, private/credential, filler, buffer) */
+   int (*readdir) (struct dentry *, loff_t *, void *, filldir_t, void *);
int (*rename) (struct inode *, struct dentry *,
struct inode *, struct dentry *);
int (*readlink) (struct dentry *, char __user *,int);



[RFC 2/2] i_op->readdir: Change libfs users to the new interface

2007-10-20 Thread Jan Blunck
This patch changes dcache_readdir() to the new inode operations readdir
interface. Hence all the users of libfs.c are changed to use the new interface
too.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/autofs4/autofs_i.h |5 ++---
 fs/autofs4/root.c |   41 -
 fs/cifs/inode.c   |1 +
 fs/hugetlbfs/inode.c  |1 +
 fs/libfs.c|   27 ++-
 fs/ocfs2/dlm/dlmfs.c  |1 +
 fs/ramfs/inode.c  |1 +
 include/linux/fs.h|3 ++-
 mm/shmem.c|1 +
 9 files changed, 47 insertions(+), 34 deletions(-)

Index: b/fs/autofs4/autofs_i.h
===
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -168,10 +168,9 @@ static inline int autofs4_ispending(stru
return pending;
 }
 
-static inline void autofs4_copy_atime(struct file *src, struct file *dst)
+static inline void autofs4_copy_atime(struct inode *src, struct inode *dst)
 {
-   dst->f_path.dentry->d_inode->i_atime =
-   src->f_path.dentry->d_inode->i_atime;
+   dst->i_atime = src->i_atime;
return;
 }
 
Index: b/fs/autofs4/root.c
===
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -35,7 +35,6 @@ const struct file_operations autofs4_roo
.open   = dcache_dir_open,
.release= dcache_dir_close,
.read   = generic_read_dir,
-   .readdir= autofs4_root_readdir,
.ioctl  = autofs4_root_ioctl,
 };
 
@@ -43,7 +42,6 @@ const struct file_operations autofs4_dir
.open   = autofs4_dir_open,
.release= autofs4_dir_close,
.read   = generic_read_dir,
-   .readdir= autofs4_dir_readdir,
 };
 
 const struct inode_operations autofs4_indirect_root_inode_operations = {
@@ -52,6 +50,7 @@ const struct inode_operations autofs4_in
.symlink= autofs4_dir_symlink,
.mkdir  = autofs4_dir_mkdir,
.rmdir  = autofs4_dir_rmdir,
+   .readdir= autofs4_root_readdir,
 };
 
 const struct inode_operations autofs4_direct_root_inode_operations = {
@@ -59,6 +58,7 @@ const struct inode_operations autofs4_di
.unlink = autofs4_dir_unlink,
.mkdir  = autofs4_dir_mkdir,
.rmdir  = autofs4_dir_rmdir,
+   .readdir= autofs4_root_readdir,
.follow_link= autofs4_follow_link,
 };
 
@@ -68,15 +68,17 @@ const struct inode_operations autofs4_di
.symlink= autofs4_dir_symlink,
.mkdir  = autofs4_dir_mkdir,
.rmdir  = autofs4_dir_rmdir,
+   .readdir= autofs4_dir_readdir,
 };
 
-static int autofs4_root_readdir(struct file *file, void *dirent,
-   filldir_t filldir)
+static int autofs4_root_readdir(struct dentry *dentry, loff_t *pos,
+   void *private,
+   filldir_t filldir, void *dirent)
 {
-   struct autofs_sb_info *sbi = autofs4_sbi(file->f_path.dentry->d_sb);
+   struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
int oz_mode = autofs4_oz_mode(sbi);
 
-   DPRINTK("called, filp->f_pos = %lld", file->f_pos);
+   DPRINTK("called, filp->f_pos = %lld", *pos);
 
/*
 * Don't set reghost flag if:
@@ -84,12 +86,12 @@ static int autofs4_root_readdir(struct f
 * 2) we haven't even enabled reghosting in the 1st place.
 * 3) this is the daemon doing a readdir
 */
-   if (oz_mode && file->f_pos == 0 && sbi->reghost_enabled)
+   if (oz_mode && *pos == 0 && sbi->reghost_enabled)
sbi->needs_reghost = 1;
 
DPRINTK("needs_reghost = %d", sbi->needs_reghost);
 
-   return dcache_readdir(file, dirent, filldir);
+   return dcache_inode_readdir(dentry, pos, private,filldir, dirent);
 }
 
 static int autofs4_dir_open(struct inode *inode, struct file *file)
@@ -201,15 +203,16 @@ out:
return status;
 }
 
-static int autofs4_dir_readdir(struct file *file, void *dirent, filldir_t filldir)
+static int autofs4_dir_readdir(struct dentry *dentry, loff_t *pos,
+  void *private,
+  filldir_t filldir, void *dirent)
 {
-   struct dentry *dentry = file->f_path.dentry;
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
-   struct dentry *cursor = file->private_data;
+   struct dentry *cursor = private;
int status;
 
-   DPRINTK("file=%p dentry=%p %.*s",
-   file, dentry, dentry->d_name.len, dentry->d_name.name);
+   DPRINTK("dentry=%p %.*s", dentry, dentry->d_name.len,
+   dentry-

[RFC 0/2] readdir() as an inode operation

2007-10-20 Thread Jan Blunck
This is a first try to move readdir() to become an inode operation. This is
necessary for a VFS implementation of "something like union-mounts" where a
readdir() needs to read the directory contents of multiple directories.
Besides that, the new interface no longer passes the struct file to the
filesystem implementations.

Comments, please?
Jan



[patch 2/2] r/o bind mounts: Accept passing a mnt NULL pointer to mnt_drop_write()

2007-10-12 Thread Jan Blunck
In case somebody opens a file with dentry_open(dentry, NULL, ...), we don't
want to stumble over the NULL mnt pointer in struct file.
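
A minimal userspace sketch of the guard, assuming a stand-in struct vfsmount that holds only a writer count. drop_write_sketch() is a hypothetical name; the real function works on per-CPU writer counts, which are not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in: only a writer count, no per-CPU accounting. */
struct vfsmount { int writers; };

/* Hypothetical model of the guarded release: a NULL mount is a no-op. */
static void drop_write_sketch(struct vfsmount *mnt)
{
	if (!mnt)
		return;			/* file opened without a mount */
	assert(mnt->writers > 0);	/* crude underflow check */
	mnt->writers--;
}
```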

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/namespace.c |3 +++
 1 file changed, 3 insertions(+)

Index: b/fs/namespace.c
===
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -253,6 +253,9 @@ void mnt_drop_write(struct vfsmount *mnt
int must_check_underflow = 0;
struct mnt_writer *cpu_writer;
 
+   if (!mnt)
+   return;
+
cpu_writer = &get_cpu_var(mnt_writers);
spin_lock(&cpu_writer->lock);
 



[patch 0/2] r/o bind mount fixes for 2.6.23-mm1

2007-10-12 Thread Jan Blunck
Here are two small patches for 2.6.23-mm1 that fix some issues with the r/o
bind mount code. Besides that, I can see that you handle files opened by
dentry_open() somewhere. Nevertheless, these files are also fput'ed later.

Regards,
Jan



[patch 1/2] r/o bind mounts: Dont touch the vfsmount after path_put()

2007-10-12 Thread Jan Blunck
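mnt_drop_write() was called after the reference to the path had already been
released with path_put(). Call mnt_drop_write() first, so that the vfsmount is
not touched after its reference has been dropped.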
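
Why the order matters can be sketched in userspace with a toy reference count. The types and the release_fixed()/release_buggy() helpers are hypothetical; the "alive" flag stands in for the mount's memory still being valid, so the sketch can report the problem instead of actually touching freed memory.

```c
#include <assert.h>
#include <stddef.h>

/* Toy mount: "alive" stands in for the memory still being valid. */
struct vfsmount { int refcount; int writers; int alive; };
struct path { struct vfsmount *mnt; };

/* Dropping the last reference "frees" the mount. */
static void path_put(struct path *p)
{
	assert(p->mnt->refcount > 0);
	if (--p->mnt->refcount == 0)
		p->mnt->alive = 0;
}

/* Returns 0 on success, -1 if the mount was already released. */
static int drop_write(struct vfsmount *mnt)
{
	if (!mnt->alive)
		return -1;
	mnt->writers--;
	return 0;
}

/* Order after the fix: finish using the mount, then drop the reference. */
static int release_fixed(struct path *p)
{
	int ret = drop_write(p->mnt);

	path_put(p);
	return ret;
}

/* Order before the fix: the mount may be gone when drop_write() runs. */
static int release_buggy(struct path *p)
{
	path_put(p);
	return drop_write(p->mnt);
}
```

With the last reference held by the caller, release_buggy() reports the failure and leaves the writer count untouched, while release_fixed() completes normally.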

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 net/unix/af_unix.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: b/net/unix/af_unix.c
===
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -752,8 +752,8 @@ static struct sock *unix_find_other(stru
if (u->sk_type == type)
touch_atime(nd.path.mnt, nd.path.dentry);
 
-   path_put(&nd.path);
mnt_drop_write(nd.path.mnt);
+   path_put(&nd.path);
 
err=-EPROTOTYPE;
if (u->sk_type != type) {



[PATCH] VFS: Remove lives_below_in_same_fs()

2007-10-12 Thread Jan Blunck
This is another cleanup which removes lives_below_in_same_fs(), since
is_subdir() from fs/dcache.c provides the same functionality and is more
widely used.
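
For reference, the parent walk that both helpers boil down to can be restated in userspace like this. struct dentry is reduced to the parent pointer and lives_below_sketch() is a hypothetical name. The kernel's is_subdir() also has to cope with concurrent changes to the tree, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in dentry: a filesystem root is its own parent. */
struct dentry { struct dentry *d_parent; };

/* Returns 1 if d is ancestor itself or lives somewhere below it. */
static int lives_below_sketch(struct dentry *d, struct dentry *ancestor)
{
	assert(ancestor != NULL);
	while (1) {
		if (d == ancestor)
			return 1;
		if (d == NULL || d == d->d_parent)
			return 0;	/* hit the root without a match */
		d = d->d_parent;
	}
}
```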

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/namespace.c |   13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

Index: b/fs/namespace.c
===
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1036,17 +1036,6 @@ static bool permit_mount(struct nameidat
return true;
 }
 
-static int lives_below_in_same_fs(struct dentry *d, struct dentry *dentry)
-{
-   while (1) {
-   if (d == dentry)
-   return 1;
-   if (d == NULL || d == d->d_parent)
-   return 0;
-   d = d->d_parent;
-   }
-}
-
 struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
int flag, uid_t owner)
 {
@@ -1063,7 +1052,7 @@ struct vfsmount *copy_tree(struct vfsmou
 
p = mnt;
list_for_each_entry(r, &mnt->mnt_mounts, mnt_child) {
-   if (!lives_below_in_same_fs(r->mnt_mountpoint, dentry))
+   if (!is_subdir(r->mnt_mountpoint, dentry))
continue;
 
for (s = r; s; s = next_mnt(s, r)) {


Re: kernel NULL pointer dereference in check_spread+0x0/0x26

2007-10-12 Thread Jan Blunck
On Fri, Oct 12, Frederik Deweerdt wrote:

> On Fri, Oct 12, 2007 at 02:40:54PM +0200, Jan Blunck wrote:
> > This is with 2.6.23-mm1 and allmodconfig.
> This generates a .config with CONFIG_SCHED_DEBUG=y and
> CONFIG_FAIR_GROUP_SCHED=n (The latter causes parent_entity to return
> NULL).
> Does setting CONFIG_FAIR_GROUP_SCHED=y help?

No, CONFIG_FAIR_GROUP_SCHED=y was set. Or did you mean CONFIG_FAIR_CGROUP_SCHED?


kernel NULL pointer dereference in check_spread+0x0/0x26

2007-10-12 Thread Jan Blunck
This is with 2.6.23-mm1 and allmodconfig.

Seems that se in the following is a NULL pointer.

static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
#ifdef CONFIG_SCHED_DEBUG
	s64 d = se->vruntime - cfs_rq->min_vruntime;

	if (d < 0)
		d = -d;

Cheers,
Jan

--
[1.344000] Unable to handle kernel NULL pointer dereference at 
0040 RIP: 
[1.348000]  [] check_spread+0x0/0x26
[1.356000] PGD 0 
[1.36] Oops:  [1] SMP 
[1.364000] last sysfs file: 
[1.368000] CPU 1 
[1.368000] Modules linked in:
[1.372000] Pid: 2, comm: kthreadd Not tainted 2.6.23-mm1-jbl-gab69b1c9 #4
[1.38] RIP: 0010:[]  [] 
check_spread+0x0/0x26
[1.388000] RSP: 0018:810005753d58  EFLAGS: 00010083
[1.392000] RAX: 1908 RBX: 81008006b900 RCX: 0c31
[1.40] RDX: 03938700 RSI:  RDI: 81008006b900
[1.408000] RBP: 810005753d90 R08: 810005753d40 R09: 8100800d0798
[1.416000] R10:  R11: 0001 R12: 810005c12000
[1.424000] R13: 810005c12048 R14:  R15: 0001
[1.428000] FS:  () GS:810005401960() 
knlGS:
[1.44] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[1.444000] CR2: 0040 CR3: 00201000 CR4: 06e0
[1.452000] DR0:  DR1:  DR2: 
[1.46] DR3:  DR6: 0ff0 DR7: 0400
[1.468000] Process kthreadd (pid: 2, threadinfo 810005752000, task 
81000575)
[1.476000] last branch before last exception/interrupt
[1.48]  from  [] task_new_fair+0xd0/0x111
[1.488000]  to  [] check_spread+0x0/0x26
[1.492000] Stack:  802363f7 8100800d0780 8100800d0780 
810005c12000
[1.50]  000f   
810005753dc0
[1.508000]  80236cf1 00800711 0286 
810005c12000
[1.516000] Call Trace:
[1.52]  [] task_new_fair+0xd5/0x111
[1.524000]  [] wake_up_new_task+0x84/0xb8
[1.532000]  [] do_fork+0x186/0x286
[1.536000]  [] __lock_acquire+0xe4b/0xf40
[1.544000]  [] kernel_thread+0x81/0xde
[1.548000]  [] kthread+0x0/0x76
[1.552000]  [] child_rip+0x0/0x12
[1.56]  [] kthreadd+0xcd/0x143
[1.564000]  [] child_rip+0xa/0x12
[1.568000]  [] restore_args+0x0/0x30
[1.576000]  [] __mpol_free+0x3b/0x40
[1.58]  [] kthreadd+0x0/0x143
[1.584000]  [] child_rip+0x0/0x12
[1.592000] 
[1.592000] INFO: lockdep is turned off.
[1.596000] 
[1.596000] Code: 48 8b 46 40 48 2b 47 20 55 48 89 e5 48 99 48 31 d0 48 29 
d0 
[1.604000] RIP  [] check_spread+0x0/0x26
[1.612000]  RSP 
[1.616000] CR2: 0040
[1.62] kthreadd used greatest stack depth: 5512 bytes left
[   36.348000] BUG: spinlock lockup on CPU#2, swapper/0, 8100800d0780
[   36.356000] 
[   36.356000] Call Trace:
[   36.36][] _raw_spin_lock+0x126/0x14e
[   36.368000]  [] _spin_lock+0x46/0x53
[   36.372000]  [] scheduler_tick+0x42/0x1d8
[   36.376000]  [] update_process_times+0x82/0x92
[   36.384000]  [] tick_periodic+0x6e/0x7a
[   36.388000]  [] tick_handle_periodic+0x21/0x5e
[   36.396000]  [] default_idle+0x0/0x64
[   36.40]  [] smp_local_timer_interrupt+0x5a/0x5e
[   36.408000]  [] smp_apic_timer_interrupt+0x3a/0x54
[   36.416000]  [] default_idle+0x0/0x64
[   36.42]  [] apic_timer_interrupt+0x6b/0x70
[   36.428000][] default_idle+0x4a/0x64
[   36.432000]  [] default_idle+0x48/0x64
[   36.44]  [] cpu_idle+0xbd/0xf8
[   36.444000]  [] start_secondary+0x3d6/0x3e7
[   36.448000] 
[   36.452000] INFO: lockdep is turned off.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 09/10] Use struct path in fs_struct

2007-10-09 Thread Jan Blunck
* Use struct path in fs_struct.

Signed-off-by: Andreas Gruenbacher <[EMAIL PROTECTED]>
Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
---
 fs/dcache.c   |   34 ---
 fs/namei.c|   53 ++
 fs/namespace.c|   57 --
 fs/proc/base.c|8 +++---
 include/linux/fs_struct.h |6 +---
 init/do_mounts.c  |6 ++--
 kernel/auditsc.c  |4 +--
 kernel/exit.c |   12 +++--
 kernel/fork.c |   18 +++---
 9 files changed, 87 insertions(+), 111 deletions(-)

Index: b/fs/dcache.c
===
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1851,8 +1851,7 @@ char * d_path(struct dentry *dentry, str
char *buf, int buflen)
 {
char *res;
-   struct vfsmount *rootmnt;
-   struct dentry *root;
+   struct path root;
 
/*
 * We have various synthetic filesystems that never get mounted.  On
@@ -1865,14 +1864,13 @@ char * d_path(struct dentry *dentry, str
return dentry->d_op->d_dname(dentry, buf, buflen);
 
read_lock(&current->fs->lock);
-   rootmnt = mntget(current->fs->rootmnt);
-   root = dget(current->fs->root);
+   root = current->fs->root;
+   path_get(&current->fs->root);
read_unlock(&current->fs->lock);
spin_lock(&dcache_lock);
-   res = __d_path(dentry, vfsmnt, root, rootmnt, buf, buflen);
+   res = __d_path(dentry, vfsmnt, root.dentry, root.mnt, buf, buflen);
spin_unlock(&dcache_lock);
-   dput(root);
-   mntput(rootmnt);
+   path_put(&root);
return res;
 }
 
@@ -1918,28 +1916,28 @@ char *dynamic_dname(struct dentry *dentr
 asmlinkage long sys_getcwd(char __user *buf, unsigned long size)
 {
int error;
-   struct vfsmount *pwdmnt, *rootmnt;
-   struct dentry *pwd, *root;
+   struct path pwd, root;
char *page = (char *) __get_free_page(GFP_USER);
 
if (!page)
return -ENOMEM;
 
read_lock(&current->fs->lock);
-   pwdmnt = mntget(current->fs->pwdmnt);
-   pwd = dget(current->fs->pwd);
-   rootmnt = mntget(current->fs->rootmnt);
-   root = dget(current->fs->root);
+   pwd = current->fs->pwd;
+   path_get(&current->fs->pwd);
+   root = current->fs->root;
+   path_get(&current->fs->root);
read_unlock(&current->fs->lock);
 
error = -ENOENT;
/* Has the current directory has been unlinked? */
spin_lock(&dcache_lock);
-   if (pwd->d_parent == pwd || !d_unhashed(pwd)) {
+   if (pwd.dentry->d_parent == pwd.dentry || !d_unhashed(pwd.dentry)) {
unsigned long len;
char * cwd;
 
-   cwd = __d_path(pwd, pwdmnt, root, rootmnt, page, PAGE_SIZE);
+   cwd = __d_path(pwd.dentry, pwd.mnt, root.dentry, root.mnt,
+  page, PAGE_SIZE);
spin_unlock(&dcache_lock);
 
error = PTR_ERR(cwd);
@@ -1957,10 +1955,8 @@ asmlinkage long sys_getcwd(char __user *
spin_unlock(&dcache_lock);
 
 out:
-   dput(pwd);
-   mntput(pwdmnt);
-   dput(root);
-   mntput(rootmnt);
+   path_put(&pwd);
+   path_put(&root);
free_page((unsigned long) page);
return error;
 }
Index: b/fs/namei.c
===
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -550,16 +550,16 @@ walk_init_root(const char *name, struct 
struct fs_struct *fs = current->fs;
 
read_lock(&fs->lock);
-   if (fs->altroot && !(nd->flags & LOOKUP_NOALT)) {
-   nd->path.mnt = mntget(fs->altrootmnt);
-   nd->path.dentry = dget(fs->altroot);
+   if (fs->altroot.dentry && !(nd->flags & LOOKUP_NOALT)) {
+   nd->path = fs->altroot;
+   path_get(&fs->altroot);
read_unlock(&fs->lock);
if (__emul_lookup_dentry(name,nd))
return 0;
read_lock(&fs->lock);
}
-   nd->path.mnt = mntget(fs->rootmnt);
-   nd->path.dentry = dget(fs->root);
+   nd->path = fs->root;
+   path_get(&fs->root);
read_unlock(&fs->lock);
return 1;
 }
@@ -756,8 +756,8 @@ static __always_inline void follow_dotdo
struct dentry *old = nd->path.dentry;
 
 read_lock(&fs->lock);
-   if (nd->path.dentry == fs->root &&
-   nd->path.mnt == fs->rootmnt) {
+   

[patch 10/10] Make set_fs_{root,pwd} take a struct path

2007-10-09 Thread Jan Blunck
In nearly all cases the set_fs_{root,pwd}() calls work on a struct
path. Change the function to reflect this and use path_get() here.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Signed-off-by: Andreas Gruenbacher <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
---
 fs/namespace.c|   28 ++--
 fs/open.c |   12 
 include/linux/fs_struct.h |4 ++--
 3 files changed, 20 insertions(+), 24 deletions(-)

Index: b/fs/namespace.c
===
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2040,15 +2040,14 @@ out1:
  * Replace the fs->{rootmnt,root} with {mnt,dentry}. Put the old values.
  * It can block. Requires the big lock held.
  */
-void set_fs_root(struct fs_struct *fs, struct vfsmount *mnt,
-struct dentry *dentry)
+void set_fs_root(struct fs_struct *fs, struct path *path)
 {
struct path old_root;
 
write_lock(&fs->lock);
old_root = fs->root;
-   fs->root.mnt = mntget(mnt);
-   fs->root.dentry = dget(dentry);
+   fs->root = *path;
+   path_get(path);
write_unlock(&fs->lock);
if (old_root.dentry)
path_put(&old_root);
@@ -2058,15 +2057,14 @@ void set_fs_root(struct fs_struct *fs, s
  * Replace the fs->{pwdmnt,pwd} with {mnt,dentry}. Put the old values.
  * It can block. Requires the big lock held.
  */
-void set_fs_pwd(struct fs_struct *fs, struct vfsmount *mnt,
-   struct dentry *dentry)
+void set_fs_pwd(struct fs_struct *fs, struct path *path)
 {
struct path old_pwd;
 
write_lock(&fs->lock);
old_pwd = fs->pwd;
-   fs->pwd.mnt = mntget(mnt);
-   fs->pwd.dentry = dget(dentry);
+   fs->pwd = *path;
+   path_get(path);
write_unlock(&fs->lock);
 
if (old_pwd.dentry)
@@ -2087,12 +2085,10 @@ static void chroot_fs_refs(struct nameid
task_unlock(p);
if (fs->root.dentry == old_nd->path.dentry
&& fs->root.mnt == old_nd->path.mnt)
-   set_fs_root(fs, new_nd->path.mnt,
-   new_nd->path.dentry);
+   set_fs_root(fs, &new_nd->path);
if (fs->pwd.dentry == old_nd->path.dentry
&& fs->pwd.mnt == old_nd->path.mnt)
-   set_fs_pwd(fs, new_nd->path.mnt,
-  new_nd->path.dentry);
+   set_fs_pwd(fs, &new_nd->path);
put_fs_struct(fs);
} else
task_unlock(p);
@@ -2235,6 +2231,7 @@ static void __init init_mount_tree(void)
 {
struct vfsmount *mnt;
struct mnt_namespace *ns;
+   struct path root;
 
mnt = do_kern_mount("rootfs", 0, "rootfs", NULL);
if (IS_ERR(mnt))
@@ -2253,8 +2250,11 @@ static void __init init_mount_tree(void)
init_task.nsproxy->mnt_ns = ns;
get_mnt_ns(ns);
 
-   set_fs_pwd(current->fs, ns->root, ns->root->mnt_root);
-   set_fs_root(current->fs, ns->root, ns->root->mnt_root);
+   root.mnt = ns->root;
+   root.dentry = ns->root->mnt_root;
+
+   set_fs_pwd(current->fs, &root);
+   set_fs_root(current->fs, &root);
 }
 
 void __init mnt_init(void)
Index: b/fs/open.c
===
--- a/fs/open.c
+++ b/fs/open.c
@@ -501,7 +501,7 @@ asmlinkage long sys_chdir(const char __u
if (error)
goto dput_and_out;
 
-   set_fs_pwd(current->fs, nd.path.mnt, nd.path.dentry);
+   set_fs_pwd(current->fs, &nd.path);
 
 dput_and_out:
path_put(&nd.path);
@@ -512,9 +512,7 @@ out:
 asmlinkage long sys_fchdir(unsigned int fd)
 {
struct file *file;
-   struct dentry *dentry;
struct inode *inode;
-   struct vfsmount *mnt;
int error;
 
error = -EBADF;
@@ -522,9 +520,7 @@ asmlinkage long sys_fchdir(unsigned int 
if (!file)
goto out;
 
-   dentry = file->f_path.dentry;
-   mnt = file->f_path.mnt;
-   inode = dentry->d_inode;
+   inode = file->f_path.dentry->d_inode;
 
error = -ENOTDIR;
if (!S_ISDIR(inode->i_mode))
@@ -532,7 +528,7 @@ asmlinkage long sys_fchdir(unsigned int 
 
error = file_permission(file, MAY_EXEC);
if (!error)
-   set_fs_pwd(current->fs, mnt, dentry);
+   set_fs_pwd(current->fs, &file->f_path);
 out_putf:
fput(file);
 out:
@@ -556,7 +552,7 @@ asmlinkage long sys_chroot(const char __
if (!ca

[patch 06/10] Introduce path_put()

2007-10-09 Thread Jan Blunck
* Add path_put() functions for releasing a reference to the dentry and
  vfsmount of a struct path in the right order

* Switch from path_release(nd) to path_put(&nd->path)

* Rename dput_path() to path_put_conditional()

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Signed-off-by: Andreas Gruenbacher <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
---
 arch/alpha/kernel/osf_sys.c  |2 
 arch/mips/kernel/sysirix.c   |6 +-
 arch/parisc/hpux/sys_hpux.c  |2 
 arch/powerpc/platforms/cell/spufs/syscalls.c |2 
 arch/sparc64/solaris/fs.c|4 -
 drivers/md/dm-table.c|2 
 drivers/mtd/mtdsuper.c   |4 -
 fs/afs/mntpt.c   |2 
 fs/autofs4/root.c|2 
 fs/block_dev.c   |2 
 fs/coda/pioctl.c |4 -
 fs/compat.c  |4 -
 fs/configfs/symlink.c|4 -
 fs/dquot.c   |2 
 fs/ecryptfs/main.c   |2 
 fs/exec.c|4 -
 fs/ext3/super.c  |4 -
 fs/ext4/super.c  |4 -
 fs/gfs2/ops_fstype.c |2 
 fs/inotify_user.c|4 -
 fs/namei.c   |   56 ++-
 fs/namespace.c   |   20 -
 fs/nfs/namespace.c   |2 
 fs/nfsctl.c  |2 
 fs/nfsd/export.c |   10 ++--
 fs/nfsd/nfs4recover.c|2 
 fs/nfsd/nfs4state.c  |2 
 fs/open.c|   22 +-
 fs/proc/base.c   |2 
 fs/reiserfs/super.c  |8 +--
 fs/revoke.c  |2 
 fs/stat.c|6 +-
 fs/utimes.c  |2 
 fs/xattr.c   |   16 +++
 fs/xfs/linux-2.6/xfs_ioctl.c |2 
 include/linux/namei.h|7 ---
 include/linux/path.h |2 
 kernel/auditfilter.c |4 -
 net/sunrpc/rpc_pipe.c|2 
 net/unix/af_unix.c   |6 +-
 40 files changed, 119 insertions(+), 118 deletions(-)

Index: b/arch/alpha/kernel/osf_sys.c
===
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -261,7 +261,7 @@ osf_statfs(char __user *path, struct osf
retval = user_path_walk(path, &nd);
if (!retval) {
retval = do_osf_statfs(nd.path.dentry, buffer, bufsiz);
-   path_release(&nd);
+   path_put(&nd.path);
}
return retval;
 }
Index: b/arch/mips/kernel/sysirix.c
===
--- a/arch/mips/kernel/sysirix.c
+++ b/arch/mips/kernel/sysirix.c
@@ -711,7 +711,7 @@ asmlinkage int irix_statfs(const char __
}
 
 dput_and_out:
-   path_release(&nd);
+   path_put(&nd.path);
 out:
return error;
 }
@@ -1385,7 +1385,7 @@ asmlinkage int irix_statvfs(char __user 
error |= __put_user(0, &buf->f_fstr[i]);
 
 dput_and_out:
-   path_release(&nd);
+   path_put(&nd.path);
 out:
return error;
 }
@@ -1636,7 +1636,7 @@ asmlinkage int irix_statvfs64(char __use
error |= __put_user(0, &buf->f_fstr[i]);
 
 dput_and_out:
-   path_release(&nd);
+   path_put(&nd.path);
 out:
return error;
 }
Index: b/arch/parisc/hpux/sys_hpux.c
===
--- a/arch/parisc/hpux/sys_hpux.c
+++ b/arch/parisc/hpux/sys_hpux.c
@@ -222,7 +222,7 @@ asmlinkage long hpux_statfs(const char _
error = vfs_statfs_hpux(nd.path.dentry, &tmp);
if (!error && copy_to_user(buf, &tmp, sizeof(tmp)))
error = -EFAULT;
-   path_release(&nd);
+   path_put(&nd.path);
}
return error;
 }
Index: b/arch/powerpc/platforms/cell/spufs/syscalls.c
===
--- a/arch/powerpc/platforms/cell/spufs/syscalls.c
+++ b/arch/powerpc/platforms/cell/spufs/syscalls.c
@@ -73,7 +73,7 @@ static long do_spu_create(const char __u
LOOKUP_OPEN|LOOKUP_CREATE, &nd);
if (!ret) {
ret = spufs_create(&nd, f

[patch 01/10] Dont touch fs_struct in drivers

2007-10-09 Thread Jan Blunck
The sound drivers and the pnpbios core test for current->fs->root != NULL. This
test seems to be unnecessary since we always have rootfs mounted before
initializing the drivers.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
---
 drivers/pnp/pnpbios/core.c |2 --
 sound/core/seq/seq_clientmgr.c |4 ++--
 sound/core/seq/seq_device.c|3 ---
 sound/core/sound.c |4 
 sound/core/timer.c |2 --
 sound/ppc/daca.c   |5 ++---
 sound/ppc/tumbler.c|5 ++---
 7 files changed, 6 insertions(+), 19 deletions(-)

Index: b/drivers/pnp/pnpbios/core.c
===
--- a/drivers/pnp/pnpbios/core.c
+++ b/drivers/pnp/pnpbios/core.c
@@ -105,8 +105,6 @@ static int pnp_dock_event(int dock, stru
char *argv[3], **envp, *buf, *scratch;
int i = 0, value;
 
-   if (!current->fs->root)
-   return -EAGAIN;
if (!(envp = kcalloc(20, sizeof(char *), GFP_KERNEL)))
return -ENOMEM;
if (!(buf = kzalloc(256, GFP_KERNEL))) {
Index: b/sound/core/seq/seq_clientmgr.c
===
--- a/sound/core/seq/seq_clientmgr.c
+++ b/sound/core/seq/seq_clientmgr.c
@@ -152,13 +152,13 @@ struct snd_seq_client *snd_seq_client_us
}
spin_unlock_irqrestore(&clients_lock, flags);
 #ifdef CONFIG_KMOD
-   if (!in_interrupt() && current->fs->root) {
+   if (!in_interrupt()) {
static char client_requested[SNDRV_SEQ_GLOBAL_CLIENTS];
static char card_requested[SNDRV_CARDS];
if (clientid < SNDRV_SEQ_GLOBAL_CLIENTS) {
int idx;

-   if (! client_requested[clientid] && current->fs->root) {
+   if (!client_requested[clientid]) {
client_requested[clientid] = 1;
for (idx = 0; idx < 15; idx++) {
if (seq_client_load[idx] < 0)
Index: b/sound/core/seq/seq_device.c
===
--- a/sound/core/seq/seq_device.c
+++ b/sound/core/seq/seq_device.c
@@ -150,9 +150,6 @@ void snd_seq_device_load_drivers(void)
if (snd_seq_in_init)
return;
 
-   if (! current->fs->root)
-   return;
-
mutex_lock(&ops_mutex);
list_for_each_entry(ops, &opslist, list) {
if (! (ops->driver & DRIVER_LOADED) &&
Index: b/sound/core/sound.c
===
--- a/sound/core/sound.c
+++ b/sound/core/sound.c
@@ -72,8 +72,6 @@ static DEFINE_MUTEX(sound_mutex);
  */
 void snd_request_card(int card)
 {
-   if (! current->fs->root)
-   return;
if (snd_card_locked(card))
return;
if (card < 0 || card >= cards_limit)
@@ -87,8 +85,6 @@ static void snd_request_other(int minor)
 {
char *str;
 
-   if (! current->fs->root)
-   return;
switch (minor) {
case SNDRV_MINOR_SEQUENCER: str = "snd-seq";break;
case SNDRV_MINOR_TIMER: str = "snd-timer";  break;
Index: b/sound/core/timer.c
===
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -148,8 +148,6 @@ static struct snd_timer *snd_timer_find(
 
 static void snd_timer_request(struct snd_timer_id *tid)
 {
-   if (! current->fs->root)
-   return;
switch (tid->dev_class) {
case SNDRV_TIMER_CLASS_GLOBAL:
if (tid->device < timer_limit)
Index: b/sound/ppc/daca.c
===
--- a/sound/ppc/daca.c
+++ b/sound/ppc/daca.c
@@ -246,9 +246,8 @@ int __init snd_pmac_daca_init(struct snd
struct pmac_daca *mix;
 
 #ifdef CONFIG_KMOD
-   if (current->fs->root)
-   request_module("i2c-powermac");
-#endif /* CONFIG_KMOD */   
+   request_module("i2c-powermac");
+#endif /* CONFIG_KMOD */
 
mix = kzalloc(sizeof(*mix), GFP_KERNEL);
if (! mix)
Index: b/sound/ppc/tumbler.c
===
--- a/sound/ppc/tumbler.c
+++ b/sound/ppc/tumbler.c
@@ -1327,9 +1327,8 @@ int __init snd_pmac_tumbler_init(struct 
char *chipname;
 
 #ifdef CONFIG_KMOD
-   if (current->fs->root)
-   request_module("i2c-powermac");
-#endif /* CONFIG_KMOD */   
+   request_module("i2c-powermac");
+#endif /* CONFIG_KMOD */
 
mix = kzalloc(sizeof(*mix), GFP_KERNEL);
if (! mix)

-- 


[patch 07/10] Use path_put() in a few places instead of {mnt,d}put()

2007-10-09 Thread Jan Blunck
Use path_put() in a few places instead of {mnt,d}put()

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Signed-off-by: Andreas Gruenbacher <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
---
 fs/afs/mntpt.c |3 +--
 fs/namei.c |   15 +--
 2 files changed, 6 insertions(+), 12 deletions(-)

Index: b/fs/afs/mntpt.c
===
--- a/fs/afs/mntpt.c
+++ b/fs/afs/mntpt.c
@@ -235,8 +235,7 @@ static void *afs_mntpt_follow_link(struc
err = do_add_mount(newmnt, nd, MNT_SHRINKABLE, &afs_vfsmounts);
switch (err) {
case 0:
-   dput(nd->path.dentry);
-   mntput(nd->path.mnt);
+   path_put(&nd->path);
nd->path.mnt = newmnt;
nd->path.dentry = dget(newmnt->mnt_root);
schedule_delayed_work(&afs_mntpt_expiry_timer,
Index: b/fs/namei.c
===
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -626,8 +626,7 @@ static __always_inline int __do_follow_l
if (dentry->d_inode->i_op->put_link)
dentry->d_inode->i_op->put_link(dentry, nd, cookie);
}
-   dput(dentry);
-   mntput(path->mnt);
+   path_put(path);
 
return error;
 }
@@ -1034,8 +1033,7 @@ static int fastcall link_path_walk(const
result = __link_path_walk(name, nd);
}
 
-   dput(save.path.dentry);
-   mntput(save.path.mnt);
+   path_put(&save.path);
 
return result;
 }
@@ -1057,8 +1055,7 @@ static int __emul_lookup_dentry(const ch
 
if (!nd->path.dentry->d_inode ||
S_ISDIR(nd->path.dentry->d_inode->i_mode)) {
-   struct dentry *old_dentry = nd->path.dentry;
-   struct vfsmount *old_mnt = nd->path.mnt;
+   struct path old_path = nd->path;
struct qstr last = nd->last;
int last_type = nd->last_type;
struct fs_struct *fs = current->fs;
@@ -1074,14 +1071,12 @@ static int __emul_lookup_dentry(const ch
read_unlock(&fs->lock);
if (path_walk(name, nd) == 0) {
if (nd->path.dentry->d_inode) {
-   dput(old_dentry);
-   mntput(old_mnt);
+   path_put(&old_path);
return 1;
}
path_put(&nd->path);
}
-   nd->path.dentry = old_dentry;
-   nd->path.mnt = old_mnt;
+   nd->path = old_path;
nd->last = last;
nd->last_type = last_type;
}

-- 



[patch 04/10] Move struct path into its own header

2007-10-09 Thread Jan Blunck
Move the definition of struct path into its own header file for further
patches.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Signed-off-by: Andreas Gruenbacher <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
---
 include/linux/namei.h |6 +-
 include/linux/path.h  |   12 
 2 files changed, 13 insertions(+), 5 deletions(-)

Index: b/include/linux/namei.h
===
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct vfsmount;
 
@@ -30,11 +31,6 @@ struct nameidata {
} intent;
 };
 
-struct path {
-   struct vfsmount *mnt;
-   struct dentry *dentry;
-};
-
 /*
  * Type of the last component on LOOKUP_PARENT
  */
Index: b/include/linux/path.h
===
--- /dev/null
+++ b/include/linux/path.h
@@ -0,0 +1,12 @@
+#ifndef _LINUX_PATH_H
+#define _LINUX_PATH_H
+
+struct dentry;
+struct vfsmount;
+
+struct path {
+   struct vfsmount *mnt;
+   struct dentry *dentry;
+};
+
+#endif  /* _LINUX_PATH_H */

-- 



[patch 02/10] Dont touch fs_struct in usermodehelper

2007-10-09 Thread Jan Blunck
This test seems to be unnecessary since we always have rootfs mounted before
calling a usermodehelper.

Signed-off-by: Andreas Gruenbacher <[EMAIL PROTECTED]>
Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
Acked-by: Greg KH <[EMAIL PROTECTED]>
---
 kernel/kmod.c |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

Index: b/kernel/kmod.c
===
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -173,10 +173,7 @@ static int call_usermodehelper(void 
 */
set_user_nice(current, 0);
 
-   retval = -EPERM;
-   if (current->fs->root)
-   retval = kernel_execve(sub_info->path,
-   sub_info->argv, sub_info->envp);
+   retval = kernel_execve(sub_info->path, sub_info->argv, sub_info->envp);
 
/* Exec failed? */
sub_info->retval = retval;

-- 



[patch 00/10] Use struct path in struct nameidata

2007-10-09 Thread Jan Blunck
This is a respin for inclusion into -mm of the patch series I sent on 27th
September. I haven't changed the patches except for letting them apply on
2.6.23-rc8-mm1.

Andrew, please add this to -mm.

Thanks,
Jan

-- 



[patch 03/10] Remove path_release_on_umount()

2007-10-09 Thread Jan Blunck
path_release_on_umount() should only be called from sys_umount(). I merged the
function into sys_umount() instead of having it in namei.c.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
---
 fs/namei.c|   10 --
 fs/namespace.c|4 +++-
 include/linux/namei.h |1 -
 3 files changed, 3 insertions(+), 12 deletions(-)

Index: b/fs/namei.c
===
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -368,16 +368,6 @@ void path_release(struct nameidata *nd)
mntput(nd->mnt);
 }
 
-/*
- * umount() mustn't call path_release()/mntput() as that would clear
- * mnt_expiry_mark
- */
-void path_release_on_umount(struct nameidata *nd)
-{
-   dput(nd->dentry);
-   mntput_no_expire(nd->mnt);
-}
-
 /**
  * release_open_intent - free up open intent resources
  * @nd: pointer to nameidata
Index: b/fs/namespace.c
===
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -988,7 +988,9 @@ asmlinkage long sys_umount(char __user *
 
retval = do_umount(nd.mnt, flags);
 dput_and_out:
-   path_release_on_umount(&nd);
+   /* we mustn't call path_put() as that would clear mnt_expiry_mark */
+   dput(nd.dentry);
+   mntput_no_expire(nd.mnt);
 out:
return retval;
 }
Index: b/include/linux/namei.h
===
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -73,7 +73,6 @@ extern int FASTCALL(path_lookup(const ch
 extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
   const char *, unsigned int, struct nameidata *);
 extern void path_release(struct nameidata *);
-extern void path_release_on_umount(struct nameidata *);
 
 extern int __user_path_lookup_open(const char __user *, unsigned lookup_flags, 
struct nameidata *nd, int open_flags);
 extern int path_lookup_open(int dfd, const char *name, unsigned lookup_flags, 
struct nameidata *, int open_flags);

-- 



[patch 08/10] Introduce path_get()

2007-10-09 Thread Jan Blunck
This introduces the symmetric function to path_put() for getting a reference
to the dentry and vfsmount of a struct path in the right order.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Signed-off-by: Andreas Gruenbacher <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
---
 fs/namei.c|   17 +++--
 include/linux/namei.h |6 --
 include/linux/path.h  |1 +
 3 files changed, 16 insertions(+), 8 deletions(-)

Index: b/fs/namei.c
===
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -363,6 +363,19 @@ int deny_write_access(struct file * file
 }
 
 /**
+ * path_get - get a reference to a path
+ * @path: path to get the reference to
+ *
+ * Given a path increment the reference count to the dentry and the vfsmount.
+ */
+void path_get(struct path *path)
+{
+   mntget(path->mnt);
+   dget(path->dentry);
+}
+EXPORT_SYMBOL(path_get);
+
+/**
  * path_put - put a reference to a path
  * @path: path to put the reference to
  *
@@ -1161,8 +1174,8 @@ static int fastcall do_path_lookup(int d
if (retval)
goto fput_fail;
 
-   nd->path.mnt = mntget(file->f_path.mnt);
-   nd->path.dentry = dget(dentry);
+   nd->path = file->f_path;
+   path_get(&file->f_path);
 
fput_light(file, fput_needed);
}
Index: b/include/linux/namei.h
===
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -94,10 +94,4 @@ static inline char *nd_get_link(struct n
return nd->saved_names[nd->depth];
 }
 
-static inline void pathget(struct path *path)
-{
-   mntget(path->mnt);
-   dget(path->dentry);
-}
-
 #endif /* _LINUX_NAMEI_H */
Index: b/include/linux/path.h
===
--- a/include/linux/path.h
+++ b/include/linux/path.h
@@ -9,6 +9,7 @@ struct path {
struct dentry *dentry;
 };
 
+extern void path_get(struct path *);
 extern void path_put(struct path *);
 
 #endif  /* _LINUX_PATH_H */
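
For readers without a kernel tree at hand, here is a minimal user-space sketch of the pairing this patch sets up. It is not the kernel code: the two structs are hypothetical stand-ins that carry only a counter, and path_copy_get() is a made-up helper name for the "copy the struct, then take the references" pattern seen in the do_path_lookup() hunk above.

```c
/* Hypothetical user-space stand-ins; the real objects carry far more state. */
struct vfsmount { int count; };
struct dentry { int count; };

struct path {
	struct vfsmount *mnt;
	struct dentry *dentry;
};

/* Take one reference on each half: vfsmount first, then dentry. */
static void path_get(struct path *path)
{
	path->mnt->count++;
	path->dentry->count++;
}

/* Drop the references in the opposite order: dentry first, then vfsmount. */
static void path_put(struct path *path)
{
	path->dentry->count--;
	path->mnt->count--;
}

/*
 * Hypothetical helper: copy the pointer pair, then take references on it.
 * This mirrors "nd->path = file->f_path; path_get(&file->f_path);".
 */
static void path_copy_get(struct path *dst, struct path *src)
{
	*dst = *src;
	path_get(src);
}
```

Every path_copy_get() has to be balanced by exactly one path_put() on the copy, which is what the converted call sites rely on.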

-- 



Re: [patch] Combine path_put and path_put_conditional

2007-09-29 Thread Jan Blunck
On Fri, Sep 28, Andreas Gruenbacher wrote:

> The name path_put_conditional (formerly, dput_path) is a little unclear.
> Replace (path_put_conditional + path_put) with path_walk_put_both,
> "put a pair of paths after a path_walk" (see the kerneldoc).

Hmm, I don't know. Putting both the nd and the path is at the moment only done
in some error paths. I have another series of patches pending which uses
path_put_conditional outside of error paths, so please don't remove it.
Besides that, the new name completely hides the conditional release of the
vfsmount reference. I would also name it path_put_both() just to make it more
"beautiful" wrt the other path_put*() functions.

> @@ -996,8 +1006,8 @@ return_reval:
>  return_base:
>   return 0;
>  out_dput:
> - path_put_conditional(&next, nd);
> - break;
> + path_walk_put_both(&next, &nd->path);
> + goto return_err;
>   }
>   path_put(&nd->path);
>  return_err:
> @@ -1777,11 +1787,15 @@ ok:
>   return 0;
>  
>  exit_dput:
> - path_put_conditional(&path, nd);
> + path_walk_put_both(&path, &nd->path);
> + goto exit_intent;
> +
>  exit:
> + path_put(&nd->path);
> +
> +exit_intent:
>   if (!IS_ERR(nd->intent.open.file))
>   release_open_intent(nd);
> - path_put(&nd->path);
>   return error;
>  
>  do_link:

IMHO introducing another label just to use it here isn't worth the change.
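
To make the "conditional release" point concrete, here is a small user-space sketch. It assumes that path_put_conditional() always drops the dentry reference but drops the vfsmount reference only when it differs from the one the nameidata already holds. The structs are hypothetical counter-only stand-ins, and the second argument is simplified to a struct path instead of a struct nameidata.

```c
/* Hypothetical user-space stand-ins; only the reference counts are modelled. */
struct vfsmount { int count; };
struct dentry { int count; };

struct path {
	struct vfsmount *mnt;
	struct dentry *dentry;
};

static void dput(struct dentry *dentry) { dentry->count--; }
static void mntput(struct vfsmount *mnt) { mnt->count--; }

/* Unconditional: drop both references. */
static void path_put(struct path *path)
{
	dput(path->dentry);
	mntput(path->mnt);
}

/*
 * Conditional (assumed semantics): the walk only took an extra vfsmount
 * reference if it crossed onto a different mount, so only then drop it.
 */
static void path_put_conditional(struct path *path, const struct path *nd_path)
{
	dput(path->dentry);
	if (path->mnt != nd_path->mnt)
		mntput(path->mnt);
}
```

A name like path_walk_put_both() gives no hint that the second mntput() may be skipped, which is the objection above.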


Re: Testing the Current Upstream Kernel

2007-08-15 Thread Jan Blunck
On Wed, 15 Aug 2007 13:14:04 +0200, Sam Ravnborg wrote:

> On Wed, Aug 15, 2007 at 01:08:51PM +0200, Jan Blunck wrote:
>> (besides those we need for building a RPM).
> 
> Are these openSUSE specific or something we ought to apply to mainline?
> 

If there are patches (at the moment there are none) then they are 
openSUSE Build Service specific. Otherwise we usually send patches 
upstream.

Regards,
    Jan

-- 
Jan Blunck <[EMAIL PROTECTED]>



Testing the Current Upstream Kernel

2007-08-15 Thread Jan Blunck
Andrew Morton has spoken on several occasions about testing of the
Linux kernel and asked users to test the current development version and
report their findings.  For our openSUSE releases we have in general a
frozen version and add only fixes for bugs that are encountered during
testing - but stay with the same version for the lifetime of a release.

With our openSUSE Build Service we build a daily kernel, where we take
nightly snapshots of the current upstream development kernel (Linus' kernel
tree linux-2.6.git) without any patches (besides those we need for building a
RPM).  We call this the vanilla kernel. It can be downloaded from:

 http://download.opensuse.org/repositories/Kernel:/Vanilla/SUSE_Factory/

To install and use the vanilla kernel you might need to update some of your
user-space tools as well (like udev etc.). If you are running the latest
openSUSE Developer Version (aka Factory) you should already have the correct
version of tools installed.

If you test the vanilla kernel, report any problems to the Linux kernel
mailing list (for details check the FAQ at http://www.tux.org/lkml/) and not
to the openSUSE bugzilla. If you want to report problems with the RPMs themselves
or have general questions about the Kernel:Vanilla project use the
[EMAIL PROTECTED] mailing list.

Andreas Jaeger <[EMAIL PROTECTED]> and Jan Blunck <[EMAIL PROTECTED]>


Re: [dm-devel] [patch 01/11] dm-snap: Replace special round_down()

2007-08-13 Thread Jan Blunck
On Fri, Aug 10, Jan Blunck wrote:

> This patch removes the special round_down() to next power of 2 implementation
> used only at one place in the snapshot target. It is replaced by an equivalent
> 1 << fls() which might use an architecture specific implementation.

Err, ok this is the second fix for this patch. I hope it is the last. It
doesn't make any sense to have hash_size be defined as sector_t. Same goes for
max_buckets.

Regards,
    Jan

-- 
Jan Blunck <[EMAIL PROTECTED]>
Subject: dm-snap: Replace special round_down()

This patch removes the special round_down() to next power of 2 implementation
used only at one place in the snapshot target. It is replaced by an equivalent
1 << (fls() - 1) which might use an architecture specific implementation.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 drivers/md/dm-snap.c |   15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -333,21 +333,12 @@ static int calc_max_buckets(void)
 }
 
 /*
- * Rounds a number down to a power of 2.
- */
-static uint32_t round_down(uint32_t n)
-{
-	while (n & (n - 1))
-		n &= (n - 1);
-	return n;
-}
-
-/*
  * Allocate room for a suitable hash table.
  */
 static int init_hash_tables(struct dm_snapshot *s)
 {
-	sector_t hash_size, cow_dev_size, origin_dev_size, max_buckets;
+	unsigned int hash_size, max_buckets;
+	sector_t cow_dev_size, origin_dev_size;
 
 	/*
 	 * Calculate based on the size of the original volume or
@@ -361,7 +352,7 @@ static int init_hash_tables(struct dm_sn
 	hash_size = min(hash_size, max_buckets);
 
 	/* Round it down to a power of 2 */
-	hash_size = round_down(hash_size);
+	hash_size = 1 << (fls(hash_size) - 1);
 	if (init_exception_table(&s->complete, hash_size))
 		return -ENOMEM;
 


Re: [patch 03/11] dm-snap: Remove dead queued_bios code

2007-08-13 Thread Jan Blunck
On Fri, Aug 10, Alasdair G Kergon wrote:

> On Fri, Aug 10, 2007 at 10:02:07PM +0200, Jan Blunck wrote:
> > This patch removes the unused queued_bios handling code.
>  
> Well I'm going to leave this in for now (as one existing unfinished
> patch series does build upon it).  We can include its removal as part of
> your following patch series that no doubt replaces the functionality
> with something better.

You mean the snapshot read tracking. Yes, I think I can fix this problem too.

Regards,
Jan

-- 
Jan Blunck <[EMAIL PROTECTED]>


Re: [patch 05/11] dm-snap: remove SECTOR_SHIFT define

2007-08-13 Thread Jan Blunck
On Fri, Aug 10, Alasdair G Kergon wrote:

> On Fri, Aug 10, 2007 at 10:02:09PM +0200, Jan Blunck wrote:
> > Sector size on Linux is always 512 bytes. Don't even try to give the
> > impression this is changeable.
> 
> If that's what worries you, add a comment next to the definition,
> perhaps?

That, and the code would then look more like the rest of the block layer
code. Only a few users (UFS, MSDOS, HFS, IDE) use a predefined
SECTOR_{SHIFT,SIZE}.

> It's there so you can easily locate all the places within dm that
> perform these conversions by using a simple search.  Searching for '9'
> wouldn't be as easy.  (I don't know about other people, but I find the
> code easier to read the way it is.)

Hmm, so I guess this is more about like/dislike of how the code looks. Maybe we
should define a global SECTOR_{SIZE,SHIFT} in blkdev.h.

Regards,
Jan

-- 
Jan Blunck <[EMAIL PROTECTED]>
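
For illustration, a sketch of the conversions under discussion, written with the named constants. The SECTOR_SHIFT value of 9 (512-byte sectors) is as stated in this thread; the helper function names are hypothetical and do not come from dm or blkdev.h.

```c
#include <stdint.h>

#define SECTOR_SHIFT	9
#define SECTOR_SIZE	(1u << SECTOR_SHIFT)	/* 512 bytes */

/* Hypothetical helpers; each one is just a shift by SECTOR_SHIFT. */
static uint64_t sectors_to_bytes(uint64_t sectors)
{
	return sectors << SECTOR_SHIFT;
}

/* Truncates: a partial trailing sector is not counted. */
static uint64_t bytes_to_sectors(uint64_t bytes)
{
	return bytes >> SECTOR_SHIFT;
}

/* Rounds up: a partial trailing sector counts as a whole one. */
static uint64_t bytes_to_sectors_round_up(uint64_t bytes)
{
	return (bytes + SECTOR_SIZE - 1) >> SECTOR_SHIFT;
}
```

With a literal 9 in place of SECTOR_SHIFT the generated code would be the same; the argument in this thread is only about how easy the conversions are to find and read.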


Re: [patch 01/11] dm-snap: Replace special round_down()

2007-08-12 Thread Jan Blunck
On Fri, 10 Aug 2007 22:02:05 +0200, Jan Blunck wrote:

> This patch removes the special round_down() to next power of 2 implementation
> used only at one place in the snapshot target. It is replaced by an equivalent
> 1 << fls() which might use an architecture specific implementation.
> 
> Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
> ---
>  drivers/md/dm-snap.c |   12 +---
>  1 file changed, 1 insertion(+), 11 deletions(-)
> 
> --- a/drivers/md/dm-snap.c
> +++ b/drivers/md/dm-snap.c
> @@ -333,16 +333,6 @@ static int calc_max_buckets(void)
>  }
>  
>  /*
> - * Rounds a number down to a power of 2.
> - */
> -static uint32_t round_down(uint32_t n)
> -{
> - while (n & (n - 1))
> - n &= (n - 1);
> - return n;
> -}
> -
> -/*
>   * Allocate room for a suitable hash table.
>   */
>  static int init_hash_tables(struct dm_snapshot *s)
> @@ -361,7 +351,7 @@ static int init_hash_tables(struct dm_sn
>   hash_size = min(hash_size, max_buckets);
>  
>   /* Round it down to a power of 2 */
> - hash_size = round_down(hash_size);
> + hash_size = 1 << fls(hash_size);
>   if (init_exception_table(&s->complete, hash_size))
>   return -ENOMEM;
>  
>

Err, this is sooo stupid. First, 1 << fls(n) doesn't round down; it yields
the next higher power of 2, much like roundup_pow_of_two(). Second,
hash_size is of type sector_t, which could be u64 with CONFIG_LBD.
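
To make the difference concrete, here is a small userspace sketch. It is
not kernel code: fls_sketch() is a portable stand-in for the kernel's fls(),
and the other function names are invented for this comparison.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Portable stand-in for the kernel's fls(): 1-based position of the
 * highest set bit, 0 if no bit is set.
 */
static int fls_sketch(uint32_t n)
{
	int pos = 0;

	while (n) {
		pos++;
		n >>= 1;
	}
	return pos;
}

/* The loop being removed: clear low set bits until only one remains. */
static uint32_t round_down_loop(uint32_t n)
{
	while (n & (n - 1))
		n &= (n - 1);
	return n;
}

/* Round down to a power of 2 via fls(); n must be non-zero. */
static uint32_t round_down_fls(uint32_t n)
{
	assert(n != 0);
	return UINT32_C(1) << (fls_sketch(n) - 1);
}

/* What the first patch version computed: a power of 2 strictly above n. */
static uint32_t shift_fls(uint32_t n)
{
	assert(fls_sketch(n) < 32);
	return UINT32_C(1) << fls_sketch(n);
}
```

For n = 100 the loop and round_down_fls() both give 64, while shift_fls()
gives 128.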

-- 
Subject: dm-snap: Replace special round_down()

This patch removes the special round_down() to the next lower power of 2
implementation used only at one place in the snapshot target. It is replaced
by an equivalent 1 << (fls_long() - 1), which might use an
architecture-specific implementation.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 drivers/md/dm-snap.c |   12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -333,16 +333,6 @@ static int calc_max_buckets(void)
 }
 
 /*
- * Rounds a number down to a power of 2.
- */
-static uint32_t round_down(uint32_t n)
-{
-   while (n & (n - 1))
-   n &= (n - 1);
-   return n;
-}
-
-/*
  * Allocate room for a suitable hash table.
  */
 static int init_hash_tables(struct dm_snapshot *s)
@@ -361,7 +351,7 @@ static int init_hash_tables(struct dm_sn
hash_size = min(hash_size, max_buckets);
 
/* Round it down to a power of 2 */
-   hash_size = round_down(hash_size);
+   hash_size = 1 << (fls_long(hash_size) - 1);
if (init_exception_table(&s->complete, hash_size))
return -ENOMEM;



Re: [PATCH 1/2] oprofile: Make callgraph use dump_trace() on i386/x86_64

2007-08-10 Thread Jan Blunck
On Fri, Aug 10, Andi Kleen wrote:

> On Friday 10 August 2007 15:35:29 [EMAIL PROTECTED] wrote:
> > This patch improves oprofile callgraphs for i386/x86_64. The old backtracing
> > code was unable to produce even kernel backtraces if the kernel wasn't
> > compiled with framepointers. The code now uses dump_trace().
> 
> Hmm one issue i didn't notice before: with imprecise backtrace
> the profiling could be a little random because even if the same
> call chain is hit repeatedly the garbage left over stack entries also
> reported could vary and then cause oprofile to put it into different
> buckets. But there is probably not much that can be 
> done about that.

Yes, but before this we didn't have any callgraphs for x86_64, since even with
framepointers enabled the backtrace gives very strange results.

I guess this is the best we can achieve at the moment. Eventually, when we
have fast, precise backtraces in the kernel, the oprofile code will benefit
from that automatically with this patch.


Re: [RFC 12/26] ext2 white-out support

2007-08-02 Thread Jan Blunck
On Thu, Aug 02, Ph. Marek wrote:

> On Mittwoch, 1. August 2007, Josef Sipek wrote:
> > Alright not the greatest of examples, there is something to be said about
> > symmetry, so...let me try again :)
> ...
> > Oops! There's a whiteout in /b that hides the directory in /c -- rename(2)
> > shouldn't make directory subtrees disappear.
> >
> > There are two ways to solve this:
> >
> > 1) "cp -r" the entire subtree ...
> >
> > 2) Don't store whiteouts within branches ...
> Sorry for making uninformed guesses, but if there are already special nodes 
> (whiteout), why not extending them to some more general format - specifying a 
> (source, destination) pair at the topmost level?
> - A delete is a (source, NULL) pair
> - A rename is a (source, destination) pair, which causes lookups on source to
>   use the string destination in the lower branches.

Originally I had the idea that whiteouts are a special kind of symlink. After
discussing that with various people, I stuck to the simplest approach.


Re: [RFC 12/26] ext2 white-out support

2007-08-02 Thread Jan Blunck
On Wed, Aug 01, Erez Zadok wrote:

> There are three other reasons why Unionfs and our users like to have
> multiple writable branches:
> 

...

>And yes, it does make our implementation more complex.

And error-prone and inflexible with respect to changes. When XIP was
introduced, unionfs crashed all over the place because of those changes. I
don't know if this has changed yet. Not to mention other issues like calling
back into the VFS (stack usage), locking problems and so on.

> 3. Some people use Unionfs in the scenario described in point #2 above, as a
>poor man's space- and load- distribution system.  Some of our users like
>the idea of controlling how much storage space they give each branch, and
>how much it might grow, and even how much CPU or I/O load might be placed
>on each of the lower filesystems which serve a given branch.  That way
>they worry less about the top-layer's space filling up more quickly than
>expected.  Now Unionfs was never designed to be a load-balancing f/s (we
>have RAIF for that, see ),
>but users seems to always find creative ways to [ab]use one's software in
>ways one never thought of. :-)

And this has nothing to do with unioning ...

> BTW, does Union Mounts copyup on meta-data changes (e.g., chmod, chgrp,
> etc.)?

No. But it was proposed during one of the last postings.


Re: [RFC 12/26] ext2 white-out support

2007-08-02 Thread Jan Blunck
On Wed, Aug 01, Josef Sipek wrote:

> This brings up an very interesting (but painful) question...which makes more
> sense? Allowing the modifications in only the top-most branch, or any branch
> (given the user allows it at mount-time)?

My implementation is keeping things simple for a reason. There have been
many attempts to get unioning working on the filesystem layer. Most of them
failed because of complexity. E.g. BSD threw away all of the filesystem
stacking support after they tried to fix unionfs for years. Writing to lower
layers makes things unnecessarily complex. Therefore I left it out.

> > > 1) "cp -r" the entire subtree being renamed to highest-priority branch, 
> > > and
> > > rename there (you might have to recreate a series of directories to have a
> > > place to "cp" to...so you got "cp -r" _AND_ "mkdir -p"-like code in the 
> > > VFS!
> > > 1/2 a :) )
> > 
> > I think this is the only alternative, given the design.
>  
> Right. Doing something like this at the filesystem level (as we do in
> unionfs) seems less painful - filesystems are places full of all sorts of
> nefarious activities to begin with. Having it in the VFS seems...even
> uglier.

The userspace is doing it since I return -EXDEV. And that even comes for
free. I don't need to hack around and call back into VFS as you do. It is so
simple and straightforward in the VFS.
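
What userspace does on -EXDEV can be sketched as below. This is only a
minimal illustration for regular files, in the spirit of what mv(1) does;
copy_file() and move_file() are names made up for this example, and a real
tool would also handle directories ("cp -r"), permissions and timestamps.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Copy a regular file byte for byte. Returns 0 on success, -1 on error. */
static int copy_file(const char *src, const char *dst)
{
	char buf[4096];
	ssize_t n = 0;
	int in, out, ret = -1;

	in = open(src, O_RDONLY);
	if (in < 0)
		return -1;
	out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (out < 0)
		goto close_in;

	while ((n = read(in, buf, sizeof(buf))) > 0) {
		char *p = buf;

		/* write() may be short, so loop until the chunk is out */
		while (n > 0) {
			ssize_t w = write(out, p, n);

			if (w < 0)
				goto close_out;
			p += w;
			n -= w;
		}
	}
	if (n == 0)
		ret = 0;	/* read() hit end of file without error */
close_out:
	if (close(out) < 0)
		ret = -1;
close_in:
	close(in);
	return ret;
}

/* Try rename(2) first; fall back to copy + unlink only on EXDEV. */
static int move_file(const char *src, const char *dst)
{
	assert(src && dst);
	if (rename(src, dst) == 0)
		return 0;
	if (errno != EXDEV)
		return -1;
	if (copy_file(src, dst) < 0)
		return -1;
	return unlink(src);
}
```

The kernel only has to report -EXDEV; the copy itself stays in userspace.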


Re: [RFC 12/26] ext2 white-out support

2007-08-02 Thread Jan Blunck
On Tue, Jul 31, Josef Sipek wrote:

> > So you think that just because you mounted the filesystem somewhere else it
> > should look different? This is what sharing is all about. If you share a
> > filesystem you also share the removal of objects.
> 
> The removal happens at the union level, not the branch level. Say you have:
> 
> /a/
> /b/foo
> /c/foo
> 
> And you mount /u1 as a union of {a,b}, and /u2 as union of {a,c}.
> 
> $ find /u*
> /u1
> /u1/foo
> /u2
> /u2/foo
> $ rm /u1/foo # this creates whiteout for "foo" in /a
> $ find /u*
> /u1
> /u2
> 
> Is that what you'd expect as a user? I don't think so.
> 

Yes, although that might sound strange: you are sharing the topmost writable
layer. This is what I expect.
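
The behaviour in Josef's example can be modelled in a few lines of userspace
C. This is a toy model under simplifying assumptions, not the union mount
code: every union has exactly two layers, and all struct and function names
are invented for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8

struct entry {
	const char *name;
	int whiteout;		/* non-zero: this entry hides lower layers */
};

struct layer {
	struct entry entries[MAX_ENTRIES];
	size_t count;
};

struct union_view {
	struct layer *layers[2];	/* [0] is the topmost, writable layer */
};

static struct entry *layer_find(struct layer *l, const char *name)
{
	for (size_t i = 0; i < l->count; i++)
		if (strcmp(l->entries[i].name, name) == 0)
			return &l->entries[i];
	return NULL;
}

/* Returns the index of the layer providing name, or -ENOENT. */
static int union_lookup(struct union_view *u, const char *name)
{
	for (int i = 0; i < 2; i++) {
		struct entry *e = layer_find(u->layers[i], name);

		if (!e)
			continue;
		/* a whiteout stops the search and hides lower layers */
		return e->whiteout ? -ENOENT : i;
	}
	return -ENOENT;
}

/* Removing a name through a union puts a whiteout into the topmost layer. */
static void union_remove(struct union_view *u, const char *name)
{
	struct layer *top = u->layers[0];
	struct entry *e = layer_find(top, name);

	if (!e) {
		assert(top->count < MAX_ENTRIES);
		e = &top->entries[top->count++];
		e->name = name;
	}
	e->whiteout = 1;
}
```

With /u1 = {a,b} and /u2 = {a,c}, calling union_remove() on /u1 writes the
whiteout into a, so a later union_lookup() of "foo" returns -ENOENT through
both unions.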

> > > store. We did an experiment with Unionfs, and moving the whiteout handling
> > > to effectively a "library" that did all the dirty work cleaned up the code
> > > considerably [2,3].
> > 
> > Haven't checked if you could use ODF for a generic store for filesystems 
> > that
> > couldn't support whiteouts. This might be an interesting idea.
>  
> Yes, since the ODF is completely separate, you can use _any_ filesystem and
> regardless of whether or not they support whiteouts.

Completely separate? It is totally tied to UnionFS and only tries to work
around the problems that this kind of VFS-emulating filesystem has.


Re: [RFC 00/26] VFS based Union Mount (V2)

2007-08-02 Thread Jan Blunck
On Thu, Aug 02, Bharata B Rao wrote:

> On Mon, Jul 30, 2007 at 06:13:23PM +0200, Jan Blunck wrote:
> > Here is another post of the VFS based union mount implementation. Unlike the
> > traditional mount which hides the contents of the mount point, union mounts
> > present the merged view of the mount point and the mounted filesytem.
> 
> Doesn't compile without CONFIG_DEBUG_UNION_MOUNT.
> 
> fs/namei.c: In function `hash_lookup_union':
> fs/namei.c:1798: error: implicit declaration of function `UM_DEBUG_LOOKUP'
> make[1]: *** [fs/namei.o] Error 1

Umm, typo in the debug infrastructure patch. Here is the fixed version.

Thanks,
Jan
Subject: union-mount: Debug Infrastructure

This adds debugfs/relay based debugging infrastructure helpful when doing
development of the union-mount code itself. The debugging output can be
enabled at runtime by:

 echo 1 > /proc/sys/fs/union-debug

This registers the relayfs files where the debug code is writing its output
to. There are different levels of debugging output available which can be ORed
together. For the valid sysctl values see include/linux/union_debug.h.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 include/linux/union_debug.h |   91 ++
 lib/Kconfig.debug   |9 +
 lib/Makefile|2 
 lib/union_debug.c   |  268 
 4 files changed, 370 insertions(+)

--- /dev/null
+++ b/include/linux/union_debug.h
@@ -0,0 +1,91 @@
+/*
+ * VFS based union mount for Linux
+ *
+ * Copyright (C) 2004-2007 IBM Corporation, IBM Deutschland Entwicklung GmbH.
+ * Copyright (C) 2007 Novell Inc.
+ *   Author(s): Jan Blunck ([EMAIL PROTECTED])
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+#ifndef __LINUX_UNION_DEBUG_H
+#define __LINUX_UNION_DEBUG_H
+
+#ifdef __KERNEL__
+
+#ifdef CONFIG_DEBUG_UNION_MOUNT
+
+#include 
+
+/* This is taken from klog debugging facility */
+extern void klog(const void *data, int len);
+extern void klog_printk(const char *fmt, ...);
+extern void klog_printk_dentry(const char *func, struct dentry *dentry);
+
+extern int sysctl_union_debug;
+
+#define UNION_MOUNT_DEBUG		1
+#define UNION_MOUNT_DEBUG_DCACHE	2
+#define UNION_MOUNT_DEBUG_LOCK		4
+#define UNION_MOUNT_DEBUG_READDIR	8
+#define UNION_MOUNT_DEBUG_LOOKUP	16
+
+#define UM_DEBUG(fmt, args...)		\
+do {	\
+	if (sysctl_union_debug & UNION_MOUNT_DEBUG)			\
+		klog_printk("%s: " fmt, __FUNCTION__, ## args);		\
+} while (0)
+#define UM_DEBUG_DENTRY(dentry)		\
+do {	\
+	if (sysctl_union_debug & UNION_MOUNT_DEBUG)			\
+		klog_printk_dentry(__FUNCTION__, (dentry));		\
+} while (0)
+#define UM_DEBUG_DCACHE(fmt, args...)	\
+do {	\
+	if (sysctl_union_debug & UNION_MOUNT_DEBUG_DCACHE)		\
+		klog_printk("%s: " fmt, __FUNCTION__, ## args);		\
+} while (0)
+#define UM_DEBUG_DCACHE_DENTRY(dentry)	\
+do {	\
+	if (sysctl_union_debug & UNION_MOUNT_DEBUG_DCACHE)		\
+		klog_printk_dentry(__FUNCTION__, (dentry));		\
+} while (0)
+#define UM_DEBUG_LOCK(fmt, args...)	\
+do {	\
+	if (sysctl_union_debug & UNION_MOUNT_DEBUG_LOCK)		\
+		klog_printk("%s: " fmt, __FUNCTION__, ## args);		\
+} while (0)
+#define UM_DEBUG_READDIR(fmt, args...)	\
+do {	\
+	if (sysctl_union_debug & UNION_MOUNT_DEBUG_READDIR)		\
+		klog_printk("%s: " fmt, __FUNCTION__, ## args);		\
+} while (0)
+#define UM_DEBUG_LOOKUP(fmt, args...)	\
+do {	\
+	if (sysctl_union_debug & UNION_MOUNT_DEBUG_LOOKUP)		\
+		klog_printk("%s: " fmt, __FUNCTION__, ## args);		\
+} while (0)
+#define UM_DEBUG_LOOKUP_DENTRY(dentry)	\
+do {	\
+	if (sysctl_union_debug & UNION_MOUNT_DEBUG_LOOKUP)		\
+		klog_printk_dentry(__FUNCTION__, (dentry));		\
+} while (0)
+
+#else	/* CONFIG_DEBUG_UNION_MOUNT */
+
+#define UM_DEBUG(fmt, args...)			do { /* empty */ } while (0)
+#define UM_DEBUG_DENTRY(fmt, args...)		do { /* empty */ } while (0)
+#define UM_DEBUG_DCACHE(fmt, args...)		do { /* empty */ } while (0)
+#define UM_DEBUG_DCACHE_DENTRY(fmt, args...)	do { /* empty */ } while (0)
+#define UM_DEBUG_LOCK(fmt, args...)		do { /* empty */ } while (0)
+#define UM_DEBUG_READDIR(fmt, args...)		do { /* empty */ } while (0)
+#define UM_DEBUG_LOOKUP(fmt, args...)		do { /* empty */ } while (0)
+#define UM_DEBUG_LOOKUP_DENTRY(fmt, args...)	do { /* empty */ } while (0)
+
+#endif	/* CONFIG_DEBUG_UNION_MOUNT */
+
+#endif	/* __KERNEL__ */
+#endif	/*  __LINUX_UNION_DEBUG_H */
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -393,6 +393,15 @@ config DEBUG_LIST
 
 	  If unsure, say N.
 
+config DEBUG_UNION_MOUNT
+	bool "Debug VFS based union mounts"

Re: [RFC 12/26] ext2 white-out support

2007-07-31 Thread Jan Blunck
On Tue, Jul 31, Josef Sipek wrote:

> On Mon, Jul 30, 2007 at 06:13:35PM +0200, Jan Blunck wrote:
> > Introduce white-out support to ext2.
> 
> I think storing whiteouts on the branches is wrong. It creates all sort of
> nasty cases when people actually try to use unioning. Imagine a (no-so
> unlikely) scenario where you have 2 unions, and they share a branch. If you
> create a whiteout in one union on that shared branch, the whiteout magically
> affects the other union as well! Whiteouts are a union-level construct, and
> therefore storing them at the branch level is wrong.

So you think that just because you mounted the filesystem somewhere else it
should look different? This is what sharing is all about. If you share a
filesystem you also share the removal of objects.

> If you store whiteouts on the branches, you'll probably want readdir to not
> include them. That's relatively cheap if you have a whiteout bit in the
> inode, but I don't think filesystems should be forced to use up rather
> prescious inode bits for whiteouts/opaqueness [1].

How filesystems implement the whiteout filetype is up to them.

> Really the only sane way of keeping track of whiteouts seems some external
> store. We did an experiment with Unionfs, and moving the whiteout handling
> to effectively a "library" that did all the dirty work cleaned up the code
> considerably [2,3].

Haven't checked if you could use ODF for a generic store for filesystems that
couldn't support whiteouts. This might be an interesting idea.

> > Known Bugs:
> > - Needs a reserved inode number for white-outs
> > - S_OPAQUE isn't persistently stored
> 
> Out of curiosity, how do you keep track of opaqueness while the fs is
> mounted?

It's an inode flag (S_OPAQUE).


Re: [RFC 12/26] ext2 white-out support

2007-07-31 Thread Jan Blunck
On Tue, Jul 31, Andreas Dilger wrote:

> On Jul 31, 2007  09:44 +0200, Jan Blunck wrote:
> > Ok, this is pretty similar to the way I implemented this for tmpfs. The
> > problem is that the union mount code is explicitly checking if the 
> > filesystem
> > is supporting whiteout. I used to use a new filesystem flag (FS_WHITEOUT) 
> > for
> > this but thought that disk filesystem like ext2/3/4 will have problem with
> > that if you mount an old image. So I guess I still need a feature flag.
> 
> You also need whiteout support for extents.  This could be done with
> unwritten extents potentially, or as I previously proposed (RFC) in
> linux-ext4.

Maybe. But this is about something totally different: a whiteout filetype, an
existing file that, when it is found, makes the VFS return -ENOENT.

Cheers,
Jan


Re: [RFC 12/26] ext2 white-out support

2007-07-31 Thread Jan Blunck
On Mon, Jul 30, Theodore Tso wrote:

> On Mon, Jul 30, 2007 at 06:13:35PM +0200, Jan Blunck wrote:
> > Introduce white-out support to ext2.
> > 
> > Known Bugs:
> > - Needs a reserved inode number for white-outs
> 
> You picked different reserved inodes for the ext2 and ext3
> filesystems.  That's good for a NACK right there.  The codepoints
> (i.e., reserved inode numbers, feature bit masks, etc.) for ext2,
> ext3, and ext4 MUST not overlap.  After all, someone might use tune2fs
> -j to convert an ext2 filesystem to ext3, and is it's REALLY BAD that
> you're using a reserved inode of 7 for ext2, and 9 for ext3.

Ouch, right.

> Also, I note that you have created a new INCOMPAT feature flag support
> for whiteouts.  That's really unfortunate; we try to avoid introducing
> incompatible feature flags unless absolutely necessary; note that even
> adding a COMPAT feature flag means that you need a new version of
> e2fsprogs if you want e2fsck to be willing to touch that filesystem.
> 
> So --- if you're looking for a way to add whiteout support to
> ext2/ext3 without needing a feature bit, here's how.  We allocate a
> new inode flag in struct ext3_inode.i_flags:
> 
> #define EXT2_WHTOUT_FL 0x0004
> 
> We also allocate a new field in the ext2 superblock to store the
> "whiteout inode".  (Please coordinate with me so it's a superblock
> field not in use by ext3/ext4, and so it's reserved so that no one
> else uses it.)  The superblock field, call it s_whtout_ino, stores the
> inode number for the "white out inode".
> 
> When you create a new whiteout file, the code checks sb->s_whtout_ino,
> and if it is zero, it allocates a new inode, and creates it as a
> zero-length regular file (i_mode |= S_IFREG) with the EXT2_WHTOUT_FL
> flag set in the inode, and then store the inode number in
> sb->s_whtout_ino.  If sb->s_whtout_ino is non-zero, you must read in
> the inode and make sure that the EXT2_WHTOUT_FL is set.  If it is not,
> then allocate a new whiteout inode as described previously.  Then link
> the inode into the directory as before.
> 
> When reading an inode, if the EXT2_WHTOUT_FL flag is set, then set the
> in-memory mode of the inode to be S_IFWHT.  
> 
> That's pretty much about it.  For cleanliness sake, it would be good
> if ext2_delete_inode clears sb->s_whtout_ino if the last whiteout link
> has been deleted, but it's strictly speaking not necessary.  If you do
> it this way, the filesystem is completely backwards compatible; the
> whiteout files will just appear to links to a normal zero-lenth file.

Ok, this is pretty similar to the way I implemented this for tmpfs. The
problem is that the union mount code explicitly checks whether the filesystem
supports whiteouts. I used to use a new filesystem flag (FS_WHITEOUT) for
this, but thought that disk filesystems like ext2/3/4 would have a problem
with that if you mount an old image. So I guess I still need a feature flag.
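
For reference, the singleton whiteout inode scheme Ted describes above can be
modelled in userspace roughly as follows. This is a sketch only:
s_whtout_ino is the field name from his mail, but the structs, the functions
and the WHTOUT_FL bit value are placeholders invented here and do not match
any on-disk ext2 layout.

```c
#include <assert.h>
#include <stdint.h>

#define NR_INODES 16
#define WHTOUT_FL 0x1u	/* placeholder bit, not the real on-disk flag */

struct inode_model {
	int in_use;
	uint32_t flags;
	uint32_t links;
};

struct sb_model {
	uint32_t s_whtout_ino;			/* 0: no whiteout inode yet */
	struct inode_model inodes[NR_INODES];	/* slot 0 is never used */
};

/* Allocate a free inode; returns its number, or 0 if the table is full. */
static uint32_t alloc_inode(struct sb_model *sb)
{
	for (uint32_t ino = 1; ino < NR_INODES; ino++) {
		if (!sb->inodes[ino].in_use) {
			sb->inodes[ino].in_use = 1;
			sb->inodes[ino].flags = 0;
			sb->inodes[ino].links = 0;
			return ino;
		}
	}
	return 0;
}

/*
 * Return the shared whiteout inode. Allocate a new one if none is
 * recorded, or if the recorded inode lacks the whiteout flag.
 */
static uint32_t get_whiteout_ino(struct sb_model *sb)
{
	uint32_t ino = sb->s_whtout_ino;

	assert(ino < NR_INODES);
	if (ino && sb->inodes[ino].in_use &&
	    (sb->inodes[ino].flags & WHTOUT_FL))
		return ino;

	ino = alloc_inode(sb);
	if (!ino)
		return 0;
	sb->inodes[ino].flags |= WHTOUT_FL;
	sb->s_whtout_ino = ino;
	return ino;
}

/* Creating a whiteout is just one more hard link to the shared inode. */
static uint32_t create_whiteout(struct sb_model *sb)
{
	uint32_t ino = get_whiteout_ino(sb);

	if (ino)
		sb->inodes[ino].links++;
	return ino;
}
```

An old kernel mounting such an image would see every whiteout as a hard link
to one ordinary zero-length file, which is why no feature bit is needed for
the on-disk format itself.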

> I wouldn't bother with setting the directory type field to be DT_WHT,
> given that they will never be returned to userspace anyway.

At the moment I still rely on this for the current readdir implementation.
Viro already said that he doesn't want to see this (the readdir changes) in
the kernel but in userspace.

Thanks,
Jan


[RFC 00/26] VFS based Union Mount (V2)

2007-07-30 Thread Jan Blunck
Here is another post of the VFS based union mount implementation. Unlike the
traditional mount which hides the contents of the mount point, union mounts
present the merged view of the mount point and the mounted filesystem.

Recent changes:
- brand new union structure no longer tied to the dentry; now works with bind
  mounts
- generic part of the whiteout patches extracted
- introduces MS_WHITEOUT to make the white-out patches independent of the
  union-mount stuff
- uses a singleton whiteout inode for the tmpfs filesystem (I need to fix this
  for ext2/3, too)
- renaming files on unions uses copyup now
- rewrote the union mount debugging code: it is now debugfs/relay based.
- random cleanups

I'm able to compile the kernel with these patches applied on a 3-layer union
mount with the separate layers bind mounted to different locations. I haven't
done any performance tests since I think there is a more important topic
ahead: better readdir() support.

This series is against 2.6.22-rc6-mm1.

Comments are welcome,
Jan



[RFC 08/26] VFS: Export lives_below_in_same_fs()

2007-07-30 Thread Jan Blunck
Export lives_below_in_same_fs() for use in union mount code.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/namespace.c|3 ++-
 include/linux/mount.h |1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -793,7 +793,7 @@ static bool permit_mount(struct nameidat
return true;
 }
 
-static int lives_below_in_same_fs(struct dentry *d, struct dentry *dentry)
+int lives_below_in_same_fs(struct dentry *d, struct dentry *dentry)
 {
while (1) {
if (d == dentry)
@@ -803,6 +803,7 @@ static int lives_below_in_same_fs(struct
d = d->d_parent;
}
 }
+EXPORT_SYMBOL_GPL(lives_below_in_same_fs);
 
 struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
int flag, uid_t owner)
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -106,6 +106,7 @@ extern void shrink_submounts(struct vfsm
 
 extern spinlock_t vfsmount_lock;
 extern dev_t name_to_dev_t(char *name);
+extern int lives_below_in_same_fs(struct dentry *, struct dentry *);
 
 #endif
 #endif /* _LINUX_MOUNT_H */



[RFC 22/26] union-mount: white-out changes for copy-on-open

2007-07-30 Thread Jan Blunck
When files on an upper layer of the union stack are removed we need to
white-out the removed filename.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/namei.c |   46 --
 1 file changed, 44 insertions(+), 2 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2253,6 +2253,13 @@ do_last:
 
/* Negative dentry, just create the file */
if (!path.dentry->d_inode || S_ISWHT(path.dentry->d_inode->i_mode)) {
+   if (path.dentry->d_parent != dir) {
+   dput_path(&path, nd);
+   path.dentry = __lookup_hash_kern(&nd->last, dir, nd);
+   path.mnt = nd->mnt;
+   goto do_last;
+   }
+
error = open_namei_create(nd, &path, flag, mode);
if (error)
goto exit;
@@ -2373,6 +2380,16 @@ int lookup_create(struct nameidata *nd, 
 {
int err = -EEXIST;
 
+   if (is_unionized(nd->dentry, nd->mnt)) {
+   err = union_relookup_topmost(nd, nd->flags & ~LOOKUP_PARENT);
+   if (err) {
+   /* FIXME: This really sucks */
+   mutex_lock_nested(&nd->dentry->d_inode->i_mutex,
+ I_MUTEX_PARENT);
+   goto fail;
+   }
+   }
+
mutex_lock_nested(&nd->dentry->d_inode->i_mutex, I_MUTEX_PARENT);
/*
 * Yucky last component or no last component at all?
@@ -2391,6 +2408,16 @@ int lookup_create(struct nameidata *nd, 
if (err)
goto fail;
 
+   /* Special case - we found a whiteout */
+   if (path->dentry->d_inode && S_ISWHT(path->dentry->d_inode->i_mode)) {
+   if (path->dentry->d_parent != nd->dentry) {
+   dput_path(path, nd);
+   path->dentry = __lookup_hash_kern(&nd->last, nd->dentry,
+ nd);
+   path->mnt = nd->mnt;
+   }
+   }
+
/*
 * Special case - lookup gave negative, but... we had foo/bar/
 * From the vfs_mknod() POV we just have a negative dentry -
@@ -2682,6 +2709,15 @@ static int do_whiteout(struct nameidata 
if (isdir && !directory_is_empty(path->dentry, path->mnt))
goto out;
 
+   mutex_unlock(&nd->dentry->d_inode->i_mutex);
+   err = union_relookup_topmost(nd, nd->flags & ~LOOKUP_PARENT);
+   if (err) {
+   mutex_lock_nested(&nd->dentry->d_inode->i_mutex,
+ I_MUTEX_PARENT);
+   goto out;
+   }
+   mutex_lock_nested(&nd->dentry->d_inode->i_mutex, I_MUTEX_PARENT);
+
/* safe the name for a later lookup */
err = -ENOMEM;
name.name = kmalloc(dentry->d_name.len, GFP_KERNEL);
@@ -3012,7 +3048,10 @@ static long do_rmdir(int dfd, const char
error = hash_lookup_union(&nd, &nd.last, &path);
if (error)
goto exit2;
-   error = vfs_rmdir(nd.dentry->d_inode, path.dentry);
+   if (is_unionized(nd.dentry, nd.mnt))
+   error = do_whiteout(&nd, &path, 1);
+   else
+   error = vfs_rmdir(nd.dentry->d_inode, path.dentry);
dput_path(&path, &nd);
 exit2:
mutex_unlock(&nd.dentry->d_inode->i_mutex);
@@ -3091,7 +3130,10 @@ static long do_unlinkat(int dfd, const c
inode = path.dentry->d_inode;
if (inode)
atomic_inc(&inode->i_count);
-   error = vfs_unlink(nd.dentry->d_inode, path.dentry);
+   if (is_unionized(nd.dentry, nd.mnt))
+   error = do_whiteout(&nd, &path, 0);
+   else
+   error = vfs_unlink(nd.dentry->d_inode, path.dentry);
exit2:
dput_path(&path, &nd);
}



[RFC 26/26] union-mount: Debug code

2007-07-30 Thread Jan Blunck
Some debugging code itself.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/namei.c|   26 ++
 fs/union.c|   27 +++
 include/linux/namei.h |4 
 3 files changed, 57 insertions(+)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1794,11 +1795,15 @@ int hash_lookup_union(struct nameidata *
struct path safe = { .dentry = nd->dentry, .mnt = nd->mnt };
int res ;
 
+   UM_DEBUG_LOOKUP("name = \"%*s\"\n", name->len, name->name);
+
pathget(&safe);
res = __hash_lookup_topmost(nd, name, path);
if (res)
goto out;
 
+   UM_DEBUG_LOOKUP_DENTRY(path->dentry);
+
/* only directories can be part of a union stack */
if (!path->dentry->d_inode ||
!S_ISDIR(path->dentry->d_inode->i_mode))
@@ -1813,6 +1818,7 @@ int hash_lookup_union(struct nameidata *
goto out;
}
 
+   UM_DEBUG_LOOKUP_DENTRY(path->dentry);
 out:
path_release(nd);
nd->dentry = safe.dentry;
@@ -2765,6 +2771,8 @@ out_freename:
kfree(name.name);
 out:
pathput(&safe);
+   UM_DEBUG("err = %d\n", err);
+   UM_DEBUG_DENTRY(dentry);
return err;
 }
 
@@ -2802,6 +2810,9 @@ int vfs_unlink_whiteout(struct inode *di
}
mutex_unlock(&dentry->d_inode->i_mutex);
 
+   UM_DEBUG("err = %d\n", error);
+   UM_DEBUG_DENTRY(dentry);
+
/*
 * We can call dentry_iput() since nobody could actually do something
 * useful with a whiteout. So dropping the reference to the inode
@@ -3490,6 +3501,10 @@ int vfs_rename_union(struct nameidata *o
struct dentry *dentry;
int error;
 
+   UM_DEBUG_DENTRY(old->dentry);
+   UM_DEBUG_DENTRY(new->dentry);
+/* return -EPERM; */
+
if (old->dentry->d_inode == new->dentry->d_inode)
return 0;
 
@@ -3530,6 +3545,9 @@ int vfs_rename_union(struct nameidata *o
 
/* possibly delete the existing new file */
if ((newnd->dentry == new->dentry->d_parent) && new->dentry->d_inode) {
+   UM_DEBUG("unlink:\n");
+   UM_DEBUG_DENTRY(new->dentry);
+
/* FIXME: inode may be truncated while we hold a lock */
error = vfs_unlink(new_dir, new->dentry);
if (error)
@@ -3540,6 +3558,9 @@ int vfs_rename_union(struct nameidata *o
if (IS_ERR(dentry))
goto freename;
 
+   UM_DEBUG("new target:\n");
+   UM_DEBUG_DENTRY(new->dentry);
+
dput(new->dentry);
new->dentry = dentry;
}
@@ -3554,6 +3575,10 @@ int vfs_rename_union(struct nameidata *o
error = PTR_ERR(dentry);
if (IS_ERR(dentry))
goto freename;
+
+   UM_DEBUG("whiteout:\n");
+   UM_DEBUG_DENTRY(dentry);
+
error = vfs_whiteout(old_dir, dentry);
dput(dentry);
 
@@ -3567,6 +3592,7 @@ int vfs_rename_union(struct nameidata *o
 */
 freename:
kfree(old_name.name);
+   UM_DEBUG("err = %d\n", error);
return error;
 }
 
--- a/fs/union.c
+++ b/fs/union.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -253,6 +254,9 @@ int append_to_union(struct vfsmount *mnt
 
BUG_ON(!IS_MNT_UNION(mnt));
 
+   UM_DEBUG_DENTRY(dentry);
+   UM_DEBUG_DENTRY(dest_dentry);
+
this = union_alloc(dentry, mnt, dest_dentry, dest_mnt);
if (!this)
return -ENOMEM;
@@ -822,6 +826,8 @@ int union_relookup_topmost(struct nameid
char *kbuf, *name;
struct nameidata this;
 
+   UM_DEBUG_DENTRY(nd->dentry);
+
kbuf = (char *)__get_free_page(GFP_KERNEL);
if (!kbuf)
return -ENOMEM;
@@ -838,6 +844,7 @@ int union_relookup_topmost(struct nameid
path_release(nd);
nd->dentry = this.dentry;
nd->mnt = this.mnt;
+   UM_DEBUG_DENTRY(nd->dentry);
 
/*
 * the nd->flags should be unchanged
@@ -846,6 +853,7 @@ int union_relookup_topmost(struct nameid
nd->um_flags &= ~LAST_LOWLEVEL;
  free_page:
free_page((unsigned long)kbuf);
+   UM_DEBUG("err = %d\n", err);
return err;
 }
 
@@ -895,6 +903,8 @@ struct dentry *union_create_topmost(stru
if (IS_ERR(dentry))
goto out_unlock;
 
+   UM_DEBUG_DENTRY(dentry);
+
switch (mode & S_IFMT) {
case S_IFREG:
/*
@@ -916,6 +926,9 @@ struct dentry *union_create_topmost(stru
dentry = ERR_PTR(res);
goto out_unlock;

[RFC 07/26] VFS: Introduce dput() variant that maintains a kill-list

2007-07-30 Thread Jan Blunck
This patch introduces a new variant of dput(). It is needed to prevent a
recursive call to dput() from the union mount code.

  void __dput(struct dentry *dentry, struct list_head *list);

__dput() works mostly like the original dput(). The main difference is that it
does not do a full d_kill() at the end; instead it puts the dentry on a list as
soon as it is no longer reachable. The union mount code can therefore safely
call __dput() when it wants to drop references to underlying dentries during a
dput(). After calling __dput() the caller must make sure that __d_kill_final()
is called on every dentry on the list. __d_kill_final() does the work of
dentry_iput() and also drops the reference on the parent.
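
To illustrate the idea outside the kernel, here is a minimal userspace sketch
of a "put" that queues dead objects on a kill-list instead of destroying them
recursively. It is not the patch itself: struct node, node_put_deferred() and
node_put() are hypothetical stand-ins for struct dentry, __dput() and dput(),
and there is no locking at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dentry: refcounted, with a parent. */
struct node {
	int refcount;
	struct node *parent;    /* child holds one reference on the parent */
	struct node *next_dead; /* link used only while on the kill-list */
};

static int nodes_freed;         /* counter used by the usage example */

struct node *node_new(struct node *parent)
{
	struct node *n = calloc(1, sizeof(*n));

	assert(n != NULL);
	n->refcount = 1;
	n->parent = parent;
	if (parent)
		parent->refcount++;
	return n;
}

/* Like __dput(): drop a reference; on zero, queue instead of destroying. */
void node_put_deferred(struct node *n, struct node **kill_list)
{
	if (!n)
		return;
	assert(n->refcount > 0);
	if (--n->refcount > 0)
		return;
	n->next_dead = *kill_list;
	*kill_list = n;
}

/* Like dput(): drain the list; freeing a node may queue its parent. */
void node_put(struct node *n)
{
	struct node *kill_list = NULL;

	node_put_deferred(n, &kill_list);
	while (kill_list) {
		struct node *dead = kill_list;
		struct node *parent = dead->parent;

		kill_list = dead->next_dead;
		free(dead);
		nodes_freed++;
		node_put_deferred(parent, &kill_list);
	}
}
```

Dropping the last reference on a leaf walks up through every parent that
becomes unreferenced, but the walk is a loop over the list rather than a chain
of nested calls, which is the property the union mount code relies on.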

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/dcache.c |   60 +++-
 1 file changed, 55 insertions(+), 5 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -129,19 +129,56 @@ static void dentry_iput(struct dentry * 
  *
  * If this is the root of the dentry tree, return NULL.
  */
-static struct dentry *d_kill(struct dentry *dentry)
+static struct dentry *__d_kill(struct dentry *dentry, struct list_head *list)
 {
struct dentry *parent;
 
list_del(&dentry->d_u.d_child);
dentry_stat.nr_dentry--; /* For d_free, below */
-   /*drops the locks, at that point nobody can reach this dentry */
+
+   if (list) {
+   list_del_init(&dentry->d_alias);
+   /* at this point nobody can reach this dentry */
+   list_add(&dentry->d_lru, list);
+   spin_unlock(&dentry->d_lock);
+   spin_unlock(&dcache_lock);
+   return NULL;
+   }
+
+   /* drops the locks, at that point nobody can reach this dentry */
dentry_iput(dentry);
parent = dentry->d_parent;
d_free(dentry);
return dentry == parent ? NULL : parent;
 }
 
+void __dput(struct dentry *, struct list_head *);
+
+static void __d_kill_final(struct dentry *dentry, struct list_head *list)
+{
+   struct dentry *parent = dentry->d_parent;
+   struct inode *inode = dentry->d_inode;
+
+   if (inode) {
+   dentry->d_inode = NULL;
+   if (!inode->i_nlink)
+   fsnotify_inoderemove(inode);
+   if (dentry->d_op && dentry->d_op->d_iput)
+   dentry->d_op->d_iput(dentry, inode);
+   else
+   iput(inode);
+   }
+
+   d_free(dentry);
+   if (dentry != parent)
+   __dput(parent, list);
+}
+
+static struct dentry *d_kill(struct dentry *dentry)
+{
+   return __d_kill(dentry, NULL);
+}
+
 /* 
  * This is dput
  *
@@ -171,7 +208,7 @@ static struct dentry *d_kill(struct dent
  * no dcache lock, please.
  */
 
-void dput(struct dentry *dentry)
+void __dput(struct dentry *dentry, struct list_head *list)
 {
if (!dentry)
return;
@@ -215,14 +252,27 @@ kill_it:
 * delete it from there
 */
if (!list_empty(&dentry->d_lru)) {
-   list_del(&dentry->d_lru);
+   list_del_init(&dentry->d_lru);
dentry_stat.nr_unused--;
}
-   dentry = d_kill(dentry);
+
+   dentry = __d_kill(dentry, list);
if (dentry)
goto repeat;
 }
 
+void dput(struct dentry *dentry)
+{
+   LIST_HEAD(mortuary);
+
+   __dput(dentry, &mortuary);
+   while (!list_empty(&mortuary)) {
+   dentry = list_entry(mortuary.next, struct dentry, d_lru);
+   list_del(&dentry->d_lru);
+   __d_kill_final(dentry, &mortuary);
+   }
+}
+
 /**
  * d_invalidate - invalidate a dentry
  * @dentry: dentry to invalidate



[RFC 12/26] ext2 white-out support

2007-07-30 Thread Jan Blunck
Introduce white-out support to ext2.

Known Bugs:
- Needs a reserved inode number for white-outs
- S_OPAQUE isn't persistently stored
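
For readers unfamiliar with the mode-to-directory-entry-type tables that this
patch extends, here is a small userspace sketch of the same lookup technique.
All names are hypothetical, and the value used for the whiteout mode bits
(0160000, the BSD value) is an assumption, since the header defining S_IFWHT
is not part of this patch.

```c
#include <assert.h>

/* Conventional octal file-type bits, spelled out so the sketch does not
 * depend on any system header. MODE_IFWHT is an ASSUMPTION (BSD value). */
#define MODE_IFMT   0170000
#define MODE_IFIFO  0010000
#define MODE_IFCHR  0020000
#define MODE_IFDIR  0040000
#define MODE_IFBLK  0060000
#define MODE_IFREG  0100000
#define MODE_IFLNK  0120000
#define MODE_IFSOCK 0140000
#define MODE_IFWHT  0160000
#define MODE_SHIFT  12

/* Hypothetical on-disk entry types, in the same order as EXT2_FT_*. */
enum file_type {
	FT_UNKNOWN, FT_REG_FILE, FT_DIR, FT_CHRDEV, FT_BLKDEV,
	FT_FIFO, FT_SOCK, FT_SYMLINK, FT_WHT, FT_MAX
};

/* One slot per possible value of the top four mode bits. */
static const unsigned char type_by_mode[(MODE_IFMT >> MODE_SHIFT) + 1] = {
	[MODE_IFREG >> MODE_SHIFT]  = FT_REG_FILE,
	[MODE_IFDIR >> MODE_SHIFT]  = FT_DIR,
	[MODE_IFCHR >> MODE_SHIFT]  = FT_CHRDEV,
	[MODE_IFBLK >> MODE_SHIFT]  = FT_BLKDEV,
	[MODE_IFIFO >> MODE_SHIFT]  = FT_FIFO,
	[MODE_IFSOCK >> MODE_SHIFT] = FT_SOCK,
	[MODE_IFLNK >> MODE_SHIFT]  = FT_SYMLINK,
	[MODE_IFWHT >> MODE_SHIFT]  = FT_WHT,
};

/* Map an inode mode to the type byte stored in a directory entry. */
unsigned char mode_to_file_type(unsigned int mode)
{
	unsigned int idx = (mode & MODE_IFMT) >> MODE_SHIFT;

	assert(idx < sizeof(type_by_mode));
	return type_by_mode[idx];
}
```

Slots that are not initialized stay zero, so an unrecognized mode maps to
FT_UNKNOWN without any extra code.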

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/ext2/dir.c   |2 ++
 fs/ext2/namei.c |   18 ++
 fs/ext2/super.c |5 -
 include/linux/ext2_fs.h |4 
 4 files changed, 28 insertions(+), 1 deletion(-)

--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -230,6 +230,7 @@ static unsigned char ext2_filetype_table
[EXT2_FT_FIFO]  = DT_FIFO,
[EXT2_FT_SOCK]  = DT_SOCK,
[EXT2_FT_SYMLINK]   = DT_LNK,
+   [EXT2_FT_WHT]   = DT_WHT,
 };
 
 #define S_SHIFT 12
@@ -241,6 +242,7 @@ static unsigned char ext2_type_by_mode[S
[S_IFIFO >> S_SHIFT]= EXT2_FT_FIFO,
[S_IFSOCK >> S_SHIFT]   = EXT2_FT_SOCK,
[S_IFLNK >> S_SHIFT]= EXT2_FT_SYMLINK,
+   [S_IFWHT >> S_SHIFT]= EXT2_FT_WHT,
 };
 
 static inline void ext2_set_de_type(ext2_dirent *de, struct inode *inode)
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -288,6 +288,23 @@ static int ext2_rmdir (struct inode * di
return err;
 }
 
+static int ext2_whiteout(struct inode *dir, struct dentry *dentry)
+{
+   struct inode *inode;
+   int err;
+
+   inode = ext2_new_inode (dir, S_IFWHT | S_IRUGO);
+   err = PTR_ERR(inode);
+   if (IS_ERR(inode))
+   goto out;
+
+   init_special_inode(inode, inode->i_mode, 0);
+   mark_inode_dirty(inode);
+   err = ext2_add_nondir(dentry, inode);
+out:
+   return err;
+}
+
 static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry,
struct inode * new_dir, struct dentry * new_dentry )
 {
@@ -382,6 +399,7 @@ const struct inode_operations ext2_dir_i
.mkdir  = ext2_mkdir,
.rmdir  = ext2_rmdir,
.mknod  = ext2_mknod,
+   .whiteout   = ext2_whiteout,
.rename = ext2_rename,
 #ifdef CONFIG_EXT2_FS_XATTR
.setxattr   = generic_setxattr,
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -752,6 +752,9 @@ static int ext2_fill_super(struct super_
ext2_xip_verify_sb(sb); /* see if bdev supports xip, unset
EXT2_MOUNT_XIP if not */
 
+   if (EXT2_HAS_INCOMPAT_FEATURE(sb, EXT2_FEATURE_INCOMPAT_WHITEOUT))
+   sb->s_flags |= MS_WHITEOUT;
+
if (le32_to_cpu(es->s_rev_level) == EXT2_GOOD_OLD_REV &&
(EXT2_HAS_COMPAT_FEATURE(sb, ~0U) ||
 EXT2_HAS_RO_COMPAT_FEATURE(sb, ~0U) ||
@@ -1299,7 +1302,7 @@ static struct file_system_type ext2_fs_t
.name   = "ext2",
.get_sb = ext2_get_sb,
.kill_sb= kill_block_super,
-   .fs_flags   = FS_REQUIRES_DEV,
+   .fs_flags   = FS_REQUIRES_DEV | FS_WHT,
 };
 
 static int __init init_ext2_fs(void)
--- a/include/linux/ext2_fs.h
+++ b/include/linux/ext2_fs.h
@@ -61,6 +61,7 @@
 #define EXT2_ROOT_INO   2  /* Root inode */
 #define EXT2_BOOT_LOADER_INO 5  /* Boot loader inode */
 #define EXT2_UNDEL_DIR_INO  6  /* Undelete directory inode */
+#define EXT2_WHT_INO 7  /* Whiteout inode */
 
 /* First non-reserved inode for old ext2 filesystems */
 #define EXT2_GOOD_OLD_FIRST_INO 11
@@ -479,10 +480,12 @@ struct ext2_super_block {
 #define EXT3_FEATURE_INCOMPAT_RECOVER  0x0004
 #define EXT3_FEATURE_INCOMPAT_JOURNAL_DEV  0x0008
 #define EXT2_FEATURE_INCOMPAT_META_BG  0x0010
+#define EXT2_FEATURE_INCOMPAT_WHITEOUT 0x0020
 #define EXT2_FEATURE_INCOMPAT_ANY  0x
 
 #define EXT2_FEATURE_COMPAT_SUPP   EXT2_FEATURE_COMPAT_EXT_ATTR
 #define EXT2_FEATURE_INCOMPAT_SUPP (EXT2_FEATURE_INCOMPAT_FILETYPE| \
+EXT2_FEATURE_INCOMPAT_WHITEOUT| \
 EXT2_FEATURE_INCOMPAT_META_BG)
 #define EXT2_FEATURE_RO_COMPAT_SUPP (EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER| \
 EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
@@ -549,6 +552,7 @@ enum {
EXT2_FT_FIFO,
EXT2_FT_SOCK,
EXT2_FT_SYMLINK,
+   EXT2_FT_WHT,
EXT2_FT_MAX
 };
 



[RFC 03/26] VFS: Make lookup_hash() return a struct path

2007-07-30 Thread Jan Blunck
This patch changes lookup_hash() to return an error code and hand back the
result in a struct path.
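
The calling-convention change can be sketched in userspace as follows. This
is only an illustration under stated assumptions: ERR_PTR(), PTR_ERR() and
IS_ERR() are re-created here after the kernel helpers of the same name, while
struct entry, struct mount, struct pair, find_entry() and find_pair() are
hypothetical miniatures, not the VFS types or functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical miniature types. */
struct entry { const char *name; };
struct mount { int id; };
struct pair { struct mount *mnt; struct entry *entry; };

static struct entry table[] = { { "bin" }, { "etc" } };

/* Old style: the returned pointer doubles as the error channel. */
struct entry *find_entry(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(table[i].name, name) == 0)
			return &table[i];
	return ERR_PTR(-ENOENT);
}

/* New style: int error code, result in *out, both fields cleared on error. */
int find_pair(struct mount *mnt, const char *name, struct pair *out)
{
	assert(out != NULL);
	out->mnt = mnt;
	out->entry = find_entry(name);
	if (IS_ERR(out->entry)) {
		int err = (int)PTR_ERR(out->entry);

		out->entry = NULL;
		out->mnt = NULL;
		return err;
	}
	return 0;
}
```

With the out-parameter style a caller tests a plain integer and never has to
remember that a pointer field might hold an encoded error.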

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/namei.c |  113 ++---
 1 file changed, 57 insertions(+), 56 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1297,27 +1297,27 @@ out:
  * needs parent already locked. Doesn't follow mounts.
  * SMP-safe.
  */
-static inline struct dentry * __lookup_hash(struct qstr *name, struct dentry *base, struct nameidata *nd)
+static int lookup_hash(struct nameidata *nd, struct qstr *name,
+  struct path *path)
 {
-   struct dentry *dentry;
struct inode *inode;
int err;
 
-   inode = base->d_inode;
+   inode = nd->dentry->d_inode;
 
err = permission(inode, MAY_EXEC, nd);
-   dentry = ERR_PTR(err);
if (err)
goto out;
 
-   dentry = __lookup_hash_kern(name, base, nd);
+   path->mnt =  nd->mnt;
+   path->dentry = __lookup_hash_kern(name, nd->dentry, nd);
+   if (IS_ERR(path->dentry)) {
+   err = PTR_ERR(path->dentry);
+   path->dentry = NULL;
+   path->mnt = NULL;
+   }
 out:
-   return dentry;
-}
-
-static struct dentry *lookup_hash(struct nameidata *nd)
-{
-   return __lookup_hash(&nd->last, nd->dentry, nd);
+   return err;
 }
 
 /* SMP-safe */
@@ -1351,7 +1351,10 @@ struct dentry *lookup_one_len_nd(const c
err = __lookup_one_len(name, &this, base, len);
if (err)
return ERR_PTR(err);
-   return __lookup_hash(&this, base, nd);
+   err = permission(base->d_inode, MAY_EXEC, nd);
+   if (err)
+   return ERR_PTR(err);
+   return __lookup_hash_kern(&this, base, nd);
 }
 
 struct dentry *lookup_one_len_kern(const char *name, struct dentry *base, int len)
@@ -1709,12 +1712,10 @@ int open_namei(int dfd, const char *path
dir = nd->dentry;
nd->flags &= ~LOOKUP_PARENT;
mutex_lock(&dir->d_inode->i_mutex);
-   path.dentry = lookup_hash(nd);
-   path.mnt = nd->mnt;
+   error = lookup_hash(nd, &nd->last, &path);
 
 do_last:
-   error = PTR_ERR(path.dentry);
-   if (IS_ERR(path.dentry)) {
+   if (error) {
mutex_unlock(&dir->d_inode->i_mutex);
goto exit;
}
@@ -1817,8 +1818,7 @@ do_link:
}
dir = nd->dentry;
mutex_lock(&dir->d_inode->i_mutex);
-   path.dentry = lookup_hash(nd);
-   path.mnt = nd->mnt;
+   error = lookup_hash(nd, &nd->last, &path);
__putname(nd->last.name);
goto do_last;
 }
@@ -1835,7 +1835,8 @@ do_link:
  */
 struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 {
-   struct dentry *dentry = ERR_PTR(-EEXIST);
+   struct path path = { .dentry = ERR_PTR(-EEXIST) } ;
+   int err;
 
mutex_lock_nested(&nd->dentry->d_inode->i_mutex, I_MUTEX_PARENT);
/*
@@ -1851,9 +1852,11 @@ struct dentry *lookup_create(struct name
/*
 * Do the final lookup.
 */
-   dentry = lookup_hash(nd);
-   if (IS_ERR(dentry))
+   err = lookup_hash(nd, &nd->last, &path);
+   if (err) {
+   path.dentry = ERR_PTR(err);
goto fail;
+   }
 
/*
 * Special case - lookup gave negative, but... we had foo/bar/
@@ -1861,14 +1864,16 @@ struct dentry *lookup_create(struct name
 * all is fine. Let's be bastards - you had / on the end, you've
 * been asking for (non-existent) directory. -ENOENT for you.
 */
-   if (!is_dir && nd->last.name[nd->last.len] && !dentry->d_inode)
+   if (!is_dir && nd->last.name[nd->last.len] && !path.dentry->d_inode)
goto enoent;
-   return dentry;
+   if (nd->mnt != path.mnt)
+   mntput(path.mnt);
+   return path.dentry;
 enoent:
-   dput(dentry);
-   dentry = ERR_PTR(-ENOENT);
+   dput_path(&path, nd);
+   path.dentry = ERR_PTR(-ENOENT);
 fail:
-   return dentry;
+   return path.dentry;
 }
 EXPORT_SYMBOL_GPL(lookup_create);
 
@@ -2075,7 +2080,7 @@ static long do_rmdir(int dfd, const char
 {
int error = 0;
char * name;
-   struct dentry *dentry;
+   struct path path;
struct nameidata nd;
 
name = getname(pathname);
@@ -2098,12 +2103,11 @@ static long do_rmdir(int dfd, const char
goto exit1;
}
mutex_lock_nested(&nd.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
-   dentry = lookup_hash(&nd);
-   error = PTR_ERR(dentry);
-   if (IS_ERR(dentry))
+   error = lookup_hash(&nd, &nd.last, &path);
+   if (error)
goto exit2;
-   error = vfs_r

[RFC 13/26] ext3 whiteout support

2007-07-30 Thread Jan Blunck
Introduce whiteout support for ext3.

Known Bugs:
- Needs a reserved inode number for white-outs
- S_OPAQUE isn't persistently stored

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/ext3/dir.c   |3 ++-
 fs/ext3/namei.c |   33 +
 fs/ext3/super.c |5 -
 include/linux/ext3_fs.h |5 -
 4 files changed, 43 insertions(+), 3 deletions(-)

--- a/fs/ext3/dir.c
+++ b/fs/ext3/dir.c
@@ -29,7 +29,8 @@
 #include 
 
 static unsigned char ext3_filetype_table[] = {
-   DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
+   DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK,
+   DT_WHT
 };
 
 static int ext3_readdir(struct file *, void *, filldir_t);
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -1081,6 +1081,7 @@ static unsigned char ext3_type_by_mode[S
[S_IFIFO >> S_SHIFT]= EXT3_FT_FIFO,
[S_IFSOCK >> S_SHIFT]   = EXT3_FT_SOCK,
[S_IFLNK >> S_SHIFT]= EXT3_FT_SYMLINK,
+   [S_IFWHT >> S_SHIFT]= EXT3_FT_WHT,
 };
 
 static inline void ext3_set_de_type(struct super_block *sb,
@@ -2070,6 +2071,37 @@ end_rmdir:
return retval;
 }
 
+static int ext3_whiteout(struct inode *dir, struct dentry *dentry)
+{
+   struct inode *inode;
+   int err, retries = 0;
+   handle_t *handle;
+
+retry:
+   handle = ext3_journal_start(dir, EXT3_DATA_TRANS_BLOCKS(dir->i_sb) +
+   EXT3_INDEX_EXTRA_TRANS_BLOCKS + 3 +
+   2*EXT3_QUOTA_INIT_BLOCKS(dir->i_sb));
+   if (IS_ERR(handle))
+   return PTR_ERR(handle);
+
+   if (IS_DIRSYNC(dir))
+   handle->h_sync = 1;
+
+   inode = ext3_new_inode (handle, dir, S_IFWHT | S_IRUGO);
+   err = PTR_ERR(inode);
+   if (IS_ERR(inode))
+   goto out_stop;
+
+   init_special_inode(inode, inode->i_mode, 0);
+   err = ext3_add_nondir(handle, dentry, inode);
+
+out_stop:
+   ext3_journal_stop(handle);
+   if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries))
+   goto retry;
+   return err;
+}
+
 static int ext3_unlink(struct inode * dir, struct dentry *dentry)
 {
int retval;
@@ -2387,6 +2419,7 @@ const struct inode_operations ext3_dir_i
.mkdir  = ext3_mkdir,
.rmdir  = ext3_rmdir,
.mknod  = ext3_mknod,
+   .whiteout   = ext3_whiteout,
.rename = ext3_rename,
.setattr= ext3_setattr,
 #ifdef CONFIG_EXT3_FS_XATTR
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -1500,6 +1500,9 @@ static int ext3_fill_super (struct super
sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
((sbi->s_mount_opt & EXT3_MOUNT_POSIX_ACL) ? MS_POSIXACL : 0);
 
+   if (EXT3_HAS_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_WHITEOUT))
+   sb->s_flags |= MS_WHITEOUT;
+
if (le32_to_cpu(es->s_rev_level) == EXT3_GOOD_OLD_REV &&
(EXT3_HAS_COMPAT_FEATURE(sb, ~0U) ||
 EXT3_HAS_RO_COMPAT_FEATURE(sb, ~0U) ||
@@ -2764,7 +2767,7 @@ static struct file_system_type ext3_fs_t
.name   = "ext3",
.get_sb = ext3_get_sb,
.kill_sb= kill_block_super,
-   .fs_flags   = FS_REQUIRES_DEV,
+   .fs_flags   = FS_REQUIRES_DEV | FS_WHT,
 };
 
 static int __init init_ext3_fs(void)
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -63,6 +63,7 @@
 #define EXT3_UNDEL_DIR_INO  6  /* Undelete directory inode */
 #define EXT3_RESIZE_INO 7  /* Reserved group descriptors inode */
 #define EXT3_JOURNAL_INO 8  /* Journal inode */
+#define EXT3_WHT_INO 9  /* Whiteout inode */
 
 /* First non-reserved inode for old ext3 filesystems */
 #define EXT3_GOOD_OLD_FIRST_INO 11
@@ -582,6 +583,7 @@ static inline int ext3_valid_inum(struct
 #define EXT3_FEATURE_INCOMPAT_RECOVER  0x0004 /* Needs recovery */
 #define EXT3_FEATURE_INCOMPAT_JOURNAL_DEV  0x0008 /* Journal device */
 #define EXT3_FEATURE_INCOMPAT_META_BG  0x0010
+#define EXT3_FEATURE_INCOMPAT_WHITEOUT 0x0020
 
 #define EXT3_FEATURE_COMPAT_SUPP   EXT2_FEATURE_COMPAT_EXT_ATTR
 #define EXT3_FEATURE_INCOMPAT_SUPP (EXT3_FEATURE_INCOMPAT_FILETYPE| \
@@ -648,8 +650,9 @@ struct ext3_dir_entry_2 {
 #define EXT3_FT_FIFO   5
 #define EXT3_FT_SOCK   6
 #define EXT3_FT_SYMLINK 7
+#define EXT3_FT_WHT 8
 
-#define EXT3_FT_MAX 8
+#define EXT3_FT_MAX 9
 
 /*
  * EXT3_DIR_PAD defines the directory entries boundaries



[RFC 23/26] union-mount: copyup on rename

2007-07-30 Thread Jan Blunck
Add copyup renaming of regular files on union mounts. Directories are still
lazily copied with the help of user space.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/namei.c |  133 -
 fs/union.c |8 ++-
 2 files changed, 129 insertions(+), 12 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1491,6 +1491,8 @@ static int fastcall do_path_lookup(int d
nd->mnt = mntget(fs->pwdmnt);
nd->dentry = dget(fs->pwd);
read_unlock(&fs->lock);
+   /* Force a union_relookup() */
+   nd->um_flags = LAST_LOWLEVEL;
} else {
struct dentry *dentry;
 
@@ -3478,6 +3480,97 @@ int vfs_rename(struct inode *old_dir, st
return error;
 }
 
+int vfs_rename_union(struct nameidata *oldnd, struct path *old,
+struct nameidata *newnd, struct path *new)
+{
+   struct inode *old_dir = oldnd->dentry->d_inode;
+   struct inode *new_dir = newnd->dentry->d_inode;
+   struct qstr old_name;
+   char *name;
+   struct dentry *dentry;
+   int error;
+
+   if (old->dentry->d_inode == new->dentry->d_inode)
+   return 0;
+
+   error = may_whiteout(old->dentry, 0);
+   if (error)
+   return error;
+   if (!old_dir->i_op || !old_dir->i_op->whiteout)
+   return -EPERM;
+
+   if (!new->dentry->d_inode)
+   error = may_create(new_dir, new->dentry, NULL);
+   else
+   error = may_delete(new_dir, new->dentry, 0);
+   if (error)
+   return error;
+
+   DQUOT_INIT(old_dir);
+   DQUOT_INIT(new_dir);
+
+   error = security_inode_rename(old_dir, old->dentry,
+ new_dir, new->dentry);
+   if (error)
+   return error;
+
+   error = -EBUSY;
+   if (d_mountpoint(old->dentry) || d_mountpoint(new->dentry))
+   return error;
+
+   error = -ENOMEM;
+   name = kmalloc(old->dentry->d_name.len + 1, GFP_KERNEL);
+   if (!name)
+   return error;
+   strncpy(name, old->dentry->d_name.name, old->dentry->d_name.len);
+   name[old->dentry->d_name.len] = 0;
+   old_name.len = old->dentry->d_name.len;
+   old_name.hash = old->dentry->d_name.hash;
+   old_name.name = name;
+
+   /* possibly delete the existing new file */
+   if ((newnd->dentry == new->dentry->d_parent) && new->dentry->d_inode) {
+   /* FIXME: inode may be truncated while we hold a lock */
+   error = vfs_unlink(new_dir, new->dentry);
+   if (error)
+   goto freename;
+
+   dentry = __lookup_hash_kern(&new->dentry->d_name,
+   newnd->dentry, newnd);
+   if (IS_ERR(dentry))
+   goto freename;
+
+   dput(new->dentry);
+   new->dentry = dentry;
+   }
+
+   /* copyup to the new file */
+   error = __union_copyup(old, newnd, new);
+   if (error)
+   goto freename;
+
+   /* whiteout the old file */
+   dentry = __lookup_hash_kern(&old_name, oldnd->dentry, oldnd);
+   error = PTR_ERR(dentry);
+   if (IS_ERR(dentry))
+   goto freename;
+   error = vfs_whiteout(old_dir, dentry);
+   dput(dentry);
+
+   /* FIXME: This is actually unlink() && create() ... */
+/*
+   if (!error) {
+   const char *new_name = old_dentry->d_name.name;
+   fsnotify_move(old_dir, new_dir, old_name.name, new_name, 0,
+ new_dentry->d_inode, old_dentry->d_inode);
+   }
+*/
+freename:
+   kfree(old_name.name);
+   return error;
+}
+
+
 static int do_rename(int olddfd, const char *oldname,
int newdfd, const char *newname)
 {
@@ -3495,10 +3588,7 @@ static int do_rename(int olddfd, const c
if (error)
goto exit1;
 
-   error = -EXDEV;
-   if (oldnd.mnt != newnd.mnt)
-   goto exit2;
-
+lock:
old_dir = oldnd.dentry;
error = -EBUSY;
if (oldnd.last_type != LAST_NORM)
@@ -3536,15 +3626,40 @@ static int do_rename(int olddfd, const c
error = -ENOTEMPTY;
if (new.dentry == trap)
goto exit5;
-   /* renaming on unions is done by the user-space */
+   /* renaming of directories on unions is done by the user-space */
error = -EXDEV;
-   if (is_unionized(oldnd.dentry, oldnd.mnt))
+   if (is_unionized(oldnd.dentry, oldnd.mnt) &&
+   S_ISDIR(old.dentry->d_inode->i_mode))
goto exit5;
-   if (is_unionized(newnd.dentry, newnd.mnt))
+   /* re

[RFC 04/26] VFS: Make lookup_create() return a struct path

2007-07-30 Thread Jan Blunck
This patch changes lookup_create() to return an error code and hand back the
result in a struct path.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 arch/powerpc/platforms/cell/spufs/inode.c |   15 ++
 fs/namei.c|   75 +-
 include/linux/dcache.h|1 
 include/linux/namei.h |1 
 net/unix/af_unix.c|   17 +++---
 5 files changed, 50 insertions(+), 59 deletions(-)

--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -456,7 +456,7 @@ static struct file_system_type spufs_typ
 
 long spufs_create(struct nameidata *nd, unsigned int flags, mode_t mode)
 {
-   struct dentry *dentry;
+   struct path path;
int ret;
 
ret = -EINVAL;
@@ -475,26 +475,25 @@ long spufs_create(struct nameidata *nd, 
goto out;
}
 
-   dentry = lookup_create(nd, 1);
-   ret = PTR_ERR(dentry);
-   if (IS_ERR(dentry))
+   ret = lookup_create(nd, 1, &path);
+   if (ret)
goto out_dir;
 
ret = -EEXIST;
-   if (dentry->d_inode)
+   if (path.dentry->d_inode)
goto out_dput;
 
mode &= ~current->fs->umask;
 
if (flags & SPU_CREATE_GANG)
return spufs_create_gang(nd->dentry->d_inode,
-   dentry, nd->mnt, mode);
+path.dentry, path.mnt, mode);
else
return spufs_create_context(nd->dentry->d_inode,
-   dentry, nd->mnt, flags, mode);
+   path.dentry, path.mnt, flags, mode);
 
 out_dput:
-   dput(dentry);
+   dput_path(&path, nd);
 out_dir:
mutex_unlock(&nd->dentry->d_inode->i_mutex);
 out:
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1833,10 +1833,9 @@ do_link:
  *
  * Returns with nd->dentry->d_inode->i_mutex locked.
  */
-struct dentry *lookup_create(struct nameidata *nd, int is_dir)
+int lookup_create(struct nameidata *nd, int is_dir, struct path *path)
 {
-   struct path path = { .dentry = ERR_PTR(-EEXIST) } ;
-   int err;
+   int err = -EEXIST;
 
mutex_lock_nested(&nd->dentry->d_inode->i_mutex, I_MUTEX_PARENT);
/*
@@ -1852,11 +1851,9 @@ struct dentry *lookup_create(struct name
/*
 * Do the final lookup.
 */
-   err = lookup_hash(nd, &nd->last, &path);
-   if (err) {
-   path.dentry = ERR_PTR(err);
+   err = lookup_hash(nd, &nd->last, path);
+   if (err)
goto fail;
-   }
 
/*
 * Special case - lookup gave negative, but... we had foo/bar/
@@ -1864,16 +1861,14 @@ struct dentry *lookup_create(struct name
 * all is fine. Let's be bastards - you had / on the end, you've
 * been asking for (non-existent) directory. -ENOENT for you.
 */
-   if (!is_dir && nd->last.name[nd->last.len] && !path.dentry->d_inode)
+   if (!is_dir && nd->last.name[nd->last.len] && !path->dentry->d_inode)
goto enoent;
-   if (nd->mnt != path.mnt)
-   mntput(path.mnt);
-   return path.dentry;
+   return 0;
 enoent:
-   dput_path(&path, nd);
-   path.dentry = ERR_PTR(-ENOENT);
+   dput_path(path, nd);
+   err = -ENOENT;
 fail:
-   return path.dentry;
+   return err;
 }
 EXPORT_SYMBOL_GPL(lookup_create);
 
@@ -1906,7 +1901,7 @@ asmlinkage long sys_mknodat(int dfd, con
 {
int error = 0;
char * tmp;
-   struct dentry * dentry;
+   struct path path;
struct nameidata nd;
 
if (S_ISDIR(mode))
@@ -1918,22 +1913,23 @@ asmlinkage long sys_mknodat(int dfd, con
error = do_path_lookup(dfd, tmp, LOOKUP_PARENT, &nd);
if (error)
goto out;
-   dentry = lookup_create(&nd, 0);
-   error = PTR_ERR(dentry);
+   error = lookup_create(&nd, 0, &path);
 
if (!IS_POSIXACL(nd.dentry->d_inode))
mode &= ~current->fs->umask;
-   if (!IS_ERR(dentry)) {
+   if (!error) {
switch (mode & S_IFMT) {
case 0: case S_IFREG:
-   error = vfs_create(nd.dentry->d_inode,dentry,mode,&nd);
+   error = vfs_create(nd.dentry->d_inode, path.dentry,
+  mode, &nd);
break;
case S_IFCHR: case S_IFBLK:
-   error = vfs_mknod(nd.dentry->d_inode,dentry,mode,
-   new_decode_dev(dev));
+   error = vfs_mknod(nd.dentry->d_inode, path.dentry,
+ mode, new_dec

[RFC 11/26] tmpfs white-out support

2007-07-30 Thread Jan Blunck
Introduce white-out support to tmpfs.
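
As a rough userspace illustration of the approach taken below (one shared
whiteout inode per superblock, with each whiteout entry charged against the
inode limit), consider this sketch. The mini_* names are hypothetical, and
locking, timestamps and directory size accounting are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature stand-ins for the shmem structures. */
struct mini_inode { unsigned int nlink; };
struct mini_sb {
	unsigned long max_inodes;   /* 0 means "no limit" */
	unsigned long free_inodes;
	struct mini_inode whiteout; /* the one shared whiteout inode */
};
struct mini_dentry { struct mini_inode *inode; };

/* Bind a negative directory entry to the shared whiteout inode. */
int mini_whiteout(struct mini_sb *sb, struct mini_dentry *d)
{
	assert(d->inode == NULL);
	if (sb->max_inodes) {
		if (!sb->free_inodes)
			return -ENOSPC;
		sb->free_inodes--;  /* each entry is charged like a link */
	}
	sb->whiteout.nlink++;
	d->inode = &sb->whiteout;
	return 0;
}
```

Because every whiteout entry points at the same inode, creating one behaves
much like creating another hard link, which is why the real function can
mirror shmem_link().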

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 include/linux/shmem_fs.h |1 
 mm/shmem.c   |   54 +++
 2 files changed, 55 insertions(+)

--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -33,6 +33,7 @@ struct shmem_sb_info {
int policy; /* Default NUMA memory alloc policy */
nodemask_t policy_nodes; /* nodemask for preferred and bind */
spinlock_t stat_lock;
+   struct inode *whiteout_inode;
 };
 
 static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1784,6 +1784,42 @@ static int shmem_create(struct inode *di
 }
 
 /*
+ * This is the whiteout support for tmpfs. It uses one singleton whiteout
+ * inode per superblock thus it is very similar to shmem_link().
+ */
+static int shmem_whiteout(struct inode *dir, struct dentry *dentry)
+{
+   struct shmem_sb_info *sbinfo = SHMEM_SB(dir->i_sb);
+   struct inode *inode = sbinfo->whiteout_inode;
+
+   if (!(dir->i_sb->s_flags & MS_WHITEOUT))
+   return -EPERM;
+
+   /*
+* No ordinary (disk based) filesystem counts whiteouts as inodes;
+* but each new link needs a new dentry, pinning lowmem, and
+* tmpfs dentries cannot be pruned until they are unlinked.
+*/
+   if (sbinfo->max_inodes) {
+   spin_lock(&sbinfo->stat_lock);
+   if (!sbinfo->free_inodes) {
+   spin_unlock(&sbinfo->stat_lock);
+   return -ENOSPC;
+   }
+   sbinfo->free_inodes--;
+   spin_unlock(&sbinfo->stat_lock);
+   }
+
+   dir->i_size += BOGO_DIRENT_SIZE;
+   inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+   inc_nlink(inode);
+   atomic_inc(&inode->i_count); /* New dentry reference */
+   dget(dentry);   /* Extra pinning count for the created dentry */
+   d_instantiate(dentry, inode);
+   return 0;
+}
+
+/*
  * Link a file..
  */
 static int shmem_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
@@ -2231,6 +2267,9 @@ out:
 
 static void shmem_put_super(struct super_block *sb)
 {
+   struct shmem_sb_info *sbinfo = sb->s_fs_info;
+
+   iput(sbinfo->whiteout_inode);
kfree(sb->s_fs_info);
sb->s_fs_info = NULL;
 }
@@ -2305,6 +2344,19 @@ static int shmem_fill_super(struct super
if (!root)
goto failed_iput;
sb->s_root = root;
+
+#ifdef CONFIG_TMPFS
+   if (!(sb->s_flags & MS_NOUSER)) {
+   inode = shmem_get_inode(sb, S_IRUGO | S_IWUGO | S_IFWHT, 0);
+   if (!inode) {
+   dput(root);
+   goto failed;
+   }
+   sbinfo->whiteout_inode = inode;
+   sb->s_flags |= MS_WHITEOUT;
+   }
+#endif
+
return 0;
 
 failed_iput:
@@ -2410,6 +2462,7 @@ static const struct inode_operations shm
.rmdir  = shmem_rmdir,
.mknod  = shmem_mknod,
.rename = shmem_rename,
+   .whiteout   = shmem_whiteout,
 #endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
.setattr= shmem_notify_change,
@@ -2464,6 +2517,7 @@ static struct file_system_type tmpfs_fs_
.name   = "tmpfs",
.get_sb = shmem_get_sb,
.kill_sb= kill_litter_super,
+   .fs_flags   = FS_WHT,
 };
 static struct vfsmount *shm_mnt;
 



[RFC 06/26] VFS: Make real_lookup() return a struct path

2007-07-30 Thread Jan Blunck
This patch changes real_lookup() to return an error code and hand back the
result in a struct path.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/namei.c |   77 ++---
 1 file changed, 48 insertions(+), 29 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -462,10 +462,11 @@ ok:
  * make sure that nobody added the entry to the dcache in the meantime..
  * SMP-safe
  */
-static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, struct nameidata *nd)
+static int real_lookup(struct nameidata *nd, struct qstr *name,
+  struct path *path)
 {
-   struct dentry * result;
-   struct inode *dir = parent->d_inode;
+   struct inode *dir = nd->dentry->d_inode;
+   int res = 0;
 
mutex_lock(&dir->i_mutex);
/*
@@ -482,19 +483,27 @@ static struct dentry * real_lookup(struc
 *
 * so doing d_lookup() (with seqlock), instead of lockfree __d_lookup
 */
-   result = d_lookup(parent, name);
-   if (!result) {
-   struct dentry * dentry = d_alloc(parent, name);
-   result = ERR_PTR(-ENOMEM);
+   path->dentry = d_lookup(nd->dentry, name);
+   path->mnt = nd->mnt;
+   if (!path->dentry) {
+   struct dentry *dentry = d_alloc(nd->dentry, name);
if (dentry) {
-   result = dir->i_op->lookup(dir, dentry, nd);
-   if (result)
+   path->dentry = dir->i_op->lookup(dir, dentry, nd);
+   if (path->dentry) {
dput(dentry);
-   else
-   result = dentry;
+   if (IS_ERR(path->dentry)) {
+   res = PTR_ERR(path->dentry);
+   path->dentry = NULL;
+   path->mnt = NULL;
+   }
+   } else
+   path->dentry = dentry;
+   } else {
+   res = -ENOMEM;
+   path->mnt = NULL;
}
mutex_unlock(&dir->i_mutex);
-   return result;
+   return res;
}
 
/*
@@ -502,12 +511,20 @@ static struct dentry * real_lookup(struc
 * we waited on the semaphore. Need to revalidate.
 */
mutex_unlock(&dir->i_mutex);
-   if (result->d_op && result->d_op->d_revalidate) {
-   result = do_revalidate(result, nd);
-   if (!result)
-   result = ERR_PTR(-ENOENT);
+   if (path->dentry->d_op && path->dentry->d_op->d_revalidate) {
+   path->dentry = do_revalidate(path->dentry, nd);
+   if (!path->dentry) {
+   res = -ENOENT;
+   path->mnt = NULL;
+   }
+   if (IS_ERR(path->dentry)) {
+   res = PTR_ERR(path->dentry);
+   path->dentry = NULL;
+   path->mnt = NULL;
+   }
}
-   return result;
+
+   return res;
 }
 
 static int __emul_lookup_dentry(const char *, struct nameidata *);
@@ -748,35 +765,37 @@ static __always_inline void follow_dotdo
 static int do_lookup(struct nameidata *nd, struct qstr *name,
 struct path *path)
 {
-   struct vfsmount *mnt = nd->mnt;
-   struct dentry *dentry = __d_lookup(nd->dentry, name);
+   int err;
 
-   if (!dentry)
+   path->dentry = __d_lookup(nd->dentry, name);
+   path->mnt = nd->mnt;
+   if (!path->dentry)
goto need_lookup;
-   if (dentry->d_op && dentry->d_op->d_revalidate)
+   if (path->dentry->d_op && path->dentry->d_op->d_revalidate)
goto need_revalidate;
+
 done:
-   path->mnt = mnt;
-   path->dentry = dentry;
__follow_mount(path);
return 0;
 
 need_lookup:
-   dentry = real_lookup(nd->dentry, name, nd);
-   if (IS_ERR(dentry))
+   err = real_lookup(nd, name, path);
+   if (err)
goto fail;
goto done;
 
 need_revalidate:
-   dentry = do_revalidate(dentry, nd);
-   if (!dentry)
+   path->dentry = do_revalidate(path->dentry, nd);
+   if (!path->dentry)
goto need_lookup;
-   if (IS_ERR(dentry))
+   if (IS_ERR(path->dentry)) {
+   err = PTR_ERR(path->dentry);
goto fail;
+   }
goto done;
 
 fail:
-   return PTR_ERR(dentry);
+   return err;
 }
 
 /*



[RFC 24/26] union-mount: don't report EROFS for union mounts

2007-07-30 Thread Jan Blunck
SUSv2 requires that we report a read-only filesystem here too. For union
mounts this is a very expensive check, so I'm lazy and simply skip it when the
lookup ended on a lower layer of a union.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/open.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/open.c
+++ b/fs/open.c
@@ -483,7 +483,7 @@ asmlinkage long sys_faccessat(int dfd, c
   special_file(nd.dentry->d_inode->i_mode))
goto out_path_release;
 
-   if(IS_RDONLY(nd.dentry->d_inode))
+   if (!(nd.um_flags & LAST_LOWLEVEL) && IS_RDONLY(nd.dentry->d_inode))
res = -EROFS;
 
 out_path_release:



[RFC 25/26] union-mount: Debug Infrastructure

2007-07-30 Thread Jan Blunck
This adds debugfs/relay based debugging infrastructure that is helpful when
developing the union-mount code itself. The debugging output can be enabled
at runtime by:

 echo 1 > /proc/sys/fs/union-debug

This registers the relayfs files that the debug code writes its output to.
There are different levels of debugging output available, which can be ORed
together. For the valid sysctl values see include/linux/union_debug.h.
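
The level scheme can be tried out in userspace with a sketch like the one
below. The bit values mirror the UNION_MOUNT_DEBUG* constants from the header
in this patch; everything else (debug_mask, log_line(), the DBG() macro and
the two example functions) is hypothetical and writes to stderr instead of a
relay channel.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Same bit layout as the UNION_MOUNT_DEBUG* values in the patch. */
#define DBG_GENERAL 1
#define DBG_DCACHE  2
#define DBG_LOCK    4
#define DBG_READDIR 8
#define DBG_LOOKUP  16

static int debug_mask;   /* stand-in for the sysctl value */
static int lines_logged; /* counter used by the usage example */

static void log_line(const char *func, const char *fmt, ...)
{
	va_list ap;

	fprintf(stderr, "%s: ", func);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	lines_logged++;
}

/* Emit a message only if the given level bit is set in the mask. */
#define DBG(level, ...)					\
do {							\
	if (debug_mask & (level))			\
		log_line(__func__, __VA_ARGS__);	\
} while (0)

void set_debug_mask(int mask)
{
	assert(mask >= 0);
	debug_mask = mask;
}

void lookup_step(const char *name)
{
	DBG(DBG_LOOKUP, "looking up %s\n", name);
}

void lock_step(void)
{
	DBG(DBG_LOCK, "taking lock\n");
}
```

Under this layout a mask of 17 (1 | 16) would select the general and lookup
messages while leaving dcache, lock and readdir output off.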

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 include/linux/union_debug.h |   91 ++
 lib/Kconfig.debug   |9 +
 lib/Makefile|2 
 lib/union_debug.c   |  268 
 4 files changed, 370 insertions(+)

--- /dev/null
+++ b/include/linux/union_debug.h
@@ -0,0 +1,91 @@
+/*
+ * VFS based union mount for Linux
+ *
+ * Copyright (C) 2004-2007 IBM Corporation, IBM Deutschland Entwicklung GmbH.
+ * Copyright (C) 2007 Novell Inc.
+ *   Author(s): Jan Blunck ([EMAIL PROTECTED])
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+#ifndef __LINUX_UNION_DEBUG_H
+#define __LINUX_UNION_DEBUG_H
+
+#ifdef __KERNEL__
+
+#ifdef CONFIG_DEBUG_UNION_MOUNT
+
+#include 
+
+/* This is taken from klog debugging facility */
+extern void klog(const void *data, int len);
+extern void klog_printk(const char *fmt, ...);
+extern void klog_printk_dentry(const char *func, struct dentry *dentry);
+
+extern int sysctl_union_debug;
+
+#define UNION_MOUNT_DEBUG  1
+#define UNION_MOUNT_DEBUG_DCACHE   2
+#define UNION_MOUNT_DEBUG_LOCK 4
+#define UNION_MOUNT_DEBUG_READDIR  8
+#define UNION_MOUNT_DEBUG_LOOKUP   16
+
+#define UM_DEBUG(fmt, args...) \
+do {   \
+   if (sysctl_union_debug & UNION_MOUNT_DEBUG) \
+   klog_printk("%s: " fmt, __FUNCTION__, ## args); \
+} while (0)
+#define UM_DEBUG_DENTRY(dentry)    \
+do {   \
+   if (sysctl_union_debug & UNION_MOUNT_DEBUG) \
+   klog_printk_dentry(__FUNCTION__, (dentry)); \
+} while (0)
+#define UM_DEBUG_DCACHE(fmt, args...)  \
+do {   \
+   if (sysctl_union_debug & UNION_MOUNT_DEBUG_DCACHE)  \
+   klog_printk("%s: " fmt, __FUNCTION__, ## args); \
+} while (0)
+#define UM_DEBUG_DCACHE_DENTRY(dentry) \
+do {   \
+   if (sysctl_union_debug & UNION_MOUNT_DEBUG_DCACHE)  \
+   klog_printk_dentry(__FUNCTION__, (dentry)); \
+} while (0)
+#define UM_DEBUG_LOCK(fmt, args...)\
+do {   \
+   if (sysctl_union_debug & UNION_MOUNT_DEBUG_LOCK)\
+   klog_printk("%s: " fmt, __FUNCTION__, ## args); \
+} while (0)
+#define UM_DEBUG_READDIR(fmt, args...) \
+do {   \
+   if (sysctl_union_debug & UNION_MOUNT_DEBUG_READDIR) \
+   klog_printk("%s: " fmt, __FUNCTION__, ## args); \
+} while (0)
+#define UM_DEBUG_LOOKUP(fmt, args...)  \
+do {   \
+   if (sysctl_union_debug & UNION_MOUNT_DEBUG_LOOKUP)  \
+   klog_printk("%s: " fmt, __FUNCTION__, ## args); \
+} while (0)
+#define UM_DEBUG_LOOKUP_DENTRY(dentry) \
+do {   \
+   if (sysctl_union_debug & UNION_MOUNT_DEBUG_LOOKUP)  \
+   klog_printk_dentry(__FUNCTION__, (dentry)); \
+} while (0)
+
+#else  /* CONFIG_DEBUG_UNION_MOUNT */
+
+#define UM_DEBUG(fmt, args...) do { /* empty */ } while (0)
+#define UM_DEBUG_DENTRY(fmt, args...)  do { /* empty */ } while (0)
+#define UM_DEBUG_DCACHE(fmt, args...)  do { /* empty */ } while (0)
+#define UM_DEBUG_DCACHE_DENTRY(fmt, args...)   do { /* empty */ } while (0)
+#define UM_DEBUG_LOCK(fmt, args...)do { /* empty */ } while (0)
+#define UM_DEBUG_READDIR(fmt, args...) do { /* empty */ } while (0)
+#define UM_DEBUG_LOOKUP_DENTRY(fmt, args...)  

[RFC 14/26] union-mount: Documentation

2007-07-30 Thread Jan Blunck
Add simple documentation about union mounting in general and this
implementation in particular.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 Documentation/filesystems/union-mounts.txt |  172 +
 1 file changed, 172 insertions(+)

--- /dev/null
+++ b/Documentation/filesystems/union-mounts.txt
@@ -0,0 +1,172 @@
+VFS based Union Mounts
+----------------------
+
+ 1. What are "Union Mounts"
+ 2. The Union Stack
+ 3. The White-out Filetype
+ 4. Renaming Unions
+ 5. Directory Reading
+ 6. Known Problems
+ 7. References
+
+---
+
+1. What are "Union Mounts"
+==========================
+
+Please note: this is NOT about UnionFS and it is NOT derived work!
+
+Traditionally the mount operation is opaque, which means that the content of
+the mount point, the directory where the file system is mounted on, is hidden
+by the content of the mounted file system's root directory until the file
+system is unmounted again. Unlike the traditional UNIX mount mechanism, that
+hides the contents of the mount point, a union mount presents a view as if
+both filesystems are merged together. Although only the topmost layer of the
+mount stack can be altered, it appears as if transparent file system mounts
+allow any file to be created, modified or deleted.
+
+Most people know the concepts and features of union mounts from other
+operating systems like Sun's Translucent Filesystem, Plan9 or BSD.
+
+Here are the key features of this implementation:
+- completely VFS based
+- does not change the namespace stacking
+- directory listings have duplicate entries removed
+- writable unions: only the topmost file system layer may be writable
+- writable unions: new white-out filetype handled inside the kernel
+
+---
+
+2. The Union Stack
+==================
+
+The mounted file systems are organized in the "file system hierarchy" (tree of
+vfsmount structures), which keeps track about the stacking of file systems
+upon each other. The per-directory view on the file system hierarchy is called
+"mount stack" and reflects the order of file systems, which are mounted on a
+specific directory.
+
+Union mounts present a single unified view of the contents of two or more file
+systems as if they are merged together. Since the information which file
+system objects are part of a unified view is not directly available from the
+file system hierarchy there is a need for a new structure. The file system
+objects, which are part of a unified view are ordered in a so-called "union
+stack". Only directories can be part of a unified view.
+
+The link between two layers of the union stack is maintained using the
+union_mount structure (#include ):
+
+struct union_mount {
+   atomic_t u_count;   /* reference count */
+   struct mutex u_mutex;
+   struct list_head u_unions;  /* list head for d_unions */
+   struct hlist_node u_hash;   /* list head for searching */
+   struct hlist_node u_rhash;  /* list head for reverse searching */
+
+   struct path u_this; /* this is me */
+   struct path u_next; /* this is what I overlay */
+};
+
+The union_mount structure holds a reference (dget,mntget) to the next lower
+layer of the union stack. Since a dentry can be part of multiple unions
+(e.g. with bind mounts) they are tied together via the d_unions field of the
+dentry structure.
+
+All union_mount structures are cached in two hash tables, one for lookups of
+the next lower layer of the union stack and one for reverse lookups of the
+next upper layer of the union stack. The reverse lookup is necessary to
+resolve CWD relative path lookups. For calculation of the hash value, the
+(dentry,vfsmount) pair is used. The u_this field is used for the hash table
+which is used in forward lookups and the u_next field for the reverse lookups.
+
+During every new mount (or mount propagation), a new union_mount structure is
+allocated. A reference to the mountpoint's vfsmount and dentry is taken and
+stored in the u_next field.  In almost the same manner an union_mount
+structure is created during the first time lookup of a directory within a
+union mount point. In this case the lookup proceeds to all lower layers of the
+union. Therefore the complete union stack is constructed during lookups.
+
+The union_mount structures of a dentry are destroyed when the dentry itself is
+destroyed. Therefore the dentry cache is indirectly driving the union_mount
+cache like this is done for inodes too. Please note that lower layer
+union_mount structures are kept in memory until the topmost dentry is
+destroyed.
+
+---
+
+3. Writable Unions: The White-out Filetype and Copy-On-Open
+=

[RFC 21/26] union-mount: in-kernel file copy between union mounted filesystems

2007-07-30 Thread Jan Blunck
This patch introduces in-kernel file copy between union mounted
filesystems. When a file is opened for writing but resides on a lower (thus
read-only) layer of the union stack it is copied to the topmost union layer
first.

This patch uses do_splice() for the in-kernel file copy.

Signed-off-by: Bharata B Rao <[EMAIL PROTECTED]>
Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/namei.c|   73 ++-
 fs/union.c|  312 ++
 include/linux/union.h |9 +
 3 files changed, 389 insertions(+), 5 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -994,7 +994,7 @@ static int __follow_mount(struct path *p
return res;
 }
 
-static void follow_mount(struct vfsmount **mnt, struct dentry **dentry)
+void follow_mount(struct vfsmount **mnt, struct dentry **dentry)
 {
while (d_mountpoint(*dentry)) {
struct vfsmount *mounted = lookup_mnt(*mnt, *dentry);
@@ -1213,6 +1213,21 @@ static fastcall int __link_path_walk(con
if (err)
break;
 
+   if ((nd->flags & LOOKUP_TOPMOST) &&
+   (nd->um_flags & LAST_LOWLEVEL)) {
+   struct dentry *dentry;
+
+   dentry = union_create_topmost(nd, &this, &next);
+   if (IS_ERR(dentry)) {
+   err = PTR_ERR(dentry);
+   goto out_dput;
+   }
+   dput_path(&next, nd);
+   next.mnt = nd->mnt;
+   next.dentry = dentry;
+   nd->um_flags &= ~LAST_LOWLEVEL;
+   }
+
err = -ENOENT;
inode = next.dentry->d_inode;
if (!inode || S_ISWHT(inode->i_mode))
@@ -1267,6 +1282,22 @@ last_component:
err = do_lookup(nd, &this, &next);
if (err)
break;
+
+   if ((nd->flags & LOOKUP_TOPMOST) &&
+   (nd->um_flags & LAST_LOWLEVEL)) {
+   struct dentry *dentry;
+
+   dentry = union_create_topmost(nd, &this, &next);
+   if (IS_ERR(dentry)) {
+   err = PTR_ERR(dentry);
+   goto out_dput;
+   }
+   dput_path(&next, nd);
+   next.mnt = nd->mnt;
+   next.dentry = dentry;
+   nd->um_flags &= ~LAST_LOWLEVEL;
+   }
+
inode = next.dentry->d_inode;
if ((lookup_flags & LOOKUP_FOLLOW)
&& inode && inode->i_op && inode->i_op->follow_link) {
@@ -1755,7 +1786,7 @@ out:
return err;
 }
 
-static int hash_lookup_union(struct nameidata *nd, struct qstr *name,
+int hash_lookup_union(struct nameidata *nd, struct qstr *name,
 struct path *path)
 {
struct path safe = { .dentry = nd->dentry, .mnt = nd->mnt };
@@ -2169,6 +2200,11 @@ int open_namei(int dfd, const char *path
 nd, flag);
if (error)
return error;
+   if (flag & FMODE_WRITE) {
+   error = union_copyup(nd, flag);
+   if (error)
+   return error;
+   }
goto ok;
}
 
@@ -2188,6 +2224,16 @@ int open_namei(int dfd, const char *path
if (nd->last_type != LAST_NORM || nd->last.name[nd->last.len])
goto exit;
 
+   /*
+* If this dentry is on an union mount we need the topmost dentry here.
+* This creates all topmost directories on the path to this dentry too.
+*/
+   if (is_unionized(nd->dentry, nd->mnt)) {
+   error = union_relookup_topmost(nd, nd->flags & ~LOOKUP_PARENT);
+   if (error)
+   goto exit;
+   }
+
dir = nd->dentry;
nd->flags &= ~LOOKUP_PARENT;
mutex_lock(&dir->d_inode->i_mutex);
@@ -2235,10 +2281,21 @@ do_last:
if (path.dentry->d_inode->i_op && 
path.dentry->d_inode->i_op->follow_link)
goto do_link;
 
-   path_to_nameidata(&path, nd);
error = -EISDIR;
if (path.dentry->d_inode && S_ISDIR(path.dentry->d_inode->i_mode))
-   goto exit;
+   goto exit_dput;
+
+   /*
+* If this file is on a lower layer of the union stack, copy it to the
+* topmost layer before opening it
+*/
+   if (path.dentry->d_inode && (path.dentry->d_parent != dir)) {
+ 

[RFC 19/26] union-mount: Make lookup work for union-mounted file systems

2007-07-30 Thread Jan Blunck
On union-mounted file systems the lookup function must also visit lower layers
of the union stack when doing a lookup. This patch adds support for
union mounts to cached lookups and real lookups.

We have 3 different styles of lookup functions now:
- multiple pathname components, follow mounts, follow union, follow symlinks
- single pathname component, doesn't follow mounts, follow union, doesn't
  follow symlinks
- single pathname component, doesn't follow mounts, doesn't follow unions,
  doesn't follow symlinks

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/namei.c|  467 +-
 include/linux/namei.h |6 
 2 files changed, 465 insertions(+), 8 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -415,6 +416,167 @@ static struct dentry *cache_lookup(struc
 }
 
 /*
+ * cache_lookup_topmost - lookup the topmost (non-)negative dentry
+ *
+ * This is used for union mount lookups from dcache. The first non-negative
+ * dentry is searched on all layers of the union stack. Otherwise the topmost
+ * negative dentry is returned.
+ */
+static int __cache_lookup_topmost(struct nameidata *nd, struct qstr *name,
+ struct path *path)
+{
+   struct dentry *dentry;
+
+   dentry = d_lookup(nd->dentry, name);
+   if (dentry && dentry->d_op && dentry->d_op->d_revalidate)
+   dentry = do_revalidate(dentry, nd);
+
+   /*
+* Remember the topmost negative dentry in case we don't find anything
+*/
+   path->dentry = dentry;
+   path->mnt = dentry ? nd->mnt : NULL;
+
+   if (!dentry || dentry->d_inode)
+   return !dentry;
+
+   /* look for the first non-negative dentry */
+
+   while (follow_union_down(&nd->mnt, &nd->dentry)) {
+   dentry = d_hash_and_lookup(nd->dentry, name);
+
+   /*
+* If parts of the union stack are not in the dcache we need
+* to do a real lookup
+*/
+   if (!dentry)
+   goto out_dput;
+
+   /*
+* If parts of the union don't survive the revalidation we
+* need to do a real lookup
+*/
+   if (dentry->d_op && dentry->d_op->d_revalidate) {
+   dentry = do_revalidate(dentry, nd);
+   if (!dentry)
+   goto out_dput;
+   }
+
+   if (dentry->d_inode)
+   goto out_dput;
+
+   dput(dentry);
+   }
+
+   return !dentry;
+
+out_dput:
+   dput(path->dentry);
+   path->dentry = dentry;
+   path->mnt = dentry ? mntget(nd->mnt) : NULL;
+   return !dentry;
+}
+
+/*
+ * cache_lookup_union - lookup the rest of the union stack
+ *
+ * This is called after you have the topmost dentry in @path.
+ */
+static int __cache_lookup_union(struct nameidata *nd, struct qstr *name,
+   struct path *path)
+{
+   struct path last = *path;
+   struct dentry *dentry;
+
+   while (follow_union_down(&nd->mnt, &nd->dentry)) {
+   dentry = d_hash_and_lookup(nd->dentry, name);
+   if (!dentry)
+   return 1;
+
+   if (dentry->d_op && dentry->d_op->d_revalidate) {
+   dentry = do_revalidate(dentry, nd);
+   if (!dentry)
+   return 1;
+   }
+
+   if (!dentry->d_inode) {
+   dput(dentry);
+   continue;
+   }
+
+   /* only directories can be part of a union stack */
+   if (!S_ISDIR(dentry->d_inode->i_mode)) {
+   dput(dentry);
+   break;
+   }
+
+   /* now we know we found something "real"  */
+   append_to_union(last.mnt, last.dentry, nd->mnt, dentry);
+
+   if (last.dentry != path->dentry)
+   pathput(&last);
+   last.dentry = dentry;
+   last.mnt = mntget(nd->mnt);
+   }
+
+   if (last.dentry != path->dentry)
+   pathput(&last);
+
+   return 0;
+}
+
+/*
+ * cache_lookup - lookup a single pathname part from dcache
+ *
+ * This is a union mount capable version of what d_lookup() & revalidate()
+ * would do. This function returns a valid (union) dentry on success.
+ *
+ * Remember: On failure it means that parts of the union aren't cached. You
+ * should call real_lookup() afterwards to find the proper (union) dentry.
+ */
+static int cache_lookup_union(str

[RFC 15/26] union-mount: Add union-mount mount flag

2007-07-30 Thread Jan Blunck
Introduce MNT_UNION and MS_UNION flags. You need additional patches for
util-linux for that to work.
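
The do_mount() hunk below boils down to moving one bit from the mount(2) flags
into the per-mount flags. A minimal user-space sketch, using the two values
this patch defines; translate_union_flag() is a hypothetical helper name:

```c
#define MS_UNION	256	/* from include/linux/fs.h in this patch */
#define MNT_UNION	0x4000	/* from include/linux/mount.h in this patch */

/*
 * Hypothetical helper modelled on the do_mount() hunk: carry the union
 * request over into the per-mount flags and clear it from @flags.
 */
static void translate_union_flag(unsigned long *flags, int *mnt_flags)
{
	if (*flags & MS_UNION)
		*mnt_flags |= MNT_UNION;
	*flags &= ~(unsigned long)MS_UNION;
}
```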

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/namespace.c|6 +-
 include/linux/fs.h|1 +
 include/linux/mount.h |1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -437,6 +437,7 @@ static int show_vfsmnt(struct seq_file *
{ MNT_NODIRATIME, ",nodiratime" },
{ MNT_RELATIME, ",relatime" },
{ MNT_NOMNT, ",nomnt" },
+   { MNT_UNION, ",union" },
{ 0, NULL }
};
struct proc_fs_info *fs_infop;
@@ -1558,9 +1559,12 @@ long do_mount(char *dev_name, char *dir_
mnt_flags |= MNT_RELATIME;
if (flags & MS_NOMNT)
mnt_flags |= MNT_NOMNT;
+   if (flags & MS_UNION)
+   mnt_flags |= MNT_UNION;
 
flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
-  MS_NOATIME | MS_NODIRATIME | MS_RELATIME | MS_NOMNT);
+  MS_NOATIME | MS_NODIRATIME | MS_RELATIME | MS_NOMNT |
+  MS_UNION );
 
/* ... and get the mountpoint */
retval = path_lookup(dir_name, LOOKUP_FOLLOW, &nd);
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -114,6 +114,7 @@ extern int dir_notify_enable;
 #define MS_REMOUNT 32  /* Alter flags of a mounted FS */
 #define MS_MANDLOCK64  /* Allow mandatory locks on an FS */
 #define MS_DIRSYNC 128 /* Directory modifications are synchronous */
+#define MS_UNION   256
 #define MS_NOATIME 1024/* Do not update access times. */
 #define MS_NODIRATIME  2048/* Do not update directory access times */
 #define MS_BIND4096
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -36,6 +36,7 @@ struct mnt_namespace;
 #define MNT_SHARED 0x1000  /* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE 0x2000  /* if the vfsmount is a unbindable mount */
 #define MNT_PNODE_MASK 0x3000  /* propagation flag mask */
+#define MNT_UNION  0x4000  /* if the vfsmount is a union mount */
 
 struct vfsmount {
struct list_head mnt_hash;



[RFC 18/26] union-mount: Changes to the namespace handling

2007-07-30 Thread Jan Blunck
Creates the proper struct union_mount when mounting something into a
union. If the topmost filesystem isn't capable of handling the white-out
filetype it can only be mounted read-only.
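
The rule can be sketched as a predicate on the superblock flags. This is a
user-space illustration only: the SKETCH_* values and union_mount_allowed()
are hypothetical, and the patch itself fails the mount with -ENOTSUPP when the
predicate is false.

```c
#include <stdbool.h>

#define SKETCH_MS_RDONLY	0x1u	/* hypothetical value for this sketch */
#define SKETCH_MS_WHITEOUT	0x2u	/* hypothetical value for this sketch */

/*
 * A filesystem may become the top of a union if it either understands
 * whiteouts or is mounted read-only (and so never needs to create one).
 */
static bool union_mount_allowed(unsigned int sb_flags)
{
	return (sb_flags & (SKETCH_MS_WHITEOUT | SKETCH_MS_RDONLY)) != 0;
}
```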

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/namespace.c|   46 ++--
 fs/union.c|   57 ++
 include/linux/mount.h |3 ++
 include/linux/union.h |6 +
 4 files changed, 110 insertions(+), 2 deletions(-)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "pnode.h"
@@ -68,6 +69,9 @@ struct vfsmount *alloc_vfsmnt(const char
INIT_LIST_HEAD(&mnt->mnt_share);
INIT_LIST_HEAD(&mnt->mnt_slave_list);
INIT_LIST_HEAD(&mnt->mnt_slave);
+#ifdef CONFIG_UNION_MOUNT
+   INIT_LIST_HEAD(&mnt->mnt_unions);
+#endif
if (name) {
int size = strlen(name) + 1;
char *newname = kmalloc(size, GFP_KERNEL);
@@ -157,6 +161,7 @@ static void __touch_mnt_namespace(struct
 
 static void detach_mnt(struct vfsmount *mnt, struct nameidata *old_nd)
 {
+   detach_mnt_union(mnt);
old_nd->dentry = mnt->mnt_mountpoint;
old_nd->mnt = mnt->mnt_parent;
mnt->mnt_parent = mnt;
@@ -180,6 +185,7 @@ static void attach_mnt(struct vfsmount *
list_add_tail(&mnt->mnt_hash, mount_hashtable +
hash(nd->mnt, nd->dentry));
list_add_tail(&mnt->mnt_child, &nd->mnt->mnt_mounts);
+   attach_mnt_union(mnt, nd->mnt, nd->dentry);
 }
 
 /*
@@ -202,6 +208,7 @@ static void commit_tree(struct vfsmount 
list_add_tail(&mnt->mnt_hash, mount_hashtable +
hash(parent, mnt->mnt_mountpoint));
list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
+   attach_mnt_union(mnt, mnt->mnt_parent, mnt->mnt_mountpoint);
touch_mnt_namespace(n);
 }
 
@@ -577,6 +584,7 @@ void release_mounts(struct list_head *he
struct dentry *dentry;
struct vfsmount *m;
spin_lock(&vfsmount_lock);
+   detach_mnt_union(mnt);
dentry = mnt->mnt_mountpoint;
m = mnt->mnt_parent;
mnt->mnt_mountpoint = mnt->mnt_root;
@@ -999,6 +1007,10 @@ static int do_change_type(struct nameida
if (nd->dentry != nd->mnt->mnt_root)
return -EINVAL;
 
+   /* Don't change the type of union mounts */
+   if (IS_MNT_UNION(nd->mnt))
+   return -EINVAL;
+
down_write(&namespace_sem);
spin_lock(&vfsmount_lock);
for (m = mnt; m; m = (recurse ? next_mnt(m, mnt) : NULL))
@@ -1011,7 +1023,8 @@ static int do_change_type(struct nameida
 /*
  * do loopback mount.
  */
-static int do_loopback(struct nameidata *nd, char *old_name, int flags)
+static int do_loopback(struct nameidata *nd, char *old_name, int flags,
+  int mnt_flags)
 {
int clone_flags = 0;
uid_t owner = 0;
@@ -1049,6 +1062,18 @@ static int do_loopback(struct nameidata 
if (IS_ERR(mnt))
goto out;
 
+   /*
+* Unions couldn't be writable if the filesystem doesn't know about
+* whiteouts
+*/
+   err = -ENOTSUPP;
+   if ((mnt_flags & MNT_UNION) &&
+   !(mnt->mnt_sb->s_flags & (MS_WHITEOUT|MS_RDONLY)))
+   goto out;
+
+   if (mnt_flags & MNT_UNION)
+   mnt->mnt_flags |= MNT_UNION;
+
err = graft_tree(mnt, nd);
if (err) {
LIST_HEAD(umount_list);
@@ -1121,6 +1146,13 @@ static int do_move_mount(struct nameidat
if (err)
return err;
 
+   /* moving to or from a union mount is not supported */
+   err = -EINVAL;
+   if (IS_MNT_UNION(nd->mnt))
+   goto exit;
+   if (IS_MNT_UNION(old_nd.mnt))
+   goto exit;
+
down_write(&namespace_sem);
while (d_mountpoint(nd->dentry) && follow_down(&nd->mnt, &nd->dentry))
;
@@ -1176,6 +1208,7 @@ out:
up_write(&namespace_sem);
if (!err)
path_release(&parent_nd);
+exit:
path_release(&old_nd);
return err;
 }
@@ -1253,6 +1286,15 @@ int do_add_mount(struct vfsmount *newmnt
if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
goto unlock;
 
+   /*
+* Unions couldn't be writable if the filesystem doesn't know about
+* whiteouts
+*/
+   err = -ENOTSUPP;
+   if ((mnt_flags & MNT_UNION

[RFC 16/26] union-mount: Introduce union_mount structure

2007-07-30 Thread Jan Blunck
This patch adds the basic structures of VFS based union mounts. It is a new
implementation based on some of my old ideas that influenced Bharata B Rao
<[EMAIL PROTECTED]> who came up with the proposal to let the
union_mount struct only point to the next layer in the union stack. I rewrote
nearly all of the central patches around lookup and the dcache interaction.

Advantages of the new implementation:
- the new union stack is no longer tied directly to one dentry
- the union stack enables dentries to be part of more than one union
  (bind mounts)
- it is unnecessary to traverse the union stack when de/referencing a dentry
- caching of union stack information still driven by dentry cache
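
To make the "only point to the next layer" idea concrete, here is a small
user-space sketch. All sketch_* names are hypothetical, integers stand in for
the vfsmount and dentry pointers, and a linear scan replaces the hash table
used by the real code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a (vfsmount, dentry) pair. */
struct sketch_path {
	int mnt;
	int dentry;
};

/* One link of the union stack: this layer and the layer it overlays. */
struct sketch_union {
	struct sketch_path u_this;
	struct sketch_path u_next;
};

static int same_path(const struct sketch_path *a, const struct sketch_path *b)
{
	return a->mnt == b->mnt && a->dentry == b->dentry;
}

/*
 * Step one layer down. Returns 1 and updates @pos when a link whose
 * u_this side matches @pos exists, 0 when @pos is the bottom layer.
 */
static int sketch_follow_union_down(const struct sketch_union *links,
				    size_t nlinks, struct sketch_path *pos)
{
	for (size_t i = 0; i < nlinks; i++) {
		if (same_path(&links[i].u_this, pos)) {
			*pos = links[i].u_next;
			return 1;
		}
	}
	return 0;
}
```

Because each link is keyed on the (mount, dentry) pair and not on the dentry
alone, the same dentry can show up in several unions, which is the bind mount
case mentioned above.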

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/Kconfig |8 +
 fs/Makefile|2 
 fs/dcache.c|4 
 fs/union.c |  335 +
 include/linux/dcache.h |9 +
 include/linux/union.h  |   61 
 6 files changed, 419 insertions(+)

--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -551,6 +551,14 @@ config INOTIFY_USER
 
  If unsure, say Y.
 
+config UNION_MOUNT
+   bool "Union mount support (EXPERIMENTAL)"
+   depends on EXPERIMENTAL
+   ---help---
+ If you say Y here, you will be able to mount file systems as
+ union mount stacks. This is a VFS based implementation and
+ should work with all file systems. If unsure, say N.
+
 config QUOTA
bool "Quota support"
help
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -49,6 +49,8 @@ obj-$(CONFIG_FS_POSIX_ACL)+= posix_acl.
 obj-$(CONFIG_NFS_COMMON)   += nfs_common/
 obj-$(CONFIG_GENERIC_ACL)  += generic_acl.o
 
+obj-$(CONFIG_UNION_MOUNT)  += union.o
+
 obj-$(CONFIG_QUOTA)+= dquot.o
 obj-$(CONFIG_QFMT_V1)  += quota_v1.o
 obj-$(CONFIG_QFMT_V2)  += quota_v2.o
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -985,6 +985,10 @@ struct dentry *d_alloc(struct dentry * p
 #ifdef CONFIG_PROFILING
dentry->d_cookie = NULL;
 #endif
+#ifdef CONFIG_UNION_MOUNT
+   INIT_LIST_HEAD(&dentry->d_unions);
+   dentry->d_unionized = 0;
+#endif
INIT_HLIST_NODE(&dentry->d_hash);
INIT_LIST_HEAD(&dentry->d_lru);
INIT_LIST_HEAD(&dentry->d_subdirs);
--- /dev/null
+++ b/fs/union.c
@@ -0,0 +1,335 @@
+/*
+ * VFS based union mount for Linux
+ *
+ * Copyright (C) 2004-2007 IBM Corporation, IBM Deutschland Entwicklung GmbH.
+ * Copyright (C) 2007 Novell Inc.
+ *
+ *   Author(s): Jan Blunck ([EMAIL PROTECTED])
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * This is borrowed from fs/inode.c. The hashtable for lookups. Somebody
+ * should try to make this good - I've just made it work.
+ */
+static unsigned int union_hash_mask __read_mostly;
+static unsigned int union_hash_shift __read_mostly;
+static struct hlist_head *union_hashtable __read_mostly;
+static unsigned int union_rhash_mask __read_mostly;
+static unsigned int union_rhash_shift __read_mostly;
+static struct hlist_head *union_rhashtable __read_mostly;
+
+/*
+ * Locking Rules:
+ * - dcache_lock (for union_rlookup() only)
+ * - union_lock
+ */
+DEFINE_SPINLOCK(union_lock);
+
+static struct kmem_cache *union_cache __read_mostly;
+
+static unsigned long hash(struct dentry *dentry, struct vfsmount *mnt)
+{
+   unsigned long tmp;
+
+   tmp = ((unsigned long)mnt * (unsigned long)dentry) ^
+   (GOLDEN_RATIO_PRIME + (unsigned long)mnt) / L1_CACHE_BYTES;
+   tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> union_hash_shift);
+   return tmp & union_hash_mask;
+}
+
+static __initdata unsigned long union_hash_entries;
+
+static int __init set_union_hash_entries(char *str)
+{
+   if (!str)
+   return 0;
+   union_hash_entries = simple_strtoul(str, &str, 0);
+   return 1;
+}
+
+__setup("union_hash_entries=", set_union_hash_entries);
+
+static int __init init_union(void)
+{
+   int loop;
+
+   union_cache = kmem_cache_create("union_mount",
+   sizeof(struct union_mount), 0,
+   SLAB_HWCACHE_ALIGN | SLAB_PANIC,
+   NULL, NULL);
+
+   union_hashtable = alloc_large_system_hash("Union-cache",
+ sizeof(struct hlist_head),
+ union_hash_entries,
+ 14,
+ 0,
+  

[RFC 20/26] union-mount: Simple union-mount readdir implementation

2007-07-30 Thread Jan Blunck
This is a very simple union mount readdir implementation. It modifies the
readdir routine to merge the entries of union mounted directories and
eliminate duplicates while walking the union stack.

  FIXME:
  This patch needs to be reworked! At the moment this only works for ext2 and
  tmpfs. All kinds of index directories that return d_off > i_size don't work
  with this.

The directory entries are read starting from the top layer and they are
maintained in a cache. Subsequently when the entries from the bottom layers
of the union stack are read they are checked for duplicates (in the cache)
before being passed out to the user space. There can be multiple calls
to readdir/getdents routines for reading the entries of a single directory.
But the union directory cache is not maintained across these calls. Instead,
for every call, the previously read entries are re-read into the cache
and newly read entries are compared against these for duplicates before
they are returned to user space.
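
The duplicate elimination can be sketched in user space roughly as follows.
This is a simplification under stated assumptions: the sketch_* names are
hypothetical, the whole merge happens in one pass, and the output array doubles
as the cache, whereas the patch rebuilds its cache on every readdir call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one directory layer of the union stack. */
struct sketch_dir {
	const char **names;
	size_t count;
};

/*
 * Merge the layers top-down into @out, skipping names already emitted
 * by a higher layer. Returns the number of names written to @out.
 */
static size_t union_readdir_sketch(const struct sketch_dir *stack, size_t depth,
				   const char **out, size_t out_cap)
{
	size_t n = 0;

	for (size_t layer = 0; layer < depth; layer++) {
		for (size_t i = 0; i < stack[layer].count; i++) {
			const char *name = stack[layer].names[i];
			int seen = 0;

			/* linear search, like union_cache_find_entry() */
			for (size_t j = 0; j < n; j++) {
				if (strcmp(out[j], name) == 0) {
					seen = 1;
					break;
				}
			}
			if (!seen && n < out_cap)
				out[n++] = name;
		}
	}
	return n;
}
```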

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
Signed-off-by: Bharata B Rao <[EMAIL PROTECTED]>
---
 fs/readdir.c  |   11 -
 fs/union.c|  336 ++
 include/linux/union.h |   25 +++
 3 files changed, 364 insertions(+), 8 deletions(-)

--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -16,13 +16,14 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
 int vfs_readdir(struct file *file, filldir_t filler, void *buf)
 {
-   struct inode *inode = file->f_path.dentry->d_inode;
int res = -ENOTDIR;
+
if (!file->f_op || !file->f_op->readdir)
goto out;
 
@@ -30,13 +31,7 @@ int vfs_readdir(struct file *file, filld
if (res)
goto out;
 
-   mutex_lock(&inode->i_mutex);
-   res = -ENOENT;
-   if (!IS_DEADDIR(inode)) {
-   res = file->f_op->readdir(file, buf, filler);
-   file_accessed(file);
-   }
-   mutex_unlock(&inode->i_mutex);
+   res = do_readdir(file, buf, filler);
 out:
return res;
 }
--- a/fs/union.c
+++ b/fs/union.c
@@ -18,6 +18,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * This is borrowed from fs/inode.c. The hashtable for lookups. Somebody
@@ -462,3 +464,337 @@ void detach_mnt_union(struct vfsmount *m
union_put(um);
return;
 }
+
+
+/*
+ * Union mounts support for readdir.
+ */
+
+/* This is a copy from fs/readdir.c */
+struct getdents_callback {
+   struct linux_dirent __user *current_dir;
+   struct linux_dirent __user *previous;
+   int count;
+   int error;
+};
+
+/* The readdir union cache object */
+struct union_cache_entry {
+   struct list_head list;
+   struct qstr name;
+};
+
+static int union_cache_add_entry(struct list_head *list,
+const char *name, int namelen)
+{
+   struct union_cache_entry *this;
+   char *tmp_name;
+
+   this = kmalloc(sizeof(*this), GFP_KERNEL);
+   if (!this) {
+   printk(KERN_CRIT
+  "union_cache_add_entry(): out of kernel memory\n");
+   return -ENOMEM;
+   }
+
+   tmp_name = kmalloc(namelen + 1, GFP_KERNEL);
+   if (!tmp_name) {
+   printk(KERN_CRIT
+  "union_cache_add_entry(): out of kernel memory\n");
+   kfree(this);
+   return -ENOMEM;
+   }
+
+   this->name.name = tmp_name;
+   this->name.len = namelen;
+   this->name.hash = 0;
+   memcpy(tmp_name, name, namelen);
+   tmp_name[namelen] = 0;
+   INIT_LIST_HEAD(&this->list);
+   list_add(&this->list, list);
+   return 0;
+}
+
+static void union_cache_free(struct list_head *uc_list)
+{
+   struct list_head *p;
+   struct list_head *ptmp;
+   int count = 0;
+
+   list_for_each_safe(p, ptmp, uc_list) {
+   struct union_cache_entry *this;
+
+   this = list_entry(p, struct union_cache_entry, list);
+   list_del_init(&this->list);
+   kfree(this->name.name);
+   kfree(this);
+   count++;
+   }
+   return;
+}
+
+static int union_cache_find_entry(struct list_head *uc_list,
+ const char *name, int namelen)
+{
+   struct union_cache_entry *p;
+   int ret = 0;
+
+   list_for_each_entry(p, uc_list, list) {
+   if (p->name.len != namelen)
+   continue;
+   if (strncmp(p->name.name, name, namelen) == 0) {
+   ret = 1;
+   break;
+   }
+   }
+
+   return ret;
+}
+
+/*
+ * There are four filldir() wrappers necessary for the union mount readdir
+ * implementation:
+ *
+ * - filldir_topmost(): fills the union's readdir cache and the user space

[RFC 17/26] union-mount: Drive the union cache via dcache

2007-07-30 Thread Jan Blunck
If a dentry is removed from the dentry cache because its usage count drops to
zero, the references to the underlying layer of the unions the dentry is in
are dropped too. Therefore the union cache is driven by the dentry cache.
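
The cascading release can be pictured with a toy refcount model. This is a
user-space sketch with hypothetical names (struct sketch_dentry, sketch_dput)
and no locking, not the kernel code path:

```c
#include <stddef.h>

struct sketch_dentry {
	int count;			/* usage count */
	struct sketch_dentry *lower;	/* next lower layer, or NULL */
	int freed;			/* set once the entry is released */
};

/*
 * Drop one reference. When the count reaches zero the entry is released
 * and the reference it held on the lower layer is dropped in turn.
 */
static void sketch_dput(struct sketch_dentry *d)
{
	while (d && --d->count == 0) {
		struct sketch_dentry *lower = d->lower;

		d->freed = 1;
		d = lower;
	}
}
```

A lower layer that is still referenced from elsewhere only loses one count and
stays alive.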

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/dcache.c|8 +
 fs/union.c |   72 +
 include/linux/dcache.h |8 +
 include/linux/union.h  |6 
 4 files changed, 94 insertions(+)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -142,11 +143,14 @@ static struct dentry *__d_kill(struct de
list_add(&dentry->d_lru, list);
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
+   __shrink_d_unions(dentry, list);
return NULL;
}
 
/* drops the locks, at that point nobody can reach this dentry */
dentry_iput(dentry);
+   /* If the dentry was in an union delete them */
+   shrink_d_unions(dentry);
parent = dentry->d_parent;
d_free(dentry);
return dentry == parent ? NULL : parent;
@@ -721,6 +725,7 @@ static void shrink_dcache_for_umount_sub
iput(inode);
}
 
+   shrink_d_unions(dentry);
d_free(dentry);
 
/* finished when we fall off the top of the tree,
@@ -1464,7 +1469,9 @@ void d_delete(struct dentry * dentry)
spin_lock(&dentry->d_lock);
isdir = S_ISDIR(dentry->d_inode->i_mode);
if (atomic_read(&dentry->d_count) == 1) {
+   __d_drop_unions(dentry);
dentry_iput(dentry);
+   shrink_d_unions(dentry);
fsnotify_nameremove(dentry, isdir);
 
/* remove this and other inotify debug checks after 2.6.18 */
@@ -1478,6 +1485,7 @@ void d_delete(struct dentry * dentry)
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
 
+   shrink_d_unions(dentry);
fsnotify_nameremove(dentry, isdir);
 }
 
--- a/fs/union.c
+++ b/fs/union.c
@@ -258,6 +258,8 @@ int append_to_union(struct vfsmount *mnt
union_put(this);
return 0;
}
+   list_add(&this->u_unions, &dentry->d_unions);
+   dest_dentry->d_unionized++;
__union_hash(this);
spin_unlock(&union_lock);
return 0;
@@ -333,3 +335,73 @@ int follow_union_mount(struct vfsmount *
 
return res;
 }
+
+/*
+ * This must be called when unhashing a dentry. This is called with dcache_lock
+ * and unhashes all unions this dentry is in.
+ */
+void __d_drop_unions(struct dentry *dentry)
+{
+   struct union_mount *this, *next;
+
+   spin_lock(&union_lock);
+   list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions)
+   __union_unhash(this);
+   spin_unlock(&union_lock);
+}
+
+/*
+ * This must be called after __d_drop_unions() without holding any locks.
+ * Note: The dentry might still be reachable via a lookup but at that time it
+ * already a negative dentry. Otherwise it would be unhashed. The union_mount
+ * structure itself is still reachable through mnt->mnt_unions (which we
+ * protect against with union_lock).
+ */
+void shrink_d_unions(struct dentry *dentry)
+{
+   struct union_mount *this, *next;
+
+repeat:
+   spin_lock(&union_lock);
+   list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions) {
+   BUG_ON(!hlist_unhashed(&this->u_hash));
+   BUG_ON(!hlist_unhashed(&this->u_rhash));
+   list_del(&this->u_unions);
+   this->u_next.dentry->d_unionized--;
+   spin_unlock(&union_lock);
+   union_put(this);
+   goto repeat;
+   }
+   spin_unlock(&union_lock);
+}
+
+extern void __dput(struct dentry *, struct list_head *);
+
+/*
+ * This is the special variant for use in dput() only.
+ */
+void __shrink_d_unions(struct dentry *dentry, struct list_head *list)
+{
+   struct union_mount *this, *next;
+
+   BUG_ON(!d_unhashed(dentry));
+
+repeat:
+   spin_lock(&union_lock);
+   list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions) {
+   struct dentry *n_dentry = this->u_next.dentry;
+   struct vfsmount *n_mnt = this->u_next.mnt;
+
+   BUG_ON(!hlist_unhashed(&this->u_hash));
+   BUG_ON(!hlist_unhashed(&this->u_rhash));
+   list_del(&this->u_unions);
+   this->u_next.dentry->d_unionized--;
+   spin_unlock(&union_lock);
+   if (__union_put(this)) {
+   __dput(n_dentr

[RFC 09/26] linux/stat.h: Add the filetype white-out

2007-07-30 Thread Jan Blunck
A white-out stops the VFS from further lookups of the white-out's name and
returns -ENOENT. This is the same behaviour as if the filename isn't
found. This can be used in combination with union mounts to virtually
delete (white-out) files by creating a file of this file type.
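
A minimal userspace sketch of the file-type test described above. The EX_ prefixed names and the helper ex_name_is_visible() are hypothetical; the octal values are assumed to be the usual S_IFMT mask and the BSD-style whiteout type, and nothing here is kernel code.

```c
#include <stdbool.h>

/* Assumed file-type bits; EX_S_IFWHT mirrors the proposed whiteout type. */
#define EX_S_IFMT   00170000
#define EX_S_IFWHT  0160000
#define EX_S_IFREG  0100000

/* Same shape as the other S_ISxxx() tests: mask, then compare. */
#define EX_S_ISWHT(m) (((m) & EX_S_IFMT) == EX_S_IFWHT)

/*
 * Hypothetical helper: a name is visible only if it has an inode and
 * that inode is not a whiteout. Both other cases behave like -ENOENT.
 */
static bool ex_name_is_visible(bool has_inode, unsigned int mode)
{
    if (!has_inode)
        return false;
    return !EX_S_ISWHT(mode);
}
```

The point of the sketch is only that a whiteout is an ordinary inode whose type bits make lookups treat the name as absent.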

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 include/linux/stat.h |2 ++
 1 file changed, 2 insertions(+)

--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -10,6 +10,7 @@
 #if defined(__KERNEL__) || !defined(__GLIBC__) || (__GLIBC__ < 2)
 
 #define S_IFMT  00170000
+#define S_IFWHT  0160000   /* whiteout */
 #define S_IFSOCK 0140000
 #define S_IFLNK 0120000
 #define S_IFREG  0100000
@@ -28,6 +29,7 @@
 #define S_ISBLK(m) (((m) & S_IFMT) == S_IFBLK)
 #define S_ISFIFO(m)(((m) & S_IFMT) == S_IFIFO)
 #define S_ISSOCK(m)(((m) & S_IFMT) == S_IFSOCK)
+#define S_ISWHT(m) (((m) & S_IFMT) == S_IFWHT)
 
 #define S_IRWXU 00700
 #define S_IRUSR 00400



[RFC 10/26] VFS white-out handling

2007-07-30 Thread Jan Blunck
Introduce white-out handling in the VFS.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/inode.c |   22 ++
 fs/namei.c |  417 +++--
 fs/readdir.c   |6 
 include/linux/fs.h |7 
 4 files changed, 441 insertions(+), 11 deletions(-)

--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1410,6 +1410,26 @@ void __init inode_init(unsigned long mem
INIT_HLIST_HEAD(&inode_hashtable[loop]);
 }
 
+/*
+ * Dummy default file-operations:
+ * Never open a whiteout. This is always a bug.
+ */
+static int whiteout_no_open(struct inode *irrelevant, struct file *dontcare)
+{
+   printk("WARNING: at %s:%d %s(): Attempted to open a whiteout!\n",
+  __FILE__, __LINE__, __FUNCTION__);
+   /*
+* Nobody should ever be able to open a whiteout. On the other hand
+* this isn't fatal so lets just print a warning message.
+*/
+   WARN_ON(1);
+   return -ENXIO;
+}
+
+static struct file_operations def_wht_fops = {
+   .open   = whiteout_no_open,
+};
+
 void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev)
 {
inode->i_mode = mode;
@@ -1423,6 +1443,8 @@ void init_special_inode(struct inode *in
inode->i_fop = &def_fifo_fops;
else if (S_ISSOCK(mode))
inode->i_fop = &bad_sock_fops;
+   else if (S_ISWHT(mode))
+   inode->i_fop = &def_wht_fops;
else
printk(KERN_DEBUG "init_special_inode: bogus i_mode (%o)\n",
   mode);
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -887,7 +887,7 @@ static fastcall int __link_path_walk(con
 
err = -ENOENT;
inode = next.dentry->d_inode;
-   if (!inode)
+   if (!inode || S_ISWHT(inode->i_mode))
goto out_dput;
err = -ENOTDIR; 
if (!inode->i_op)
@@ -951,6 +951,8 @@ last_component:
err = -ENOENT;
if (!inode)
break;
+   if (S_ISWHT(inode->i_mode))
+   break;
if (lookup_flags & LOOKUP_DIRECTORY) {
err = -ENOTDIR; 
if (!inode->i_op || !inode->i_op->lookup)
@@ -1434,13 +1436,10 @@ static inline int check_sticky(struct in
  * 10. We don't allow removal of NFS sillyrenamed files; it's handled by
  * nfs_async_unlink().
  */
-static int may_delete(struct inode *dir,struct dentry *victim,int isdir)
+static int __may_delete(struct inode *dir, struct dentry *victim, int isdir)
 {
int error;
 
-   if (!victim->d_inode)
-   return -ENOENT;
-
BUG_ON(victim->d_parent->d_inode != dir);
audit_inode_child(victim->d_name.name, victim->d_inode, dir);
 
@@ -1466,6 +1465,14 @@ static int may_delete(struct inode *dir,
return 0;
 }
 
+static int may_delete(struct inode *dir, struct dentry *victim, int isdir)
+{
+   if (!victim->d_inode || S_ISWHT(victim->d_inode->i_mode))
+   return -ENOENT;
+
+   return __may_delete(dir, victim, isdir);
+}
+
 /* Check whether we can create an object with dentry child in directory
  *  dir.
  *  1. We can't do it if child already exists (open has special treatment for
@@ -1477,7 +1484,7 @@ static int may_delete(struct inode *dir,
 static inline int may_create(struct inode *dir, struct dentry *child,
 struct nameidata *nd)
 {
-   if (child->d_inode)
+   if (child->d_inode && !S_ISWHT(child->d_inode->i_mode))
return -EEXIST;
if (IS_DEADDIR(dir))
return -ENOENT;
@@ -1559,6 +1566,13 @@ int vfs_create(struct inode *dir, struct
error = security_inode_create(dir, dentry, mode);
if (error)
return error;
+
+   if (dentry->d_inode && S_ISWHT(dentry->d_inode->i_mode)) {
+   error = vfs_unlink_whiteout(dir, dentry);
+   if (error)
+   return error;
+   }
+
DQUOT_INIT(dir);
error = dir->i_op->create(dir, dentry, mode, nd);
if (!error)
@@ -1741,7 +1755,7 @@ do_last:
}
 
/* Negative dentry, just create the file */
-   if (!path.dentry->d_inode) {
+   if (!path.dentry->d_inode || S_ISWHT(path.dentry->d_inode->i_mode)) {
error = open_namei_create(nd, &path, flag, mode);
if (error)
goto exit;
@@ -1903,6 +1917,12 @@ int vfs_mknod(struct inode *dir, struct 
if (error)
return error;
 
+   if (dentry->d_inode && S_ISWHT(dentry->d_inode->i_mode)) {
+   error = vfs_unlink_whiteout(dir, dentry);
+   if (error)
+

[RFC 01/26] [PATCH 14/18] shmem: convert to using splice instead of sendfile()

2007-07-30 Thread Jan Blunck
From: Hugh Dickins <[EMAIL PROTECTED]>

Remove shmem_file_sendfile and resurrect shmem_readpage, as used by tmpfs
to support loop and sendfile in 2.4 and 2.5.  Now tmpfs can support splice,
loop and sendfile in the simplest way, using generic_file_splice_read and
generic_file_splice_write (with the aid of shmem_prepare_write).

We could make some efficiency tweaks later, if there's a real need;
but this is stable and works well as is.

Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]>
Signed-off-by: Jens Axboe <[EMAIL PROTECTED]>
---
 mm/shmem.c |   40 
 1 file changed, 16 insertions(+), 24 deletions(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1109,8 +1109,8 @@ static int shmem_getpage(struct inode *i
 * Normally, filepage is NULL on entry, and either found
 * uptodate immediately, or allocated and zeroed, or read
 * in under swappage, which is then assigned to filepage.
-* But shmem_write_begin passes in a locked filepage,
-* which may be found not uptodate by other callers too,
+* But shmem_readpage and shmem_write_begin passes in a locked
+* filepage, which may be found not uptodate by other callers too,
 * and may need to be copied from the swappage read in.
 */
 repeat:
@@ -1454,9 +1454,18 @@ static const struct inode_operations shm
 static const struct inode_operations shmem_symlink_inline_operations;
 
 /*
- * Normally tmpfs makes no use of shmem_write_begin, but it
- * lets a tmpfs file be used read-write below the loop driver.
+ * Normally tmpfs avoids the use of shmem_readpage and shmem_write_begin;
+ * but providing them allows a tmpfs file to be used for splice, sendfile, and
+ * below the loop driver, in the generic fashion that many filesystems support.
  */
+static int shmem_readpage(struct file *file, struct page *page)
+{
+   struct inode *inode = page->mapping->host;
+   int error = shmem_getpage(inode, page->index, &page, SGP_CACHE, NULL);
+   unlock_page(page);
+   return error;
+}
+
 static int
 shmem_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
@@ -1701,25 +1710,6 @@ static ssize_t shmem_file_read(struct fi
return desc.error;
 }
 
-static ssize_t shmem_file_sendfile(struct file *in_file, loff_t *ppos,
-size_t count, read_actor_t actor, void *target)
-{
-   read_descriptor_t desc;
-
-   if (!count)
-   return 0;
-
-   desc.written = 0;
-   desc.count = count;
-   desc.arg.data = target;
-   desc.error = 0;
-
-   do_shmem_file_read(in_file, ppos, &desc, actor);
-   if (desc.written)
-   return desc.written;
-   return desc.error;
-}
-
 static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
struct shmem_sb_info *sbinfo = SHMEM_SB(dentry->d_sb);
@@ -2376,6 +2366,7 @@ static const struct address_space_operat
.writepage  = shmem_writepage,
.set_page_dirty = __set_page_dirty_no_writeback,
 #ifdef CONFIG_TMPFS
+   .readpage   = shmem_readpage,
.write_begin= shmem_write_begin,
.write_end  = shmem_write_end,
 #endif
@@ -2389,7 +2380,8 @@ static const struct file_operations shme
.read   = shmem_file_read,
.write  = shmem_file_write,
.fsync  = simple_sync_file,
-   .sendfile   = shmem_file_sendfile,
+   .splice_read= generic_file_splice_read,
+   .splice_write   = generic_file_splice_write,
 #endif
 };
 



[RFC 05/26] VFS: cache_lookup() cleanup

2007-07-30 Thread Jan Blunck
cache_lookup() can directly use d_lookup() instead of calling __d_lookup()
first since rename_lock is a seq_lock.
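
The retry that makes the extra __d_lookup() call redundant can be sketched in userspace as follows. This is a single-threaded toy, assuming a plain counter in place of rename_lock and a hypothetical ex_lockless_lookup() that misses once because a "rename" ran concurrently. No memory barriers are modelled.

```c
#include <stddef.h>

/* Toy sequence counter standing in for rename_lock. */
static unsigned ex_rename_seq;

static unsigned ex_read_seqbegin(void) { return ex_rename_seq; }
static int ex_read_seqretry(unsigned start) { return ex_rename_seq != start; }

/*
 * Hypothetical lockless lookup: the first call races with a rename
 * (the sequence is bumped) and misses; later calls succeed.
 */
static int ex_calls;
static const char *ex_lockless_lookup(const char *name)
{
    if (ex_calls++ == 0) {
        ex_rename_seq += 2;   /* a writer completed a rename meanwhile */
        return NULL;
    }
    return name;
}

/* A miss is only trusted if no rename happened while we were looking. */
static const char *ex_lookup(const char *name)
{
    const char *found;
    unsigned seq;

    do {
        seq = ex_read_seqbegin();
        found = ex_lockless_lookup(name);
        if (found)
            break;
    } while (ex_read_seqretry(seq));
    return found;
}
```

Because the retry loop already covers the failed lockless attempt, calling the lockless variant first buys nothing on a miss.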

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/namei.c |   13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -403,15 +403,10 @@ do_revalidate(struct dentry *dentry, str
  * Internal lookup() using the new generic dcache.
  * SMP-safe
  */
-static struct dentry * cached_lookup(struct dentry * parent, struct qstr * name, struct nameidata *nd)
+static struct dentry *cache_lookup(struct dentry *parent, struct qstr *name,
+  struct nameidata *nd)
 {
-   struct dentry * dentry = __d_lookup(parent, name);
-
-   /* lockess __d_lookup may fail due to concurrent d_move() 
-* in some unrelated directory, so try with d_lookup
-*/
-   if (!dentry)
-   dentry = d_lookup(parent, name);
+   struct dentry *dentry = d_lookup(parent, name);
 
if (dentry && dentry->d_op && dentry->d_op->d_revalidate)
dentry = do_revalidate(dentry, nd);
@@ -1276,7 +1271,7 @@ static inline struct dentry *__lookup_ha
goto out;
}
 
-   dentry = cached_lookup(base, name, nd);
+   dentry = cache_lookup(base, name, nd);
if (!dentry) {
struct dentry *new = d_alloc(base, name);
dentry = ERR_PTR(-ENOMEM);



[RFC 02/26] VFS: Export dput_path() and path_to_nameidata()

2007-07-30 Thread Jan Blunck
This patch makes dput_path() and path_to_nameidata() generally available by
moving them from fs/namei.c to include/linux/namei.h.

Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
---
 fs/namei.c|   16 
 include/linux/namei.h |   15 +++
 2 files changed, 15 insertions(+), 16 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -573,22 +573,6 @@ fail:
return PTR_ERR(link);
 }
 
-static inline void dput_path(struct path *path, struct nameidata *nd)
-{
-   dput(path->dentry);
-   if (path->mnt != nd->mnt)
-   mntput(path->mnt);
-}
-
-static inline void path_to_nameidata(struct path *path, struct nameidata *nd)
-{
-   dput(nd->dentry);
-   if (nd->mnt != path->mnt)
-   mntput(nd->mnt);
-   nd->mnt = path->mnt;
-   nd->dentry = path->dentry;
-}
-
 static __always_inline int __do_follow_link(struct path *path, struct nameidata *nd)
 {
int error;
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -119,5 +119,20 @@ static inline void pathput(struct path *
dput(path->dentry);
mntput(path->mnt);
 }
+static inline void dput_path(struct path *path, struct nameidata *nd)
+{
+   dput(path->dentry);
+   if (path->mnt != nd->mnt)
+   mntput(path->mnt);
+}
+
+static inline void path_to_nameidata(struct path *path, struct nameidata *nd)
+{
+   dput(nd->dentry);
+   if (nd->mnt != path->mnt)
+   mntput(nd->mnt);
+   nd->mnt = path->mnt;
+   nd->dentry = path->dentry;
+}
 
 #endif /* _LINUX_NAMEI_H */



Re: [RFC PATCH 1/4] Union mount documentation.

2007-06-20 Thread Jan Blunck
On Tue, 19 Jun 2007 22:59:51 -0700, Arjan van de Ven wrote:

> user does on FS A: 
> mkdir  /mnt/A/somedir
> touch /mnt/A/somedir/somefile
> 
> and then 2 things happen in parallel
> 1) touch /mnt/B/somefile
> 2) mv /mnt/union/somedir /mnt/union/somefile
> 
> since the underlying FS for 2) is FS A... how will this work out locking
> wise? Will the VS lock the union directory only? Or will this operate
> only on the underlying FS? How is dcache consistency guaranteed for
> scenarios like this?

Ok, with Christoph's help I guess I now know what the question is :)

touch /mnt/B/somefile is doing a lookup in "B" for "somefile". Therefore it
locks B->i_mutex for that. When it gets a negative dentry it creates the
file.

mv /mnt/union/somedir /mnt/union/somefile is doing a lookup in "union" for
"somefile". Therefore it first locks the i_mutex of the topmost directory
in the union of "/mnt/union" (which happens to be "B"). When it gets a
negative dentry it then follows the union down to the next layer (with the
topmost directory still locked). Lookup is repeated until a filled dentry
is found or the topmost negative dentry is used as a target for the
move. That's it.
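
A rough userspace sketch of that layer walk, assuming each layer is just a list of names with the topmost layer first. The ex_ names are hypothetical and locking is not modelled. A return value of -1 stands for "use the topmost negative dentry".

```c
#include <stddef.h>
#include <string.h>

#define EX_MAX_NAMES 4

/* Hypothetical model: one directory per union layer, topmost first. */
struct ex_layer {
    const char *names[EX_MAX_NAMES];   /* NULL-terminated entry list */
};

static int ex_layer_has(const struct ex_layer *l, const char *name)
{
    for (size_t i = 0; i < EX_MAX_NAMES && l->names[i]; i++)
        if (strcmp(l->names[i], name) == 0)
            return 1;
    return 0;
}

/*
 * Walk the layers from the top. Returns the index of the first layer
 * with a filled entry, or -1 if every layer is negative, in which case
 * the caller would fall back to the topmost negative dentry.
 */
static int ex_union_lookup(const struct ex_layer *layers, size_t n,
                           const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (ex_layer_has(&layers[i], name))
            return (int)i;
    return -1;
}
```

With "B" empty on top of "A" containing "somedir", looking up "somedir" stops at layer 1 and looking up "somefile" falls through all layers.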

Did that answer your question?

Cheers,
Jan



Re: [RFC PATCH 1/4] Union mount documentation.

2007-06-20 Thread Jan Blunck
On Wed, 20 Jun 2007 13:32:23 +0100, Christoph Hellwig wrote:

> On Wed, Jun 20, 2007 at 07:29:55AM +0000, Jan Blunck wrote:
>> Mounting a file system twice is bad in the first place. This should be
>> done by using bind mounts and bind a mounted file system into a union.
>> After that the normal locking rules apply (and hopefully work ;).
> 
> From the kernel POV mounting a filesystem twice is the same as doing
> a bind mount.

Somehow I thought about doing this:

 mount /dev/dasda1 /mnt/A
 mount /dev/dasda1 /mnt/B

... which doesn't result in a bind mount.



Re: [RFC PATCH 1/4] Union mount documentation.

2007-06-20 Thread Jan Blunck
On Wed, 20 Jun 2007 11:21:57 +0530, Bharata B Rao wrote:

> +4. Union stack: building and traversal
> +--------------------------------------
> +Union stack needs to be built from two places: during an explicit union
> +mount (or mount propagation) and during the lookup of a directory that
> +appears in more than one layer of the union.
> +
> +The link between two layers of union stack is maintained using the
> +union_mount structure:
> +
> +struct union_mount {
> + /* vfsmount and dentry of this layer */
> + struct vfsmount *src_mnt;
> + struct dentry *src_dentry;
> +
> + /* vfsmount and dentry of the next lower layer */
> + struct vfsmount *dst_mnt;
> + struct dentry *dst_dentry;
> +
> + /*
> +  * This list_head hashes this union_mount based on this layer's
> +  * vfsmount and dentry. This is used to get to the next layer of
> +  * the stack (dst_mnt, dst_dentry) given the (src_mnt, src_dentry)
> +  * and is used for stack traversal.
> +  */
> + struct list_head hash;
> +
> + /*
> +  * All union_mounts under a vfsmount(src_mnt) are linked together
> +  * at mnt->mnt_union using this list_head. This is needed to destroy
> +  * all the union_mounts when the mnt goes away.
> +  */
> + struct list_head list;
> +};
> +
> +These union mount structures are stored in a hash table(union_mount_hashtable)
> +which uses the same hash as used for mount_hashtable since both of them use
> +(vfsmount, dentry) pairs to calculate the hash.
> +
> +During a new mount (or mount propagation), a new union_mount structure is
> +created. A reference to the mountpoint's vfsmount and dentry is taken and
> +stored in the union_mount (as dst_mnt, dst_dentry). And this union_mount
> +is inserted in the union_mount_hashtable based on the hash generated by
> +the mount root's vfsmount and dentry.
> +
> +Similar method is employed to create a union stack during first time lookup
> +of a common named directory within a union mount point. But here, the top
> +level directory's vfsmount and dentry are hashed to get to the lower level
> +directory's vfsmount and dentry.
> +
> +The insertion, deletion and lookup of union_mounts in the
> +union_mount_hashtable is protected by vfsmount_lock. While traversing the
> +stack, we hold this spinlock only briefly during lookup time and release
> +it as soon as we get the next union stack member. The top level of the
> +stack holds a reference to the next level (via union_mount structure) and
> +so on. Therefore, as long as we hold a reference to a union stack member,
> +its lower layers can't go away. And since we don't do the complete
> +traversal under any lock, it is possible for the stack to change over the
> +level from where we started traversing. For eg. when traversing the stack
> +downwards, a new filesystem can be mounted on top of it. When this happens,
> +the user who had a reference to the old top wouldn't have visibility to
> +the new top and would continue as if the new top didn't exist for him.
> +I believe this is fine as long as members of the stack don't go away from
> +under us(CHECK). And to be sure of this, we need to hold a reference to the
> +level from where we start the traversal and should continue to hold it
> +till we are done with the traversal.

Well done. I like your approach much more than the simple chaining of
dentries. When I told you about the idea of maintaining a list of
 objects I always thought about one big structure for all
the layers of a union. Smaller objects that only point to the next layer
seem to be better but make the search for the topmost layer impossible.
You should maintain a reference to the topmost struct union_mount though.
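
To illustrate the suggestion, here is a small userspace sketch of per-layer link objects that also carry a pointer back to the topmost link. Everything in it is hypothetical: the ex_ names, the plain next pointer (the quoted design uses a hash table to find the next layer) and the string stand-ins for vfsmount and dentry.

```c
#include <stddef.h>

/* Stand-in for a (vfsmount, dentry) pair. */
struct ex_layer_ref {
    const char *mnt;
    const char *dentry;
};

struct ex_union_mount {
    struct ex_layer_ref src;        /* this layer */
    struct ex_layer_ref dst;        /* next lower layer */
    struct ex_union_mount *next;    /* link owned by the dst layer, or NULL */
    struct ex_union_mount *top;     /* assumption: back pointer to topmost link */
};

/* Chain 'lower' below 'upper' and propagate the topmost pointer. */
static void ex_union_append(struct ex_union_mount *upper,
                            struct ex_union_mount *lower)
{
    upper->next = lower;
    lower->top = upper->top ? upper->top : upper;
}

/* The topmost link is found in one step from any layer. */
static struct ex_union_mount *ex_union_top(struct ex_union_mount *um)
{
    return um->top ? um->top : um;
}
```

The trade-off is one extra pointer per link in exchange for not having to search upwards through the stack.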

> +5. Union stack: destroying
> +--------------------------
> +In addition to storing the union_mounts in a hash table for quick lookups,
> +they are also stored as a list, headed at vfsmount->mnt_union. So, all
> +union_mounts that occur under a vfsmount (starting from the mountpoint
> +followed by the subdir unions) are stored within the vfsmount. During
> +umount (specifically, during the last mntput()), this list is traversed
> +to destroy all union stacks under this vfsmount.
> +
> +Hence, all union stacks under a vfsmount continue to exist until the
> +vfsmount is unmounted. It may be noted that the union_mount structure
> +holds a reference to the current dentry also. Because of this, for
> +subdir unions, both the top and bottom level dentries become pinned
> +till the upper layer filesystem is unmounted. Is this behaviour
> +acceptable ? Would this lead to a lot of pinned dentries over a period
> +of time ? (CHECK) If we don't do this, the top layer dentry might go
> +out of cache, during which time we have no means to release the
> +corresponding union_mount and the union_mount becomes stale. Would it
> +be necessary and worthwhile to add intelligence to prune_dcache() to
> +prune unused union_mounts thereby releasing the dentries ?
> +
> +As n

Re: [RFC PATCH 3/4] Lookup changes to support union mount.

2007-06-20 Thread Jan Blunck
On Wed, 20 Jun 2007 11:23:26 +0530, Bharata B Rao wrote:

> +/*
> + * Looks for the given @name in dcache by walking through all the layers
> + * of the union stack, starting from the top.
> + * FIXME: If we don't find the dentry in a upper layer, we descend to the
> + * next layer. So there is a chance to miss this dentry in the top layer
> + * if this is the _first_ time lookup of the dentry in this layer. A real
> + * lookup might have fetched a valid dentry in this layer itself, while we
> + * chose to descend to the next lower layer. One solution is not have this
> + * function itself, do the toplevel lookup in dcache and if it fails proceed
> + * to real_lookup_union() directly.
> + */
> +struct dentry *__d_lookup_union(struct nameidata *nd, struct qstr *name)
> +{
> + struct dentry *dentry;
> + struct nameidata nd_tmp;
> + struct vfsmount *mnt = mntget(nd->mnt);
> + struct qstr this;
> + int err;
> +
> + nd_tmp.mnt = nd->mnt;
> + nd_tmp.dentry = nd->dentry;
> +
> + this.name = name->name;
> + this.len = name->len;
> + this.hash = name->hash;
> +
> + do {
> + /* d_hash() is a repetition for the top layer. */
> + if (nd_tmp.dentry->d_op && nd_tmp.dentry->d_op->d_hash) {
> + err = nd_tmp.dentry->d_op->d_hash(nd_tmp.dentry, &this);
> + if (err < 0)
> + goto out;
> + }
> +
> + dentry = __d_lookup(nd_tmp.dentry, &this);
> + if (dentry) {
> + if (dentry->d_inode) {
> + if (nd->mnt != nd_tmp.mnt) {
> + mntput(nd->mnt);
> + nd->mnt = mntget(nd_tmp.mnt);
> + }
> + mntput(mnt);
> + return dentry;
> + } else {
> + dput(dentry);
> + }
> + }
> + } while (next_union_mount(&nd_tmp));
> +out:
> + mntput(mnt);
> + return NULL;
> +}
> +

The reference counting for vfsmount is wrong. next_union_mount() should be
something similar to follow_down(). You should grab the reference to the
underlying mount before doing the lookup. Ok ok, you already have a valid
reference in struct union_mount but anyway.
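
What I have in mind is roughly the following hand-over of references, shown here as a userspace sketch. The ex_ types and the plain integer counter are stand-ins only; real vfsmount reference counting is atomic and this models just the ordering.

```c
#include <stddef.h>

/* Toy mount with a reference count and a pointer to the next lower layer. */
struct ex_mnt {
    int count;
    struct ex_mnt *lower;
};

static struct ex_mnt *ex_mntget(struct ex_mnt *m)
{
    if (m)
        m->count++;
    return m;
}

static void ex_mntput(struct ex_mnt *m)
{
    if (m)
        m->count--;
}

/*
 * Step *cur down one layer, follow_down() style: take the reference on
 * the lower mount first, then drop the one on the current mount.
 * Returns 1 if we stepped down, 0 at the bottom of the stack.
 */
static int ex_next_union_mount(struct ex_mnt **cur)
{
    struct ex_mnt *lower = ex_mntget((*cur)->lower);

    if (!lower)
        return 0;
    ex_mntput(*cur);
    *cur = lower;
    return 1;
}
```

With this ordering the caller always holds exactly one reference, on whichever layer it is currently looking at, and drops it once when done.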

Jan



Re: [RFC PATCH 2/4] Mount changes to support union mount.

2007-06-20 Thread Jan Blunck
On Wed, 20 Jun 2007 11:22:41 +0530, Bharata B Rao wrote:

> +/*
> + * When propagating mount events to peer group, this is called under
> + * vfsmount_lock. Hence using GFP_ATOMIC for kmalloc here.
> + * TODO: Can we use a separate kmem cache for union_mount ?
> + */
> +struct union_mount *alloc_union_mount(struct vfsmount *src_mnt,
> + struct dentry *src_dentry, struct vfsmount *dst_mnt,
> + struct dentry *dst_dentry)
> +{
> + struct union_mount *u;
> + u = kmalloc(sizeof(struct union_mount), GFP_ATOMIC);
> + if (!u)
> + return u;
> + u->dst_mnt = mntget(dst_mnt);
> + u->dst_dentry = dget(dst_dentry);
> + u->src_mnt = src_mnt;
> + u->src_dentry = dget(src_dentry);
> + INIT_LIST_HEAD(&u->hash);
> + INIT_LIST_HEAD(&u->list);
> + return u;
> +}

Hmm, you pin the dentries in memory until umount. This isn't good. Besides
that, this doesn't work with file systems that invalidate their dentries.
The file system must have a chance to replace the dentry in the union
structure.

Jan


