Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread Al Viro
On Fri, Jan 30, 2015 at 01:34:41PM +0800, Miao Xie wrote:
> > First of all, ->s_umount is not a mutex; it's rwsem.  So you mean
> > down_read_trylock().  As for the transient failures - grep for down_write
> > on it...  E.g. have somebody call mount() from the same device.  We call
> > sget(), which finds existing superblock and calls grab_super().  Sure,
> > that ->s_umount will be released shortly, but in the meanwhile your trylock
> > will fail...
> 
> I know it, so I suggested to return -EBUSY in the previous mail.
> I think it is acceptable method, mount/umount operations are not so many
> after all.

Er...  What for, when there's a trivial variant that doesn't suffer those
spurious -EBUSY?  Seriously, just move sysfs removals up to make sure
that any possible fs shutdown won't progress past those during sysfs IO and
use sb_want_write/sb_dont_write to make sure it'll stay r/w for the duration.
What's the problem?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread Miao Xie
On Fri, 30 Jan 2015 04:37:14 +, Al Viro wrote:
> On Fri, Jan 30, 2015 at 12:14:24PM +0800, Miao Xie wrote:
>> On Fri, 30 Jan 2015 02:14:45 +, Al Viro wrote:
>>> On Fri, Jan 30, 2015 at 09:44:03AM +0800, Qu Wenruo wrote:
>>>
 This shouldn't happen. If someone is ro, the whole fs should be ro, right?
>>>
>>> Wrong.  Individual vfsmounts over an r/w superblock might very well be r/o.
>>> As for that trylock...  What for?  It invites transient failures for no
>>> good reason.  Removal of sysfs entry will block while write(2) to that 
>>> sucker
>>> is in progress, so btrfs shutdown will block at that point in ctree_close().
>>> It won't go away under you.
>>
>> could you explain the race condition? I think the deadlock won't happen, 
>> during
>> the btrfs shutdown, we hold s_umount, the write operation will fail to lock 
>> it,
>> and quit quickly, and then umount will continue.
> 
>   First of all, ->s_umount is not a mutex; it's rwsem.  So you mean
> down_read_trylock().  As for the transient failures - grep for down_write
> on it...  E.g. have somebody call mount() from the same device.  We call
> sget(), which finds existing superblock and calls grab_super().  Sure,
> that ->s_umount will be released shortly, but in the meanwhile your trylock
> will fail...

I know it, so I suggested to return -EBUSY in the previous mail.
I think it is acceptable method, mount/umount operations are not so many
after all.

Thanks
Miao

> 
>> I think sb_want_write() is similar to trylock(s_umount), the difference is 
>> that
>> sb_want_write() is more complex.
>>
>>>
>>> Now, you might want to move those sysfs entry removals to the very beginning
>>> of btrfs_kill_super(), but that's a different story - you need only to make
>>> sure that they are removed not later than the destruction of the data
>>> structures they need (IOW, the current location might very well be OK - I
>>> hadn't checked the details).
>>
>> Yes, we need move those sysfs entry removals, but needn't move to the very
>> beginning of btrfs_kill_super(), just at the beginning of close_ctree();
> 
> So move them...  It's a matter of moving one function call around a bit.
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function 
to get vfsmount from a given sb.

From: Miao Xie 
To: Al Viro , Qu Wenruo 
Date: 2015年01月30日 12:14

On Fri, 30 Jan 2015 02:14:45 +, Al Viro wrote:

On Fri, Jan 30, 2015 at 09:44:03AM +0800, Qu Wenruo wrote:


This shouldn't happen. If someone is ro, the whole fs should be ro, right?

Wrong.  Individual vfsmounts over an r/w superblock might very well be r/o.
As for that trylock...  What for?  It invites transient failures for no
good reason.  Removal of sysfs entry will block while write(2) to that sucker
is in progress, so btrfs shutdown will block at that point in ctree_close().
It won't go away under you.

could you explain the race condition? I think the deadlock won't happen, during
the btrfs shutdown, we hold s_umount, the write operation will fail to lock it,
and quit quickly, and then umount will continue.

I think sb_want_write() is similar to trylock(s_umount), the difference is that
sb_want_write() is more complex.

How?
sb_want_write() should be much like mnt_want_write(), except no need to 
increase vfsmout ref count things

and no need to check per mount ro/rw things.

Thanks,
Qu



Now, you might want to move those sysfs entry removals to the very beginning
of btrfs_kill_super(), but that's a different story - you need only to make
sure that they are removed not later than the destruction of the data
structures they need (IOW, the current location might very well be OK - I
hadn't checked the details).

Yes, we need move those sysfs entry removals, but needn't move to the very
beginning of btrfs_kill_super(), just at the beginning of close_ctree();

The current location is not right, it will introduce the use-after-free
problem. because we remove the sysfs entry after we release
transaction_kthread, use-after-free problem might happen in this case
Task1   Task2
change Label by sysfs
close_ctree
  kthread_stop(transaction_kthread);
  change label
  wake_up(transaction_kthread)


Thanks
Miao


As for "it won't go r/o under us" - sb_want_write() will do that just fine.
.



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread Al Viro
On Fri, Jan 30, 2015 at 12:14:24PM +0800, Miao Xie wrote:
> On Fri, 30 Jan 2015 02:14:45 +, Al Viro wrote:
> > On Fri, Jan 30, 2015 at 09:44:03AM +0800, Qu Wenruo wrote:
> > 
> >> This shouldn't happen. If someone is ro, the whole fs should be ro, right?
> > 
> > Wrong.  Individual vfsmounts over an r/w superblock might very well be r/o.
> > As for that trylock...  What for?  It invites transient failures for no
> > good reason.  Removal of sysfs entry will block while write(2) to that 
> > sucker
> > is in progress, so btrfs shutdown will block at that point in ctree_close().
> > It won't go away under you.
> 
> could you explain the race condition? I think the deadlock won't happen, 
> during
> the btrfs shutdown, we hold s_umount, the write operation will fail to lock 
> it,
> and quit quickly, and then umount will continue.

First of all, ->s_umount is not a mutex; it's rwsem.  So you mean
down_read_trylock().  As for the transient failures - grep for down_write
on it...  E.g. have somebody call mount() from the same device.  We call
sget(), which finds existing superblock and calls grab_super().  Sure,
that ->s_umount will be released shortly, but in the meanwhile your trylock
will fail...

> I think sb_want_write() is similar to trylock(s_umount), the difference is 
> that
> sb_want_write() is more complex.
> 
> > 
> > Now, you might want to move those sysfs entry removals to the very beginning
> > of btrfs_kill_super(), but that's a different story - you need only to make
> > sure that they are removed not later than the destruction of the data
> > structures they need (IOW, the current location might very well be OK - I
> > hadn't checked the details).
> 
> Yes, we need move those sysfs entry removals, but needn't move to the very
> beginning of btrfs_kill_super(), just at the beginning of close_ctree();

So move them...  It's a matter of moving one function call around a bit.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread Miao Xie
On Fri, 30 Jan 2015 02:14:45 +, Al Viro wrote:
> On Fri, Jan 30, 2015 at 09:44:03AM +0800, Qu Wenruo wrote:
> 
>> This shouldn't happen. If someone is ro, the whole fs should be ro, right?
> 
> Wrong.  Individual vfsmounts over an r/w superblock might very well be r/o.
> As for that trylock...  What for?  It invites transient failures for no
> good reason.  Removal of sysfs entry will block while write(2) to that sucker
> is in progress, so btrfs shutdown will block at that point in ctree_close().
> It won't go away under you.

could you explain the race condition? I think the deadlock won't happen, during
the btrfs shutdown, we hold s_umount, the write operation will fail to lock it,
and quit quickly, and then umount will continue.

I think sb_want_write() is similar to trylock(s_umount), the difference is that
sb_want_write() is more complex.

> 
> Now, you might want to move those sysfs entry removals to the very beginning
> of btrfs_kill_super(), but that's a different story - you need only to make
> sure that they are removed not later than the destruction of the data
> structures they need (IOW, the current location might very well be OK - I
> hadn't checked the details).

Yes, we need move those sysfs entry removals, but needn't move to the very
beginning of btrfs_kill_super(), just at the beginning of close_ctree();

The current location is not right, it will introduce the use-after-free
problem. because we remove the sysfs entry after we release
transaction_kthread, use-after-free problem might happen in this case
Task1   Task2
change Label by sysfs
close_ctree
  kthread_stop(transaction_kthread);
  change label
  wake_up(transaction_kthread)


Thanks
Miao

> 
> As for "it won't go r/o under us" - sb_want_write() will do that just fine.
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function 
to get vfsmount from a given sb.

From: Miao Xie 
To: Qu Wenruo , 
Date: 2015年01月30日 11:22

On Fri, 30 Jan 2015 10:02:26 +0800, Qu Wenruo wrote:

 Original Message 
Subject: Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get
vfsmount from a given sb.
From: Qu Wenruo 
To: Miao Xie , linux-btrfs@vger.kernel.org
Date: 2015年01月30日 09:44

 Original Message 
Subject: Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get
vfsmount from a given sb.
From: Miao Xie 
To: Qu Wenruo , 
Date: 2015年01月30日 08:52

On Thu, 29 Jan 2015 10:24:39 +0800, Qu Wenruo wrote:

There are sysfs interfaces in some fs, only btrfs yet, which will modify
on-disk data.
Unlike normal file operation routine we can use mnt_want_write_file() to
protect the operation, change through sysfs won't to be binded to any file
in the filesystem.
So we can only extract the first vfsmount of a superblock and pass it to
mnt_want_write() to do the protection.

This method is wrong, becasue one fs  may be mounted on the multi places
at the same time, someone is R/O, someone is R/W, you may get a R/O and
fail to get the write permission.

This shouldn't happen. If someone is ro, the whole fs should be ro, right?
You can mount a device which is already mounted as rw to other point as ro,
and remount a mount point to ro will also cause all other mount point to ro.

So I didn't see the problem here.

I think you do label/feature change by sysfs interface by the following way

btrfs_sysfs_change_()
{
 /* Use trylock to avoid the race with umount */
 if(!mutex_trylock(&sb->s_umount))
 return -EBUSY;

 check R/O and FREEZE

 mutex_unlock(&sb->s_umount);
}

This looks better since it not introduce changes to VFS.

Thanks,
Qu

Oh, wait a second, this one leads to the old problem and old solution.

If we hold s_umount mutex, we must do freeze check and can't start transaction
since it will deadlock.

And for freeze check, we must use sb_try_start_intwrite() to hold the freeze
lock and then add a new
btrfs_start_transaction_freeze() which will not call sb_start_write()...

Oh this seems so similar, v2 or v3 version RFC patch?
So still goes to the old method?

No. Just check R/O and RREEZE, if failed, go out. if the check pass,
we start_transaction. Because we do it in s_umount lock, no one can
change fs to R/O or FREEZE.

Maybe the above description is not so clear, explain it again.

btrfs_sysfs_change_()
{
/* Use trylock to avoid the race with umount */
if(!mutex_trylock(&sb->s_umount))
return -EBUSY;

if (fs is R/O or FREEZED) {
mutex_unlock(&sb->s_umount);
return -EACCES;
}

btrfs_start_transaction()
change label/feature
btrfs_commit_transaction()

mutex_unlock(&sb->s_umount);
}
I prefer the sb_want_write() method, since it doesn't even need to hold 
the s_umount mutex.


Thanks,
Qu

Thanks
Miao


Thanks,
Qu

Thanks
Miao


Cc: linux-fsdevel 
Signed-off-by: Qu Wenruo 
---
   fs/namespace.c| 25 +
   include/linux/mount.h |  1 +
   2 files changed, 26 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index cd1e968..5a16a62 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1105,6 +1105,31 @@ struct vfsmount *mntget(struct vfsmount *mnt)
   }
   EXPORT_SYMBOL(mntget);
   +/*
+ * get a vfsmount from a given sb
+ *
+ * This is especially used for case where change fs' sysfs interface
+ * will lead to a write, e.g. Change label through sysfs in btrfs.
+ * So vfs can get a vfsmount and then use mnt_want_write() to protect.
+ */
+struct vfsmount *get_vfsmount_sb(struct super_block *sb)
+{
+struct vfsmount *ret_vfs = NULL;
+struct mount *mnt;
+int ret = 0;
+
+lock_mount_hash();
+if (list_empty(&sb->s_mounts))
+goto out;
+mnt = list_entry(sb->s_mounts.next, struct mount, mnt_instance);
+ret_vfs = &mnt->mnt;
+ret_vfs = mntget(ret_vfs);
+out:
+unlock_mount_hash();
+return ret_vfs;
+}
+EXPORT_SYMBOL(get_vfsmount_sb);
+
   struct vfsmount *mnt_clone_internal(struct path *path)
   {
   struct mount *p;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index c2c561d..cf1b0f5 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -79,6 +79,7 @@ extern void mnt_drop_write_file(struct file *file);
   extern void mntput(struct vfsmount *mnt);
   extern struct vfsmount *mntget(struct vfsmount *mnt);
   extern struct vfsmount *mnt_clone_internal(struct path *path);
+extern struct vfsmount *get_vfsmount_sb(struct super_block *sb);
   extern int __mnt_is_readonly(struct vfsmount *mnt);
 struct path;


.



--
To unsubscribe from this list: send the line "unsubscribe li

Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread Miao Xie
On Fri, 30 Jan 2015 10:02:26 +0800, Qu Wenruo wrote:
> 
>  Original Message 
> Subject: Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get
> vfsmount from a given sb.
> From: Qu Wenruo 
> To: Miao Xie , linux-btrfs@vger.kernel.org
> Date: 2015年01月30日 09:44
>>
>>  Original Message ----
>> Subject: Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get
>> vfsmount from a given sb.
>> From: Miao Xie 
>> To: Qu Wenruo , 
>> Date: 2015年01月30日 08:52
>>> On Thu, 29 Jan 2015 10:24:39 +0800, Qu Wenruo wrote:
>>>> There are sysfs interfaces in some fs, only btrfs yet, which will modify
>>>> on-disk data.
>>>> Unlike normal file operation routine we can use mnt_want_write_file() to
>>>> protect the operation, change through sysfs won't to be binded to any file
>>>> in the filesystem.
>>>> So we can only extract the first vfsmount of a superblock and pass it to
>>>> mnt_want_write() to do the protection.
>>> This method is wrong, becasue one fs  may be mounted on the multi places
>>> at the same time, someone is R/O, someone is R/W, you may get a R/O and
>>> fail to get the write permission.
>> This shouldn't happen. If someone is ro, the whole fs should be ro, right?
>> You can mount a device which is already mounted as rw to other point as ro,
>> and remount a mount point to ro will also cause all other mount point to ro.
>>
>> So I didn't see the problem here.
>>>
>>> I think you do label/feature change by sysfs interface by the following way
>>>
>>> btrfs_sysfs_change_()
>>> {
>>> /* Use trylock to avoid the race with umount */
>>> if(!mutex_trylock(&sb->s_umount))
>>> return -EBUSY;
>>>
>>> check R/O and FREEZE
>>>
>>> mutex_unlock(&sb->s_umount);
>>> }
>> This looks better since it not introduce changes to VFS.
>>
>> Thanks,
>> Qu
> Oh, wait a second, this one leads to the old problem and old solution.
> 
> If we hold s_umount mutex, we must do freeze check and can't start transaction
> since it will deadlock.
> 
> And for freeze check, we must use sb_try_start_intwrite() to hold the freeze
> lock and then add a new
> btrfs_start_transaction_freeze() which will not call sb_start_write()...
> 
> Oh this seems so similar, v2 or v3 version RFC patch?
> So still goes to the old method?

No. Just check R/O and RREEZE, if failed, go out. if the check pass,
we start_transaction. Because we do it in s_umount lock, no one can
change fs to R/O or FREEZE.

Maybe the above description is not so clear, explain it again.

btrfs_sysfs_change_()
{
/* Use trylock to avoid the race with umount */
if(!mutex_trylock(&sb->s_umount))
return -EBUSY;

if (fs is R/O or FREEZED) {
mutex_unlock(&sb->s_umount);
return -EACCES;
}

btrfs_start_transaction()
change label/feature
btrfs_commit_transaction()

mutex_unlock(&sb->s_umount);
}

Thanks
Miao

> 
> Thanks,
> Qu
>>>
>>> Thanks
>>> Miao
>>>
>>>> Cc: linux-fsdevel 
>>>> Signed-off-by: Qu Wenruo 
>>>> ---
>>>>   fs/namespace.c| 25 +
>>>>   include/linux/mount.h |  1 +
>>>>   2 files changed, 26 insertions(+)
>>>>
>>>> diff --git a/fs/namespace.c b/fs/namespace.c
>>>> index cd1e968..5a16a62 100644
>>>> --- a/fs/namespace.c
>>>> +++ b/fs/namespace.c
>>>> @@ -1105,6 +1105,31 @@ struct vfsmount *mntget(struct vfsmount *mnt)
>>>>   }
>>>>   EXPORT_SYMBOL(mntget);
>>>>   +/*
>>>> + * get a vfsmount from a given sb
>>>> + *
>>>> + * This is especially used for case where change fs' sysfs interface
>>>> + * will lead to a write, e.g. Change label through sysfs in btrfs.
>>>> + * So vfs can get a vfsmount and then use mnt_want_write() to protect.
>>>> + */
>>>> +struct vfsmount *get_vfsmount_sb(struct super_block *sb)
>>>> +{
>>>> +struct vfsmount *ret_vfs = NULL;
>>>> +struct mount *mnt;
>>>> +int ret = 0;
>>>> +
>>>> +lock_mount_hash();
>>>> +if (list_empty(&sb->s_mounts))
>>>> +goto out;
>>>> +mnt = list_entry(sb->s_mounts.next, struct mount, mnt_instance);
>>&g

Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function 
to get vfsmount from a given sb.

From: Al Viro 
To: Qu Wenruo 
Date: 2015年01月30日 10:09

On Fri, Jan 30, 2015 at 09:11:26AM +0800, Qu Wenruo wrote:

For the mounted ro case, that's not a problem, since if one instance
is mounted ro,
other instances are also mounted ro, so that's not a problem.

Not really.

root@satch:~# cd /tmp/
root@satch:~# mkdir /tm/a
root@satch:~# mount --bind /tmp/ /tmp/a
root@satch:~# mount --bind -o remount,ro /tmp/a
root@satch:~# mkdir /tmp/b
root@satch:~# mkdir /tmp/a/c
mkdir: cannot create directory '/tmp/a/c': Read-only file system
root@satch:~# stat /tmp/b /tmp/a/b
   File: '/tmp/b'
   Size: 4096Blocks: 8  IO Block: 4096   directory
Device: 806h/2054d  Inode: 257537  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/root)
Access: 2015-01-29 21:03:35.0 -0500
Modify: 2015-01-29 21:03:35.0 -0500
Change: 2015-01-29 21:03:35.0 -0500
  Birth: -
   File: '/tmp/a/b'
   Size: 4096Blocks: 8  IO Block: 4096   directory
Device: 806h/2054d  Inode: 257537  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/root)
Access: 2015-01-29 21:03:35.0 -0500
Modify: 2015-01-29 21:03:35.0 -0500
Change: 2015-01-29 21:03:35.0 -0500
  Birth: -
root@satch:~#

IOW, /tmp and /tmp/a bear the same filesystem (one from /dev/sda6), the former
is mounted r/w, the latter - r/o.

Oh, bind mount.

I was so stupid to forget bind.



Assuming that your superblock is guaranteed to stay alive and usable for
whatever work you are trying to do, what's wrong with sb_want_write()?

Did you mean change the function name and it's parameter to
sb_want_write(sb) and sb_drop_write(sb).
That looks much better.
But I'm a little worried about just using sb_start_write() and
s_readonly_remount/s_flags to do the protection,
will it be enough?

Protection against what?  If superblock is r/w, it will stay r/w until you
do sb_drop_write(); if it's r/o, sb_want_write() will fail.  There might be
any number of vfsmounts over superblock; attempt to get write access via
vfsmount will succeed only of both the vfsmount and superblock are r/w -
mnt_want_write() does sb_want_write() and fails if that has failed.

Nice.
I'll add sb_want_write() and sb_drop_write().

Tons of thanks for all your advice!
Qu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread Al Viro
On Fri, Jan 30, 2015 at 09:44:03AM +0800, Qu Wenruo wrote:

> This shouldn't happen. If someone is ro, the whole fs should be ro, right?

Wrong.  Individual vfsmounts over an r/w superblock might very well be r/o.
As for that trylock...  What for?  It invites transient failures for no
good reason.  Removal of sysfs entry will block while write(2) to that sucker
is in progress, so btrfs shutdown will block at that point in ctree_close().
It won't go away under you.

Now, you might want to move those sysfs entry removals to the very beginning
of btrfs_kill_super(), but that's a different story - you need only to make
sure that they are removed not later than the destruction of the data
structures they need (IOW, the current location might very well be OK - I
hadn't checked the details).

As for "it won't go r/o under us" - sb_want_write() will do that just fine.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread Al Viro
On Fri, Jan 30, 2015 at 09:11:26AM +0800, Qu Wenruo wrote:
> For the mounted ro case, that's not a problem, since if one instance
> is mounted ro,
> other instances are also mounted ro, so that's not a problem.

Not really.

root@satch:~# cd /tmp/
root@satch:~# mkdir /tm/a
root@satch:~# mount --bind /tmp/ /tmp/a
root@satch:~# mount --bind -o remount,ro /tmp/a
root@satch:~# mkdir /tmp/b  
root@satch:~# mkdir /tmp/a/c
mkdir: cannot create directory '/tmp/a/c': Read-only file system
root@satch:~# stat /tmp/b /tmp/a/b
  File: '/tmp/b'
  Size: 4096Blocks: 8  IO Block: 4096   directory
Device: 806h/2054d  Inode: 257537  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/root)
Access: 2015-01-29 21:03:35.0 -0500
Modify: 2015-01-29 21:03:35.0 -0500
Change: 2015-01-29 21:03:35.0 -0500
 Birth: -
  File: '/tmp/a/b'
  Size: 4096Blocks: 8  IO Block: 4096   directory
Device: 806h/2054d  Inode: 257537  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/root)
Access: 2015-01-29 21:03:35.0 -0500
Modify: 2015-01-29 21:03:35.0 -0500
Change: 2015-01-29 21:03:35.0 -0500
 Birth: -
root@satch:~#

IOW, /tmp and /tmp/a bear the same filesystem (one from /dev/sda6), the former
is mounted r/w, the latter - r/o.

> >Assuming that your superblock is guaranteed to stay alive and usable for
> >whatever work you are trying to do, what's wrong with sb_want_write()?
> Did you mean change the function name and it's parameter to
> sb_want_write(sb) and sb_drop_write(sb).
> That looks much better.
> But I'm a little worried about just using sb_start_write() and
> s_readonly_remount/s_flags to do the protection,
> will it be enough?

Protection against what?  If superblock is r/w, it will stay r/w until you
do sb_drop_write(); if it's r/o, sb_want_write() will fail.  There might be
any number of vfsmounts over superblock; attempt to get write access via
vfsmount will succeed only of both the vfsmount and superblock are r/w -
mnt_want_write() does sb_want_write() and fails if that has failed.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function 
to get vfsmount from a given sb.

From: Qu Wenruo 
To: Miao Xie , linux-btrfs@vger.kernel.org
Date: 2015年01月30日 09:44


 Original Message 
Subject: Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function 
to get vfsmount from a given sb.

From: Miao Xie 
To: Qu Wenruo , 
Date: 2015年01月30日 08:52

On Thu, 29 Jan 2015 10:24:39 +0800, Qu Wenruo wrote:
There are sysfs interfaces in some fs, only btrfs yet, which will 
modify

on-disk data.
Unlike normal file operation routine we can use 
mnt_want_write_file() to
protect the operation, change through sysfs won't to be binded to 
any file

in the filesystem.
So we can only extract the first vfsmount of a superblock and pass 
it to

mnt_want_write() to do the protection.

This method is wrong, becasue one fs  may be mounted on the multi places
at the same time, someone is R/O, someone is R/W, you may get a R/O and
fail to get the write permission.
This shouldn't happen. If someone is ro, the whole fs should be ro, 
right?
You can mount a device which is already mounted as rw to other point 
as ro,
and remount a mount point to ro will also cause all other mount point 
to ro.


So I didn't see the problem here.


I think you do label/feature change by sysfs interface by the 
following way


btrfs_sysfs_change_()
{
/* Use trylock to avoid the race with umount */
if(!mutex_trylock(&sb->s_umount))
return -EBUSY;

check R/O and FREEZE

mutex_unlock(&sb->s_umount);
}

This looks better since it not introduce changes to VFS.

Thanks,
Qu

Oh, wait a second, this one leads to the old problem and old solution.

If we hold s_umount mutex, we must do freeze check and can't start 
transaction since it will deadlock.


And for freeze check, we must use sb_try_start_intwrite() to hold the 
freeze lock and then add a new

btrfs_start_transaction_freeze() which will not call sb_start_write()...

Oh this seems so similar, v2 or v3 version RFC patch?
So still goes to the old method?

Thanks,
Qu


Thanks
Miao


Cc: linux-fsdevel 
Signed-off-by: Qu Wenruo 
---
  fs/namespace.c| 25 +
  include/linux/mount.h |  1 +
  2 files changed, 26 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index cd1e968..5a16a62 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1105,6 +1105,31 @@ struct vfsmount *mntget(struct vfsmount *mnt)
  }
  EXPORT_SYMBOL(mntget);
  +/*
+ * get a vfsmount from a given sb
+ *
+ * This is especially used for case where change fs' sysfs interface
+ * will lead to a write, e.g. Change label through sysfs in btrfs.
+ * So vfs can get a vfsmount and then use mnt_want_write() to protect.
+ */
+struct vfsmount *get_vfsmount_sb(struct super_block *sb)
+{
+struct vfsmount *ret_vfs = NULL;
+struct mount *mnt;
+int ret = 0;
+
+lock_mount_hash();
+if (list_empty(&sb->s_mounts))
+goto out;
+mnt = list_entry(sb->s_mounts.next, struct mount, mnt_instance);
+ret_vfs = &mnt->mnt;
+ret_vfs = mntget(ret_vfs);
+out:
+unlock_mount_hash();
+return ret_vfs;
+}
+EXPORT_SYMBOL(get_vfsmount_sb);
+
  struct vfsmount *mnt_clone_internal(struct path *path)
  {
  struct mount *p;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index c2c561d..cf1b0f5 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -79,6 +79,7 @@ extern void mnt_drop_write_file(struct file *file);
  extern void mntput(struct vfsmount *mnt);
  extern struct vfsmount *mntget(struct vfsmount *mnt);
  extern struct vfsmount *mnt_clone_internal(struct path *path);
+extern struct vfsmount *get_vfsmount_sb(struct super_block *sb);
  extern int __mnt_is_readonly(struct vfsmount *mnt);
struct path;





--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function 
to get vfsmount from a given sb.

From: Miao Xie 
To: Qu Wenruo , 
Date: 2015年01月30日 08:52

On Thu, 29 Jan 2015 10:24:39 +0800, Qu Wenruo wrote:

There are sysfs interfaces in some fs, only btrfs yet, which will modify
on-disk data.
Unlike normal file operation routine we can use mnt_want_write_file() to
protect the operation, change through sysfs won't to be binded to any file
in the filesystem.
So we can only extract the first vfsmount of a superblock and pass it to
mnt_want_write() to do the protection.

This method is wrong, becasue one fs  may be mounted on the multi places
at the same time, someone is R/O, someone is R/W, you may get a R/O and
fail to get the write permission.

This shouldn't happen. If someone is ro, the whole fs should be ro, right?
You can mount a device which is already mounted as rw to other point as ro,
and remount a mount point to ro will also cause all other mount point to ro.

So I didn't see the problem here.


I think you do label/feature change by sysfs interface by the following way

btrfs_sysfs_change_()
{
/* Use trylock to avoid the race with umount */
if(!mutex_trylock(&sb->s_umount))
return -EBUSY;

check R/O and FREEZE

mutex_unlock(&sb->s_umount);
}

This looks better since it not introduce changes to VFS.

Thanks,
Qu


Thanks
Miao


Cc: linux-fsdevel 
Signed-off-by: Qu Wenruo 
---
  fs/namespace.c| 25 +
  include/linux/mount.h |  1 +
  2 files changed, 26 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index cd1e968..5a16a62 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1105,6 +1105,31 @@ struct vfsmount *mntget(struct vfsmount *mnt)
  }
  EXPORT_SYMBOL(mntget);
  
+/*

+ * get a vfsmount from a given sb
+ *
+ * This is especially used for case where change fs' sysfs interface
+ * will lead to a write, e.g. Change label through sysfs in btrfs.
+ * So vfs can get a vfsmount and then use mnt_want_write() to protect.
+ */
+struct vfsmount *get_vfsmount_sb(struct super_block *sb)
+{
+   struct vfsmount *ret_vfs = NULL;
+   struct mount *mnt;
+   int ret = 0;
+
+   lock_mount_hash();
+   if (list_empty(&sb->s_mounts))
+   goto out;
+   mnt = list_entry(sb->s_mounts.next, struct mount, mnt_instance);
+   ret_vfs = &mnt->mnt;
+   ret_vfs = mntget(ret_vfs);
+out:
+   unlock_mount_hash();
+   return ret_vfs;
+}
+EXPORT_SYMBOL(get_vfsmount_sb);
+
  struct vfsmount *mnt_clone_internal(struct path *path)
  {
struct mount *p;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index c2c561d..cf1b0f5 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -79,6 +79,7 @@ extern void mnt_drop_write_file(struct file *file);
  extern void mntput(struct vfsmount *mnt);
  extern struct vfsmount *mntget(struct vfsmount *mnt);
  extern struct vfsmount *mnt_clone_internal(struct path *path);
+extern struct vfsmount *get_vfsmount_sb(struct super_block *sb);
  extern int __mnt_is_readonly(struct vfsmount *mnt);
  
  struct path;




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function 
to get vfsmount from a given sb.

From: Al Viro 
To: , Qu Wenruo , 
, linux-fsdevel 

Date: 2015年01月29日 23:23

On Thu, Jan 29, 2015 at 01:37:58PM +0100, David Sterba wrote:

Adding Al Viro into CC

On Thu, Jan 29, 2015 at 10:24:39AM +0800, Qu Wenruo wrote:

+struct vfsmount *get_vfsmount_sb(struct super_block *sb)
+{
+   struct vfsmount *ret_vfs = NULL;
+   struct mount *mnt;
+   int ret = 0;
+
+   lock_mount_hash();
+   if (list_empty(&sb->s_mounts))
+   goto out;
+   mnt = list_entry(sb->s_mounts.next, struct mount, mnt_instance);

from include/linux/fs.h:

struct super_block {
...
struct list_heads_mounts;   /* list of mounts; _not_ for fs 
use */
...
};

I hear a storm in the distance coming our direction ... so I'll
preemptively NAK this change.

Could you explain what the devil is that for?
In sysfs interface handler, we don't have struct file or vfsmount to do 
the mnt_want_write* protection,

so this function is introduced to get a vfsmount and do the protection.

The primitive looks rather
bogus - if nothing else, it includes "make random instance of the filesystem
in someone's namespace appear busy to umount", which doesn't look like a
part of useful interface...

This is the problem I didn't find a good way to avoid.

If using the function, it will always use the first(or last?) vfsmount 
of the filesystem.
For 1 to 1 match case, it's OK, but if one device is mounted on multiple 
mount points,
it will delay the umount of that mount point. But we only need to keep 
at least one rw mount point exists.

   The only piece of context I'd been able to find
was something vague about sysfs-inflicted operations and wanting to use
mnt_want_write() but having nothing to pass it; BTW, what if the (random)
instance you run into happens to mounted r/o?
For the mounted ro case, that's not a problem, since if one instance is 
mounted ro,

other instances are also mounted ro, so that's not a problem.



Assuming that your superblock is guaranteed to stay alive and usable for
whatever work you are trying to do, what's wrong with sb_want_write()?
Did you mean change the function name and it's parameter to 
sb_want_write(sb) and sb_drop_write(sb).

That looks much better.
But I'm a little worried about just using sb_start_write() and 
s_readonly_remount/s_flags to do the protection,

will it be enough?

Thanks,
Qu
  


If it's _not_ guaranteed to stay so, and this is what you are trying to
solve, you are doing that at the wrong level - just take sysfs entry
removals earlier in shutdown process and be done with that.  Beginning of
close_ctree() would probably be early enough to be safe, but if that's
not enough, you can take it into the beginning of btrfs_kill_super().


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread Miao Xie
On Thu, 29 Jan 2015 10:24:39 +0800, Qu Wenruo wrote:
> There are sysfs interfaces in some fs, only btrfs yet, which will modify
> on-disk data.
> Unlike normal file operation routine we can use mnt_want_write_file() to
> protect the operation, change through sysfs won't to be binded to any file
> in the filesystem.
> So we can only extract the first vfsmount of a superblock and pass it to
> mnt_want_write() to do the protection.

This method is wrong, becasue one fs  may be mounted on the multi places
at the same time, someone is R/O, someone is R/W, you may get a R/O and
fail to get the write permission.

I think you do label/feature change by sysfs interface by the following way

btrfs_sysfs_change_()
{
/* Use trylock to avoid the race with umount */
if(!mutex_trylock(&sb->s_umount))
return -EBUSY;

check R/O and FREEZE

mutex_unlock(&sb->s_umount);
}

Thanks
Miao

> 
> Cc: linux-fsdevel 
> Signed-off-by: Qu Wenruo 
> ---
>  fs/namespace.c| 25 +
>  include/linux/mount.h |  1 +
>  2 files changed, 26 insertions(+)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index cd1e968..5a16a62 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1105,6 +1105,31 @@ struct vfsmount *mntget(struct vfsmount *mnt)
>  }
>  EXPORT_SYMBOL(mntget);
>  
> +/*
> + * get a vfsmount from a given sb
> + *
> + * This is especially used for case where change fs' sysfs interface
> + * will lead to a write, e.g. Change label through sysfs in btrfs.
> + * So vfs can get a vfsmount and then use mnt_want_write() to protect.
> + */
> +struct vfsmount *get_vfsmount_sb(struct super_block *sb)
> +{
> + struct vfsmount *ret_vfs = NULL;
> + struct mount *mnt;
> + int ret = 0;
> +
> + lock_mount_hash();
> + if (list_empty(&sb->s_mounts))
> + goto out;
> + mnt = list_entry(sb->s_mounts.next, struct mount, mnt_instance);
> + ret_vfs = &mnt->mnt;
> + ret_vfs = mntget(ret_vfs);
> +out:
> + unlock_mount_hash();
> + return ret_vfs;
> +}
> +EXPORT_SYMBOL(get_vfsmount_sb);
> +
>  struct vfsmount *mnt_clone_internal(struct path *path)
>  {
>   struct mount *p;
> diff --git a/include/linux/mount.h b/include/linux/mount.h
> index c2c561d..cf1b0f5 100644
> --- a/include/linux/mount.h
> +++ b/include/linux/mount.h
> @@ -79,6 +79,7 @@ extern void mnt_drop_write_file(struct file *file);
>  extern void mntput(struct vfsmount *mnt);
>  extern struct vfsmount *mntget(struct vfsmount *mnt);
>  extern struct vfsmount *mnt_clone_internal(struct path *path);
> +extern struct vfsmount *get_vfsmount_sb(struct super_block *sb);
>  extern int __mnt_is_readonly(struct vfsmount *mnt);
>  
>  struct path;
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread Al Viro
On Thu, Jan 29, 2015 at 01:37:58PM +0100, David Sterba wrote:
> Adding Al Viro into CC
> 
> On Thu, Jan 29, 2015 at 10:24:39AM +0800, Qu Wenruo wrote:
> > +struct vfsmount *get_vfsmount_sb(struct super_block *sb)
> > +{
> > +   struct vfsmount *ret_vfs = NULL;
> > +   struct mount *mnt;
> > +   int ret = 0;
> > +
> > +   lock_mount_hash();
> > +   if (list_empty(&sb->s_mounts))
> > +   goto out;
> > +   mnt = list_entry(sb->s_mounts.next, struct mount, mnt_instance);
> 
> from include/linux/fs.h:
> 
> struct super_block {
> ...
>   struct list_heads_mounts;   /* list of mounts; _not_ for fs 
> use */
> ...
> };
> 
> I hear a storm in the distance coming our direction ... so I'll
> preemptively NAK this change.

Could you explain what the devil is that for?  The primitive looks rather
bogus - if nothing else, it includes "make random instance of the filesystem
in someone's namespace appear busy to umount", which doesn't look like a
part of useful interface...  The only piece of context I'd been able to find
was something vague about sysfs-inflicted operations and wanting to use
mnt_want_write() but having nothing to pass it; BTW, what if the (random)
instance you run into happens to mounted r/o?

Assuming that your superblock is guaranteed to stay alive and usable for
whatever work you are trying to do, what's wrong with sb_want_write()?  

If it's _not_ guaranteed to stay so, and this is what you are trying to
solve, you are doing that at the wrong level - just take sysfs entry
removals earlier in shutdown process and be done with that.  Beginning of
close_ctree() would probably be early enough to be safe, but if that's
not enough, you can take it into the beginning of btrfs_kill_super().
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-29 Thread David Sterba
Adding Al Viro into CC

On Thu, Jan 29, 2015 at 10:24:39AM +0800, Qu Wenruo wrote:
> +struct vfsmount *get_vfsmount_sb(struct super_block *sb)
> +{
> + struct vfsmount *ret_vfs = NULL;
> + struct mount *mnt;
> + int ret = 0;
> +
> + lock_mount_hash();
> + if (list_empty(&sb->s_mounts))
> + goto out;
> + mnt = list_entry(sb->s_mounts.next, struct mount, mnt_instance);

from include/linux/fs.h:

struct super_block {
...
struct list_heads_mounts;   /* list of mounts; _not_ for fs 
use */
...
};

I hear a storm in the distance coming our direction ... so I'll
preemptively NAK this change.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RESEND v4 6/8] vfs: Add get_vfsmount_sb() function to get vfsmount from a given sb.

2015-01-28 Thread Qu Wenruo
There are sysfs interfaces in some fs, only btrfs yet, which will modify
on-disk data.
Unlike normal file operation routine we can use mnt_want_write_file() to
protect the operation, change through sysfs won't to be binded to any file
in the filesystem.
So we can only extract the first vfsmount of a superblock and pass it to
mnt_want_write() to do the protection.

Cc: linux-fsdevel 
Signed-off-by: Qu Wenruo 
---
 fs/namespace.c| 25 +
 include/linux/mount.h |  1 +
 2 files changed, 26 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index cd1e968..5a16a62 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1105,6 +1105,31 @@ struct vfsmount *mntget(struct vfsmount *mnt)
 }
 EXPORT_SYMBOL(mntget);
 
+/*
+ * get a vfsmount from a given sb
+ *
+ * This is especially used for case where change fs' sysfs interface
+ * will lead to a write, e.g. Change label through sysfs in btrfs.
+ * So vfs can get a vfsmount and then use mnt_want_write() to protect.
+ */
+struct vfsmount *get_vfsmount_sb(struct super_block *sb)
+{
+   struct vfsmount *ret_vfs = NULL;
+   struct mount *mnt;
+   int ret = 0;
+
+   lock_mount_hash();
+   if (list_empty(&sb->s_mounts))
+   goto out;
+   mnt = list_entry(sb->s_mounts.next, struct mount, mnt_instance);
+   ret_vfs = &mnt->mnt;
+   ret_vfs = mntget(ret_vfs);
+out:
+   unlock_mount_hash();
+   return ret_vfs;
+}
+EXPORT_SYMBOL(get_vfsmount_sb);
+
 struct vfsmount *mnt_clone_internal(struct path *path)
 {
struct mount *p;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index c2c561d..cf1b0f5 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -79,6 +79,7 @@ extern void mnt_drop_write_file(struct file *file);
 extern void mntput(struct vfsmount *mnt);
 extern struct vfsmount *mntget(struct vfsmount *mnt);
 extern struct vfsmount *mnt_clone_internal(struct path *path);
+extern struct vfsmount *get_vfsmount_sb(struct super_block *sb);
 extern int __mnt_is_readonly(struct vfsmount *mnt);
 
 struct path;
-- 
2.2.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html