Re: [PATCH v4 03/11] lsm: add file opener's cred to a setprocattr arguments

2015-11-10 Thread Al Viro
On Wed, Oct 14, 2015 at 02:41:57PM +0200, Lukasz Pawelczyk wrote:
>   int (*getprocattr)(struct task_struct *p, char *name, char **value);
> - int (*setprocattr)(struct task_struct *p, char *name, void *value,
> - size_t size);
> + int (*setprocattr)(struct task_struct *p, const struct cred *f_cred,
> +char *name, void *value, size_t size);

*grumble*

Why the hell is that thing taking char *name - not even const char *?
___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.


Re: [PATCH v3 0/7] User namespace mount updates

2015-11-17 Thread Al Viro
On Tue, Nov 17, 2015 at 10:39:03AM -0600, Seth Forshee wrote:
> Hi Eric,
> 
> Here's another update to my patches for user namespace mounts, based on
> your for-testing branch. These patches add safeguards necessary to allow
> unprivileged mounts and update SELinux and Smack to safely handle
> device-backed mounts from unprivileged users.
> 
> The v2 posting received very little in the way of feedback, so changes
> are minimal. I've made a trivial style change to the Smack changes at
> Casey's request, and I've added Stephen's ack for the SELinux changes.

Would you mind explaining which filesystem types do you plan to allow?
SELinux and the rest of Linux S&M bunch do fuck-all for attacks via
handcrafted fs image fed to the code in fs driver that does not expect
a given kind of inconsistencies.

As it is, validation of on-disk metadata is not particularly strong;
what's more, protection against concurrent malicious *changes* of
fs image (via direct writes by root) is simply inexistent.

So what is that about?
___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.


Re: [PATCH v3 0/7] User namespace mount updates

2015-11-17 Thread Al Viro
On Tue, Nov 17, 2015 at 11:25:51AM -0600, Seth Forshee wrote:

> Shortly after that I plan to follow with support for ext4. I've been
> fuzzing ext4 for a while now and it has held up well, and I'm currently
> working on hand-crafted attacks. Ted has commented privately (to others,
> not to me personally) that he will fix bugs for such attacks, though I
> haven't seen any public comments to that effect.

_Static_ attacks, or change-image-under-mounted-fs attacks?
___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.


Re: [PATCH v3 0/7] User namespace mount updates

2015-11-17 Thread Al Viro
On Tue, Nov 17, 2015 at 02:02:09PM -0500, Austin S Hemmelgarn wrote:

> >_Static_ attacks, or change-image-under-mounted-fs attacks?
> To properly protect against attacks on mounted filesystems, we'd
> need some new concept of a userspace immutable file (that is, one
> where nobody can write to it except the kernel, and only the kernel
> can change it between regular access and this new state), and then
> have the kernel set an image (or block device) to this state when a
> filesystem is mounted from it (this introduces all kinds of other
> issues too however, for example stuff that allows an online fsck on
> the device will stop working, as will many un-deletion tools).
> 
> The only other option would be to force the FS to cache all metadata
> in memory, and validate between the cache and what's on disk on
> every access, which is not realistic for any real world system.

Doctor, it hurt when I do it...

IOW, the other option is to refuse attempting this insanity.  Fuse probably
can be handled, but being able to mount (with kernel-space drivera) an
arbitrary ext4 image is equivalent to being able to do anything and it's
going to stay that way for the forseeable future.  You are talking about
a large pile of code that deals with rather convoluted data structure,
had not been written with validation in mind *and* keeps being developed.
What's more, that code runs with maximal priveleges there are.

This is absolutely insane, no matter how much LSM snake oil you slatter on
the whole thing.  All of a sudden you are exposing a huge attack surface
in the place where it would hurt most and as the consolation we are offered
basically "Ted is willing to fix holes when they are found".

I know that security community tends to be less than sane, but this really
takes the damn cake.

Al, still not quite able to believe this is not a badly mistimed AFD posting...
___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.


Re: [PATCH v3 0/7] User namespace mount updates

2015-11-17 Thread Al Viro
On Tue, Nov 17, 2015 at 03:39:16PM -0500, Austin S Hemmelgarn wrote:

> >This is absolutely insane, no matter how much LSM snake oil you slatter on
> >the whole thing.  All of a sudden you are exposing a huge attack surface
> >in the place where it would hurt most and as the consolation we are offered
> >basically "Ted is willing to fix holes when they are found".
> For the context of static image attacks, anything that's found
> _needs_ to be fixed regardless, and unless you can find some way to
> actually prevent attacks on mounted filesystems that doesn't involve
> a complete re-write of the filesystem drivers, then there's not much
> we can do about it.  Yes, unprivileged mounts expose an attack
> surface, but so does userspace access to the network stack, and so
> do a lot of other features that are considered essential in a modern
> general purpose operating system.

"X is exposes an attack surface.  Y exposes a diferent attack surface.
Y is considered important.  Therefore X is important enough to implement it"

Right...
___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.


Re: [PATCH v3 0/7] User namespace mount updates

2015-11-18 Thread Al Viro
On Wed, Nov 18, 2015 at 08:22:38AM -0600, Seth Forshee wrote:

> But it still requires the admin set it up that way, no? And aren't
> privileges required to set up those devices in the first place?
> 
> I'm not saying that it wouldn't be a good idea to lock down the backing
> stores for those types of devices too, just that it isn't something that
> a regular user could exploit without an admin doing something to
> facilitate it.

Sigh...  If it boils down to "all admins within all containers must be
trusted not to try and break out" (along with "roothole in any container
escalates to kernel-mode code execution on host"), then what the fuck
is the *point* of bothering with containers, userns, etc. in the first
place?  If your model is basically "you want isolation, just use kvm",
fine, but where's the place for userns in all that?

And if you are talking about the _host_ admin, then WTF not have him just
mount what's needed as part of setup and to hell with mounting those
inside the container?

Look at that from the hosting company POV - they are offering a bunch of
virtual machines on one physical system.  And you want the admins on those
virtual machines independent from the host admin.  Fine, but then you
really need to keep them unable to screw each other or gain kernel-mode
execution on the host.

Again, what's the point of all that?  I assumed the model where containers
do, you know, contain what's in them, regardless of trust.  You guys seem
to assume something different and I really wonder what it _is_...
___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.


Re: [PATCH v3 0/7] User namespace mount updates

2015-11-18 Thread Al Viro
On Wed, Nov 18, 2015 at 09:05:12AM -0600, Seth Forshee wrote:

> Yes, the host admin. I'm not talking about trusting the admin inside the
> container at all.

Then why not have the same host admin just plain mount it when setting the
container up and be done with that?  From the host namespace, before spawning
the docker instance or whatever framework you are using.  IDGI...
___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.


Re: [PATCH] ipv6: avoid copy_from_user() via ipv6_renew_options_kern()

2018-06-25 Thread Al Viro
On Sat, Jun 23, 2018 at 10:57:06AM +0900, David Miller wrote:
> From: Paul Moore 
> Date: Fri, 22 Jun 2018 17:18:20 -0400
> 
> > -   const mm_segment_t old_fs = get_fs();
> > -
> > -   set_fs(KERNEL_DS);
> > -   ret_val = ipv6_renew_options(sk, opt, newtype,
> > -(struct ipv6_opt_hdr __user *)newopt,
> > -newoptlen);
> > -   set_fs(old_fs);
> 
> So is it really the case that the traditional construct:
> 
>   set_fs(KERNEL_DS);
>   ... copy_{from,to}_user(...);
>   set_fs(old_fs);
> 
> is no longer allowed?

s/no longer allowed/best avoided/, but IMO in this case the replacement is
too ugly to live ;-/
___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.


Re: [PATCH] ipv6: avoid copy_from_user() via ipv6_renew_options_kern()

2018-06-25 Thread Al Viro
On Sat, Jun 23, 2018 at 10:26:27PM +0100, Al Viro wrote:
> On Sat, Jun 23, 2018 at 10:57:06AM +0900, David Miller wrote:
> > From: Paul Moore 
> > Date: Fri, 22 Jun 2018 17:18:20 -0400
> > 
> > > - const mm_segment_t old_fs = get_fs();
> > > -
> > > - set_fs(KERNEL_DS);
> > > - ret_val = ipv6_renew_options(sk, opt, newtype,
> > > -  (struct ipv6_opt_hdr __user *)newopt,
> > > -  newoptlen);
> > > - set_fs(old_fs);
> > 
> > So is it really the case that the traditional construct:
> > 
> > set_fs(KERNEL_DS);
> > ... copy_{from,to}_user(...);
> > set_fs(old_fs);
> > 
> > is no longer allowed?
> 
> s/no longer allowed/best avoided/, but IMO in this case the replacement is
> too ugly to live ;-/

BTW, I wonder if the life would be simpler with do_ipv6_setsockopt() doing
the copy-in and verifying ipv6_optlen(*hdr) <= newoptlen; that would've
simplified ipv6_renew_option{,s}() quite a bit and completely eliminated
ipv6_renew_options_kern()...

Incidentally, is copying the entire value in case newoptlen > ipv6_optlen(...)
the right thing?  After all, the next update in any of those options will
quietly lose everything past ipv6_optlen(...), wouldn't it?

IOW, how about the following (completely untested):

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 16475c269749..d02881e4ad1f 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -355,14 +355,7 @@ struct ipv6_txoptions *ipv6_dup_options(struct sock *sk,
 struct ipv6_txoptions *ipv6_renew_options(struct sock *sk,
  struct ipv6_txoptions *opt,
  int newtype,
- struct ipv6_opt_hdr __user *newopt,
- int newoptlen);
-struct ipv6_txoptions *
-ipv6_renew_options_kern(struct sock *sk,
-   struct ipv6_txoptions *opt,
-   int newtype,
-   struct ipv6_opt_hdr *newopt,
-   int newoptlen);
+ struct ipv6_opt_hdr *newopt);
 struct ipv6_txoptions *ipv6_fixup_options(struct ipv6_txoptions *opt_space,
  struct ipv6_txoptions *opt);
 
diff --git a/net/ipv6/calipso.c b/net/ipv6/calipso.c
index 1323b9679cf7..1c0bb9fb76e6 100644
--- a/net/ipv6/calipso.c
+++ b/net/ipv6/calipso.c
@@ -799,8 +799,7 @@ static int calipso_opt_update(struct sock *sk, struct 
ipv6_opt_hdr *hop)
 {
struct ipv6_txoptions *old = txopt_get(inet6_sk(sk)), *txopts;
 
-   txopts = ipv6_renew_options_kern(sk, old, IPV6_HOPOPTS,
-hop, hop ? ipv6_optlen(hop) : 0);
+   txopts = ipv6_renew_options(sk, old, IPV6_HOPOPTS, hop);
txopt_put(old);
if (IS_ERR(txopts))
return PTR_ERR(txopts);
@@ -1222,8 +1221,7 @@ static int calipso_req_setattr(struct request_sock *req,
if (IS_ERR(new))
return PTR_ERR(new);
 
-   txopts = ipv6_renew_options_kern(sk, req_inet->ipv6_opt, IPV6_HOPOPTS,
-new, new ? ipv6_optlen(new) : 0);
+   txopts = ipv6_renew_options(sk, req_inet->ipv6_opt, IPV6_HOPOPTS, new);
 
kfree(new);
 
@@ -1260,8 +1258,7 @@ static void calipso_req_delattr(struct request_sock *req)
if (calipso_opt_del(req_inet->ipv6_opt->hopopt, &new))
return; /* Nothing to do */
 
-   txopts = ipv6_renew_options_kern(sk, req_inet->ipv6_opt, IPV6_HOPOPTS,
-new, new ? ipv6_optlen(new) : 0);
+   txopts = ipv6_renew_options(sk, req_inet->ipv6_opt, IPV6_HOPOPTS, new);
 
if (!IS_ERR(txopts)) {
txopts = xchg(&req_inet->ipv6_opt, txopts);
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 5bc2bf3733ab..4b6915eedfc3 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -1015,29 +1015,15 @@ ipv6_dup_options(struct sock *sk, struct ipv6_txoptions 
*opt)
 }
 EXPORT_SYMBOL_GPL(ipv6_dup_options);
 
-static int ipv6_renew_option(void *ohdr,
-struct ipv6_opt_hdr __user *newopt, int newoptlen,
-int inherit,
+static void ipv6_renew_option(struct ipv6_opt_hdr *opt,
 struct ipv6_opt_hdr **hdr,
 char **p)
 {
-   if (inherit) {
-   if (ohdr) {
-   memcpy(*p, ohdr, ipv6_optlen((struct ipv6_opt_hdr 
*)ohdr));
-   *hdr = (struct ipv6_opt_hdr *)*p;
-   *p += CMSG_ALIGN(ipv6_optlen(*hdr));
-   }
-   } else {
-   if (newopt) {
-   if (copy_from_user(*

Re: BUG: Mount ignores mount options

2018-08-10 Thread Al Viro
On Fri, Aug 10, 2018 at 09:05:22AM -0500, Eric W. Biederman wrote:
> 
> There is a serious problem with mount options today that fsopen does not
> address.  The problem is that mount options are ignored for block based
> filesystems, and any other type of filesystem that follows the same
> pattern.
> 
> The script below demonstrates this bug.  Showing this bug can cause the
> ext4 "acl" "quota" and "user_xattr" options to be silently ignored.
> 
> fsopen has my nack until it addresses this issue.
> 
> I don't know if we can fix this in the context of sys_mount.  But we if
> we are redoing the option parsing of how we mount filesystems this needs
> to be fixed before we start worrying about bug compatibility.
> 
> Hopefully this report is simple and clear enough that we can at least
> agree on the problem.

Sure, it is simple.  So's the solution: MNT_USERNS_SPECIAL_SEMANTICS that
would get passed to filesystems, so that Eric would be able to implement
his mount(2)-incompatible behaviour at leisure, without worrying about
compatibility issues.

Does that address your complaint?  Because one thing we are not going
to do is changing mount(2) behaviour.  Reason: userland-visible
behaviour of hell knows how many local scripts.  Another thing that
is flat-out not feasible is some kind of blanket "compare options"
stuff; it *can* be done as helpers to be used by filesystem when
it sees that new flag, but it's simply not going to work at the
fs-independent level.  Trivial example with the same ext4:
mount /dev/sda1 /mnt/a -o bsddf vs. mount /dev/sda1 /mnt/b
ext4 can tell that these are the same.  syscall itself has no
clue.  What's more, it's not just explicitly spelled default
options - it's the stuff that has more than one form.  And while
we are at it, the things like two NFS mounts of different trees
from the same server; they might or might not get the same superblock.
Depending upon the options.

Convenience helper that would allow ext4 to compare options and reject
the incompatible mount?  Not sure how much ext4-specific knowledge
would have to go in it, but if you can come up with one - more power
to you.  But the decision to use it *must* be ext4-specific.  Because
for e.g. NFS such thing as -o fsid=..., while certainly a part of
options, has a very different meaning - it's "use a separate fs instance"
(and let the server deal with coherency issues on its end).

Decision to use sget() (and the way it's used) is up to filesystem.
We *can't* lift that into syscall.  Not without breaking the fuck out
of existing behaviour.

Having something like a second callback for mount_bdev() that would
be called when we'd found an existing instance for the same block
device?  Sure, no problem.  Having a helper for doing such comparison
that would work in enough cases to bother, so that different fs
could avoid boilerplate in that callback?  Again, more power to you.

But I don't see what the hell does that have to the syscall interface.
___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.


Re: BUG: Mount ignores mount options

2018-08-10 Thread Al Viro
On Fri, Aug 10, 2018 at 07:36:17AM -0700, Andy Lutomirski wrote:
> 
> 
> > On Aug 10, 2018, at 7:05 AM, Eric W. Biederman  
> > wrote:
> > 
> > 
> > There is a serious problem with mount options today that fsopen does not
> > address.  The problem is that mount options are ignored for block based
> > filesystems, and any other type of filesystem that follows the same
> > pattern.
> > 
> 
> > /dev/loop0 /root/loop0-noacl-noquota-nouser_xattr ext4 
> > rw,relatime,nouser_xattr,noacl 0 0
> > /dev/loop0 /root/loop0-acl-quota-user_xattr ext4 
> > rw,relatime,nouser_xattr,noacl 0 0
> 
> To make sure I understand correctly: the problem is that the second mount 
> ignored the options because the device was already mounted, right?
> 
> For the new API, I think the only remotely sane approach is to refuse to 
> mount or init or whatever you call it an already mounted bdev. If user code 
> genuinely needs to bind-mount an existing mount that is known only by its 
> bdev, we can add a specific API just for that.

First of all, that does NOT belong anywhere other than fs itself.
Example: NFS.  Not every attempt to mount something leads to creation
of new fs instance; moreover, whether it will or not can't be predicted
in general.

PS: for pity sake, fix your MUA; 270-character lines are way over the
top.
___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.


Re: BUG: Mount ignores mount options

2018-08-13 Thread Al Viro
On Fri, Aug 10, 2018 at 08:05:44PM -0500, Eric W. Biederman wrote:

> All I proposed was that we distinguish between a first mount and an
> additional mount so that userspace knows the options will be ignored.

For pity sake, just what does it take to explain to you that your
notions of "first mount" and "additional mount" ARE HEAVILY FS-DEPENDENT
and may depend upon the pieces of state userland (especially in container)
simply does not have?

One more time, slowly:

mount -t nfs4 wank.example.org:/foo/bar /mnt/a
mount -t nfs4 wank.example.org:/baz/barf /mnt/b

yield the same superblock.  Is anyone who mounts something over NFS
required to know if anybody else has mounted something from the same
server, and if so how the hell are they supposed to find that out,
so that they could decide whether they are creating the "first" or
"additional" mount, whatever that might mean in this situation?

And how, kernel-side, is that supposed to be handled by generic code
of any description?  

While we are at it,
mount -t nfs4 wank.example.org:/foo/bar -o wsize=16384 /mnt/c
is *NOT* the same superblock as the previous two.

> I don't know how the above wound up being construed as asking that the
> code call sget directly but that is what has happened.

Not by me.  What I'm saying is that the entire superblock-creating
machinery - all of it - is nothing but library helpers.  With the
decision of when/how/if they are to be used being down to filesystem
driver.  Your "first mount"/"additional mount" simply do not map
to anything universally applicable.

> > Having something like a second callback for mount_bdev() that would
> > be called when we'd found an existing instance for the same block
> > device?  Sure, no problem.  Having a helper for doing such comparison
> > that would work in enough cases to bother, so that different fs
> > could avoid boilerplate in that callback?  Again, more power to you.
> 
> Normal forms etc.  If we want to do that it just requires a wee bit of
> discipline.  And if all of the option parsing is being rewritten and
> retested anyway I don't see why we can't do something like that as well.
> So it does not sound unreasonable to me.

See above.
___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.


Re: BUG: Mount ignores mount options

2018-08-13 Thread Al Viro
On Sat, Aug 11, 2018 at 02:58:15AM +0100, Al Viro wrote:
> On Fri, Aug 10, 2018 at 08:05:44PM -0500, Eric W. Biederman wrote:
> 
> > All I proposed was that we distinguish between a first mount and an
> > additional mount so that userspace knows the options will be ignored.
> 
> For pity sake, just what does it take to explain to you that your
> notions of "first mount" and "additional mount" ARE HEAVILY FS-DEPENDENT
> and may depend upon the pieces of state userland (especially in container)
> simply does not have?
> 
> One more time, slowly:
> 
> mount -t nfs4 wank.example.org:/foo/bar /mnt/a
> mount -t nfs4 wank.example.org:/baz/barf /mnt/b
> 
> yield the same superblock.  Is anyone who mounts something over NFS
> required to know if anybody else has mounted something from the same
> server, and if so how the hell are they supposed to find that out,
> so that they could decide whether they are creating the "first" or
> "additional" mount, whatever that might mean in this situation?
> 
> And how, kernel-side, is that supposed to be handled by generic code
> of any description?  
> 
> While we are at it,
> mount -t nfs4 wank.example.org:/foo/bar -o wsize=16384 /mnt/c
> is *NOT* the same superblock as the previous two.

s/as the previous two/as in the previous two cases/, that is - the first two
examples yield one superblock, this one - another.
___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.


Re: BUG: Mount ignores mount options

2018-08-13 Thread Al Viro
On Sat, Aug 11, 2018 at 09:31:29AM -0700, Andy Lutomirski wrote:

> I don’t see why we need all this fancy “do the options match” stuff.  For the 
> handful of filesystems (like NFS) that do something intelligent when multiple 
> non-bind mount requests against the same underlying storage happen,  we can 
> keep that behavior in the new API. For other filesystems that don’t have this 
> feature, we should simply fail the request.

> IOW I see so compelling reason to call sget() at all from the new API.  The 
> only sort-of-legit use case I can think of is mounting more than one btrfs 
> subvolume. But even that should probably not be done by asking the kernel to 
> separately instantiate the filesystem.


May I politely suggest the esteemed participants of that conversation
to RTFS?  Yes, I know that it's less fun that talking about your
rather vague ideas of how the things (surely) work, but it just might
avoid the feats of idiocy like the above.

Andy, I don't know how to put it more plainly: read the fucking source.
Even grep would do.  The same NFS you've granted (among the "handful"
of filesystems) an exception, *DOES* *CALL* *THE* *FUCKING* sget().

Yes, really.  And in some obscure[1] cases (including the one mentioned
upthread) it does reuse a pre-existing superblock.  For a very good
reason.

[1] such as, oh, mounting two filesystems from the same server with
default options - who would've ever thought of doing something so
perverted?

___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.

Re: BUG: Mount ignores mount options

2018-08-13 Thread Al Viro
On Mon, Aug 13, 2018 at 09:48:53AM -0700, Andy Lutomirski wrote:

> I would consider the GFS2 case to be essentially equivalent to the NFS
> case.  I think we can probably divide all the filesystems into three
> or four types:
> 
> pseudo file systems: Multiple instantiations of the same fs driver
> pointing at the same backing store give separate filesystems.  (Same
> backing store includes the case where there isn't any backing store.)
> tmpfs is an example.  This isn't particularly interesting.
> 
> network-like file systems: Multiple instantiations of the same fs
> driver pointing at the same backing store are expected.  This includes
> NFS, GFS2, AFS, CIFS, etc.  This is only really interesting to the
> extent that, if the fs driver internally wants to share state between
> multiple instantiations, it should be smart enough to make sure the
> options are compatible or that it can otherwise handle mismatched
> options correctly.  NFS does this right.
> 
> non-network-like filesystems: There are complicated ones like btrfs
> and ZFS and simple ones like ext4.  In either case, multiple totally
> separate instantiations of the driver sharing the backing store will
> lead to corruption.  In cases like ext4, we seem to support it for
> legacy reasons, because we're afraid that there are scripts that try
> to mount the same block device more than once, and I think the new API
> has no need to support this.  In cases like btrfs, we also seem to
> support multiple user requests for "mounts" with the same underlying
> block devices because we need it for full functionality.  But I think
> this is because our API is wrong.
> 
> Are there cases I'm missing?  It sounds like the API could be improved
> to fully model the last case, and everything will work nicely.

You know, that's starting to remind of this little gem of Borges:
http://www.alamut.com/subj/artiface/language/johnWilkins.html
Especially the delightful (fake) quote contained in there:
[...] it is written that the animals are divided into:
(a) belonging to the emperor,
(b) embalmed,
(c) tame,
(d) sucking pigs,
(e) sirens,
(f) fabulous,
(g) stray dogs,
(h) included in the present classification,
(i) frenzied,
(j) innumerable,
(k) drawn with a very fine camelhair brush,
(l) et cetera,
(m) having just broken the water pitcher,
(n) that from a long way off look like flies.
___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.


Re: [PATCH] selinux: fix race when removing selinuxfs entries

2018-10-02 Thread Al Viro
On Tue, Oct 02, 2018 at 01:18:30PM +0200, Ondrej Mosnacek wrote:

No.  With the side of Hell, No.  The bug is real, but this is
not the way to fix it.

First of all, it's still broken - e.g. mount something on a
subdirectory and watch what that thing will do to it.  And
anyone who has permission to reload policy _will_ have one
for mount.

And yes, debugfs_remove() suffers the same problem.  Frankly, the
only wish debugfs_remove() implementation inspires is to shoot it
before it breeds.  Which is the second problem with that approach -
such stuff really shouldn't be duplicated in individual filesystems.  

Don't copy (let along open-code) d_walk() in there.  The guts of
dcache tree handling are subtle enough (and had more than enough
locking bugs over the years) to make spreading the dependencies
on its details all over the tree an invitation for massive PITA
down the road.

I have beginnings of patch doing that for debugfs; the same thing
should be usable for selinuxfs as well.  However, the main problem
with selinuxfs wrt policy loading is different - what to do if
sel_make_policy_nodes() fails halfway through?  And what to do
with accesses to the unholy mix of old and new nodes we currently
have left behind?

Before security_load_policy() we don't know which nodes to create;
after it we have nothing to fall back onto.  It looks like we
need to split security_load_policy() into "load", "switch over" and
"free old" parts, so that the whole thing would look like
load
create new directory trees, unconnected to the root so
they couldn't be reached by lookups until we are done
switch over
move the new directory trees in place
kill the old trees (using that invalidate+genocide carefully
combination)
free old data structures
with failures upon the first two steps handled by killing whatever detached
trees we'd created (if any) and freeing the new data structures.

However, I'd really like to have the folks familiar with selinux guts to
comment upon the feasibility of the above.  AFAICS, nobody has ever seriously
looked at that code wrt graceful error handling, etc.[*], so I'm not happy
with making inferences from what the existing code is doing.

If you are interested in getting selinuxfs into sane shape, that would
be a good place to start.  As for the kernel-side rm -rf (which is what
debugfs_remove() et.al. are trying to be)...
* it absolutely needs to be in fs/*.c - either dcache or libfs.
It's too closely tied to dcache guts to do otherwise.
* as the first step it needs to do d_invalidate(), to get rid of
anything that might be mounted on it and to prevent new mounts from appearing.
It's rather tempting to unhash everything in the victim tree at the same
time, but that needs to be done with care - I'm still not entirely happy
with the solution I've got in that area.  Alternative is to unhash them
on the way out of subtree.  simple_unlink()/simple_rmdir() are wrong
there - we don't want to bother with the parent's timestamps as we go,
for one thing; that should be done only once to parent of the root of
that subtree.  For another, we bloody well enforce the emptiness ourselves,
so this simple_empty() is pointless (especially since we have no choice other
than ignoring it anyway).

BTW, another selinuxfs unpleasantness is, the things like sel_write_enforce()
don't have any exclusion against themselves, let alone the policy reloads.
And quite a few of them obviously expect that e.g. permission check is done
against the same policy the operation will apply to, not the previous one.
That one definitely needs selinux folks involved.

[*] not too unreasonably so - anyone who gets to use _that_ as an attack
vector has already won, so it's not a security problem pretty much by
definition and running into heavy OOM at the time of policy reload is
almost certainly going to end up with the userland parts of the entire
thing not handling failures gracefully.
___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.