Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-11 Thread Tetsuo Handa
Hello.



Indan Zupancic wrote:
  It seems to me that the alternatives you are proposing include
  modification of userland applications. But my assumption is
  that "Don't require modification of userland applications".
 
 If you want a secure system it isn't that unreasonable to expect
 applications to not do brain dead things, so not requiring any
 modifications or config changes seems a bit optimistic to me.

It depends.
Some users have to continue using brain dead legacy applications
without modification because ...

   the application's source code is not available.

   the distributor no longer supports the application.

   the application is too difficult/complicated to reconstruct.

For cases where you can expect the application not to do brain dead things
and/or we can reconstruct the application, your approach is OK.



  In other words, I want to implement without asking applications
  to use /dev/dynamic/ or something.
  This filesystem is intended to provide support for legacy applications.
  (In fact, this filesystem in TOMOYO Linux is for kernel 2.4.30/2.6.11 and
  later.)
 
 Legacy applications should cope with a static /dev/.
 What is the advantage of your filesystem compared to a static /dev/?

I assume a static /dev/ means a /dev/ directory in 2.4 kernels.
This filesystem's advantage:

  (1) Can guarantee filename/attribute pairs.

  A process with root privilege can do
  mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2
  if /dev is in / partition or is a devfs partition, whereas
  a process with root privilege cannot do
  mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2
  if /dev is this filesystem unless granted by the configuration file.

  So, you can guarantee that /dev/hda1 is block-3-1 and /dev/hda2 is block-3-2.
  (e.g. mount /dev/hda1 /home won't mount block-3-2 partition on /home.)

  (2) Can keep nodes that need not be deleted/modified read-only.

  A process with root privilege can delete /dev/null on / partition or
  on devfs partition, whereas a process with root privilege cannot delete
  /dev/null on this filesystem unless granted by the configuration file.

  So, you can guarantee that a node which need not be deleted/modified
  won't be deleted/modified.
  (e.g. /dev/null is always there with char-1-3 attribute.)

  (3) Can hide unwanted device nodes.

  A process with root privilege can create new nodes on / partition or on devfs,
  whereas a process with root privilege cannot create new nodes on this filesystem
  that are not specified by configuration file.

  So, you can expose specific nodes selectively.
  (e.g. Allow accessing /dev/hda1 , but forbid accessing /dev/hda2 .)
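
The three guarantees above boil down to a whitelist lookup on every operation that changes /dev. What follows is a minimal model of that lookup, not the code from the patch: the POLICY table, its contents and the may_* helper names are all hypothetical, and a real implementation would live in the kernel rather than in Python.

```python
import os
import stat

# Hypothetical policy: node name -> (type, major, minor, removable).
POLICY = {
    "hda1": ("b", 3, 1, False),
    "hda2": ("b", 3, 2, False),
    "null": ("c", 1, 3, False),
    "floppy": ("b", 2, 0, True),  # a node some legacy tool deletes and recreates
}

def node_attrs(mode, rdev):
    """Decode st_mode/st_rdev values into a (type, major, minor) tuple."""
    if stat.S_ISBLK(mode):
        kind = "b"
    elif stat.S_ISCHR(mode):
        kind = "c"
    else:
        kind = "?"
    return (kind, os.major(rdev), os.minor(rdev))

def may_create(name, attrs):
    """(1) and (3): only listed names, and only with the listed attributes."""
    entry = POLICY.get(name)
    return entry is not None and entry[:3] == attrs

def may_remove(name):
    """(2): a listed node stays in place unless it is marked removable."""
    entry = POLICY.get(name)
    return entry is not None and entry[3]

def may_rename(old, new):
    """A rename removes `old` and creates `new` carrying old's attributes."""
    entry = POLICY.get(old)
    return entry is not None and may_remove(old) and may_create(new, entry[:3])
```

With this table, the hda1/hda2 swap from (1) is refused at the first rename, because hda1 is not removable and hda1.tmp is not a listed name.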



  Use of a tiny daemon that communicates with udev is not sufficient.
  The udev is not the only application that modifies /dev files.
 
 Oh, it isn't? Which other applications do modify /dev files? I'd like to
 hear about a few, no matter how obscure or proprietary. And please
 tell how many of those will stop working with a static /dev with all
 nodes they might create already existing.

I don't know. I'm not using rare software.



  At least, the tiny daemon should communicate with the kernel
  so that all requests are checked by the tiny daemon.
 
 No, why should the kernel be involved? The tiny daemon would be
 the only one allowed to modify /dev/, so all mknod commands will
 be done by it. Of course it means that you might need to modify
 the two or three apps wanting to create device nodes, or you can
 make an LD_PRELOAD lib that intercepts mknod commands and
 sends them to the daemon.

No. The kernel must be involved.

Suppose the tiny daemon is the only one allowed to modify /dev/ .
foo requests mknod /dev/null from chroot() environment.
bar requests mknod /dev/null from clone(CLONE_FS) + mount() environment.

How can the daemon know where to create the node?
How can the daemon determine whether the requested pathname is
in /dev directory or not?
The process who requests mknod and the process who performs mknod
are not always using the same / directory.
The daemon must not forbid creation of /dev/null if the realpath() is
/tmp/dev/null (i.e. mknod /dev/null after chroot /tmp),
because the daemon is not asked to manage /tmp/dev directory.
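
The chroot case can be made concrete with a small path model. This is only a sketch under simplifying assumptions: it ignores symlinks and mount namespaces, and global_path and daemon_manages are hypothetical helper names. It shows why the daemon needs the requester's root directory before it can decide whether a request concerns the /dev it manages.

```python
import posixpath

def global_path(root, path):
    """Map `path`, as seen by a process whose / is `root`, to the real path.

    Symlinks are ignored; ".." cannot climb above `root`.
    """
    inside = posixpath.normpath("/" + path.lstrip("/"))
    return posixpath.normpath(posixpath.join(root, inside.lstrip("/")))

def daemon_manages(root, path, managed="/dev"):
    """True if the request lands inside the directory the daemon manages."""
    real = global_path(root, path)
    return real == managed or real.startswith(managed + "/")

# Same request string, different requester root, different answer:
print(daemon_manages("/", "/dev/null"))     # requester is not chrooted
print(daemon_manages("/tmp", "/dev/null"))  # requester did chroot /tmp
```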

Who can guarantee that the daemon can access all namespaces?
The process who requests mknod and the process who performs mknod
are not always using the same namespace.

If foo or bar is a statically linked or suid-root application
(where LD_PRELOAD is ignored), they would attempt to create device nodes
directly (i.e. call sys_mknod() instead of communicating with the daemon)
and abort due to failure.
This applies not only to applications that want to create device nodes in /dev/,
but also to all applications that want to modify entries in /dev/.


From the beginning, the kernel is deeply involved because in-kernel MAC
is essential 

Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-11 Thread Indan Zupancic
Hi,

On Fri, January 11, 2008 09:46, Tetsuo Handa wrote:
 It depends.
 Some users have to continue using brain dead legacy applications
 without modification because ...

the application's source code is not available.

Source isn't needed, as long as the vendor has it.

the distributor no longer supports the application.

Then why should anyone else support it?

the application is too difficult/complicated to reconstruct.

Then you can't trust it and it shouldn't have permission to do
potentially dangerous things in /dev/ either. Even if you can
contain the device node creation, it most likely does other
potentially dangerous things too. As a whole it can't be trusted.

 I assume a static /dev/ means a /dev/ directory in 2.4 kernels.
 This filesystem's advantage:

I'm not talking about devfs, I'm talking about a real static /dev.
I'm using it now and it works fine (I let udev manage /udev/ to see
what it's doing).

   (1) Can guarantee filename/attribute pairs.

Wrong. All nodes are created and thus there's never a need to create
new nodes. So /dev/ can't be modified by anyone. This works because
all nodes that anyone might want to create already exist.

   (2) Can keep nodes that needn't to be deleted/modified for read-only.

This would also be true for all nodes in a static /dev I think.

   (3) Can hide unwanted device nodes.

In a static /dev you only create the nodes you want. It's true that it
can't hide nodes for hardware that doesn't exist (other than deleting
the nodes manually), but that was the norm for years before the
whole dynamic /dev thing caught on.

 I don't know. I'm not using rare software.

It doesn't have to be rare, anything is fine. You don't know
anything else than udev? (And shell commands like mknod etc.)

Then why all the talk about mysterious apps that might need to
do all kind of crazy things in /dev?

 No. The kernel must be involved.

 Who can guarantee that the daemon can access all namespaces?
 The process who requests mknod and the process who performs mknod
 are not always using the same namespace.

This is true on a theoretical level. But practically I think you can either
run multiple daemons, one for each namespace where you want to
control /dev/, or if you really want one daemon you can pass the
directory fd to it where the node should be created and use mknodat().
I believe that crosses namespaces correctly.
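
A sketch of the single-daemon variant: the client passes an open directory fd over a UNIX domain socket (SCM_RIGHTS), and the daemon creates the node relative to that fd, which is what mknodat() does. The helper names are hypothetical, the socketpair stands in for a real client/daemon connection, and a regular file stands in for a device node, since creating S_IFCHR or S_IFBLK nodes needs CAP_MKNOD.

```python
import array
import os
import socket
import stat
import tempfile

def send_dir_fd(sock, fd):
    """Client side: hand an open directory fd to the daemon."""
    fds = array.array("i", [fd])
    sock.sendmsg([b"D"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])

def recv_dir_fd(sock):
    """Daemon side: receive the single fd sent by send_dir_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    return fds[0]

def create_node_at(dir_fd, name, mode=stat.S_IFREG | 0o600, device=0):
    """Create `name` relative to `dir_fd`, like mknodat(2)."""
    os.mknod(name, mode, device, dir_fd=dir_fd)

def demo():
    client, daemon = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryDirectory() as d:
        fd = os.open(d, os.O_RDONLY | os.O_DIRECTORY)
        send_dir_fd(client, fd)
        received = recv_dir_fd(daemon)
        create_node_at(received, "node")
        created = os.path.exists(os.path.join(d, "node"))
        os.close(fd)
        os.close(received)
    client.close()
    daemon.close()
    return created
```

The received fd pins the directory, so the daemon does not need to know the client's path to it. Anything in the name that is still a path (further components, symlinks) is resolved by the daemon, in the daemon's own root.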

If the daemon can't be contacted or doesn't want to do a mknod for you,
the preloaded lib can fallback to doing the mknod itself, though normally
that would be disallowed by MAC.

But I think that the chance that any process needs to create device nodes
in a chroot is at the level of fairy existence.

 If foo or bar is a statically linked or suid-root application
 (where LD_PRELOAD is ignored), they would attempt to create device nodes
 directly (i.e. call sys_mknod() instead of communicating with the daemon)
 and abort due to failure.
 Not only applications who wants to create device nodes in /dev/ ,
 but also all applications who wants to modify entries in /dev/ .

If the preloaded library is setuid, it will also work for setuid programs.
It's true that it won't work for statically linked apps, but so what?

Device node creating apps are rare enough, let alone the ones that are
also statically linked. Nice theoretical problem, but I don't think anyone
will care in practice.

 From the beginning, the kernel is deeply involved because in-kernel MAC
 is essential to realize only the tiny daemon can modify /dev/.
 Why not do this filename/attribute checking in the kernel too?

That only the tiny daemon can modify /dev/ is done with MAC rules,
the ones that should be the default for all applications except udev by
default already. For the kernel nothing changes.

 The amount of code will be the current parsing code + a few hundred
 lines of code, including the preloaded library.

 You will be bothered with "what is the realpath of /dev/null?" and
 "how can I reach the realpath?" because you have to manage
 namespace information.

Or ignore the problem and see if it's a real problem or a nice theoretical
case. And when it turns out to be a real problem, there are probably
ways to fix it (See above). But you know what exactly is needed only
after problems do turn up.

 OK. I'll consider adding this feature.
 But I'd like to use approach (B) to keep the advantage (3).

 (A) White-listing + Black-listing approach.

 Permit any operations if the filename didn't appear
  in the configuration file.

 (B) White-listing + Wild-card approach.

 Support wildcard and permit only operations if
  the filename-with-wildcard/attributes-with-wildcard appeared
  in the configuration file.
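
The difference between (A) and (B) is only in what happens to names the configuration never mentions. A small model using shell-style wildcards; the RULES table, its patterns and the permit_a/permit_b names are hypothetical and only illustrate the two policies.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (name pattern, type, major pattern, minor pattern).
RULES = [
    ("null", "c", "1", "3"),
    ("hda[1-9]", "b", "3", "*"),
    ("tty*", "c", "4", "*"),
]

def matches(rule, name, kind, major, minor):
    """True if the name and every attribute fit the rule's patterns."""
    pattern, rkind, rmajor, rminor = rule
    return (fnmatchcase(name, pattern) and kind == rkind
            and fnmatchcase(str(major), rmajor)
            and fnmatchcase(str(minor), rminor))

def permit_a(name, kind, major, minor):
    """(A): names that no rule mentions are unrestricted."""
    named = [r for r in RULES if fnmatchcase(name, r[0])]
    if not named:
        return True
    return any(matches(r, name, kind, major, minor) for r in named)

def permit_b(name, kind, major, minor):
    """(B): only name/attribute combinations that match some rule."""
    return any(matches(r, name, kind, major, minor) for r in RULES)
```

Under (A) an unlisted /dev/floppy may be created with any attributes; under (B) it is refused until a rule names it, which is what keeps advantage (3). Note that a wildcard minor such as the one on hda[1-9] no longer pins hda1 to minor 1.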

With this the filesystem at least adds some unique abilities.

If anyone really needs it and where/how it should be implemented is
another matter.

Without it it's a glorified and complicated drop-in replacement for
a static /dev/.

Regards,

Indan



Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-11 Thread Lennart Sorensen
On Fri, Jan 11, 2008 at 11:05:07PM +0900, Tetsuo Handa wrote:
 Not only mknod() but also rename()/unlink()/link()/mount(bind) etc. that may
 cause filename/attribute mismatching.
 
 How can the daemon know whether the request is trying to manipulate nodes
 in /dev directory or not?
 If mount --bind /dev/ /var/dir/ is used, the daemon must check
 filename/attribute pair when mknod(/var/dir/null) is requested
 because permitting the request will modify /dev state.
 If mount --bind /dev/ /var/dir/ is not used, the daemon must not check
 filename/attribute pair when mknod(/var/dir/null) is requested
 because permitting the request will not modify /dev state.
 
 
 
 What does the daemon do? It receives requests from the LD_PRELOAD library
 using UNIX domain socket, checks the filename/attribute pair and issues
 mknodat()/renameat()/unlinkat()/linkat() etc. when the combination is
 appropriate?
 
 What does the LD_PRELOAD library do? It intercepts all pathname related
 syscalls (except open()), resolves the directory components, determines
 whether the request is trying to manipulate nodes in the /dev directory and
 forwards the request to the daemon using UNIX domain socket?
 
 "Make the daemon and the LD_PRELOAD library bug-and-race free and
 develop the MAC policy for the daemon and the LD_PRELOAD library"
 and "Make this filesystem bug-and-race free". Which one is easier?

I think a good question is:

What kind of idiot wrote a program that thinks it is allowed to go
messing with the contents of /dev?  There simply can't be a good reason
for an application to do that.  Device nodes should match up with
devices, so as long as the device nodes exist for all your devices, then
everything should just work and no one should ever have a reason to go
changing things for any reason.

Perhaps the real solution is a preload library that blocks the idiotic
program from touching anything in /dev with anything other than
open/close/read/write.

Of course it could also help to simply tell people what this stupid
program is actually doing and why it should be allowed to mess in places
it doesn't belong.

--
Len Sorensen
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-11 Thread Tetsuo Handa
Hello.



Indan Zupancic wrote:
 That only the tiny daemon can modify /dev/ is done with MAC rules,
 the ones that should be the default for all applications except udev by
 default already. For the kernel nothing changes.

OK. You assume use of a MAC with sufficiently fine-grained access control.



 Wrong. All nodes are created and thus there's never a need to create
 new nodes. So /dev/ can't be modified by anyone. This works because
 all nodes that anyone might want to create already exist.

"Already exist" is not enough.
These nodes have to be deletable if requested by appropriate process.
These nodes have to be protected by MAC from directly calling
mknod()/rename()/unlink()/link()/mount() etc.



 This is true on a theoretical level. But practically I think you can either
 run multiple daemons, one for each namespace where you want to control /dev/,

If the daemon does not exist in that namespace?

 or if you really want one daemon you can pass the
 directory fd to it where the node should be created and use mknodat().
 I believe that crosses namespaces correctly.

The fd passed to mknodat() only sets the directory where resolution starts,
in place of the current directory.
The object obtained by resolving the rest of the pathname still depends on
the / of the calling process.

If /var/jail/dev/dyndev/link is a symlink to /dev ,
a process in chroot(/var/jail/) + chdir(/) will get /var/jail/dev/node
and a process not in chroot(/var/jail/) + chdir(/)  will get /dev/node
by resolving mknodat(fd_for_/var/jail/, dev/dyndev/link/node) .
If the process is in the chroot() but the daemon is not in the chroot() ,
the daemon will create nodes in a wrong location.

So, you let the LD_PRELOAD library resolve all directory components
before passing the fd to the daemon using UNIX domain socket
so that the daemon won't create nodes in a wrong location.
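
The symlink case can be checked with a toy resolver. This is a model only: the symlink table is a plain dict, resolve() is a hypothetical helper, and the one kernel rule it mimics is that an absolute symlink target restarts at the resolving process's own root.

```python
import posixpath

# Global path of each symlink -> its target string.
SYMLINKS = {"/var/jail/dev/dyndev/link": "/dev"}

def resolve(root, start, path, symlinks=SYMLINKS, max_hops=40):
    """Resolve `path` from directory `start` for a process whose / is `root`.

    `root` and `start` are paths as seen from the real /.
    """
    cur = start
    parts = [p for p in path.split("/") if p]
    hops = 0
    while parts:
        part = parts.pop(0)
        if part == ".":
            continue
        if part == "..":
            if cur != root:          # ".." cannot leave the process's root
                cur = posixpath.dirname(cur)
            continue
        candidate = posixpath.join(cur, part)
        target = symlinks.get(candidate)
        if target is None:
            cur = candidate
            continue
        hops += 1
        if hops > max_hops:
            raise OSError("too many levels of symbolic links")
        if target.startswith("/"):
            cur = root               # absolute target restarts at the root
        parts = [p for p in target.split("/") if p] + parts
    return cur

# Both calls start at the same directory fd (/var/jail) with the same name.
print(resolve("/var/jail", "/var/jail", "dev/dyndev/link/node"))  # chrooted client
print(resolve("/", "/var/jail", "dev/dyndev/link/node"))          # daemon outside
```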

OK. It looks like it would work, although I'm not taking race conditions into account.



 But I think that the chance that any process needs to create device nodes
 in a chroot is at the level of fairy existence.

Not only mknod() but also rename()/unlink()/link()/mount(bind) etc. that may
cause filename/attribute mismatching.

How can the daemon know whether the request is trying to manipulate nodes
in /dev directory or not?
If mount --bind /dev/ /var/dir/ is used, the daemon must check
filename/attribute pair when mknod(/var/dir/null) is requested
because permitting the request will modify /dev state.
If mount --bind /dev/ /var/dir/ is not used, the daemon must not check
filename/attribute pair when mknod(/var/dir/null) is requested
because permitting the request will not modify /dev state.
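
Put as code, the daemon's decision needs the requester's mount table as an input. A minimal model, assuming bind mounts are given as a {mountpoint: source} dict with normalized paths; it follows a single level of bind mounts only, and backing_path and touches_dev are hypothetical names.

```python
import posixpath

def under(path, prefix):
    """True if `path` is `prefix` itself or lies below it."""
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")

def backing_path(path, bind_mounts):
    """Translate `path` through a {mountpoint: source} table of bind mounts."""
    path = posixpath.normpath(path)
    # Try the longest mount point first so that nested mounts win.
    for mountpoint in sorted(bind_mounts, key=len, reverse=True):
        if under(path, mountpoint):
            rest = path[len(mountpoint):].lstrip("/")
            return posixpath.normpath(posixpath.join(bind_mounts[mountpoint], rest))
    return path

def touches_dev(path, bind_mounts):
    """Does an operation on `path` change the state of /dev?"""
    return under(backing_path(path, bind_mounts), "/dev")

print(touches_dev("/var/dir/null", {"/var/dir": "/dev"}))  # bind mount in place
print(touches_dev("/var/dir/null", {}))                    # no bind mount
```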



What does the daemon do? It receives requests from the LD_PRELOAD library
using UNIX domain socket, checks the filename/attribute pair and issues
mknodat()/renameat()/unlinkat()/linkat() etc. when the combination is
appropriate?

What does the LD_PRELOAD library do? It intercepts all pathname related syscalls
(except open()), resolves the directory components, determines whether the
request is trying to manipulate nodes in the /dev directory and forwards the
request to the daemon using UNIX domain socket?

"Make the daemon and the LD_PRELOAD library bug-and-race free and
develop the MAC policy for the daemon and the LD_PRELOAD library"
and "Make this filesystem bug-and-race free". Which one is easier?



Regards.


Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-10 Thread Indan Zupancic
On Thu, January 10, 2008 05:57, Tetsuo Handa wrote:
 It seems to me that the alternatives you are proposing include
 modification of userland applications. But my assumption is
 that "Don't require modification of userland applications".

If you want a secure system it isn't that unreasonable to expect
applications to not do brain dead things, so not requiring any
modifications or config changes seems a bit optimistic to me.

 In other words, I want to implement without asking applications
 to use /dev/dynamic/ or something.
 This filesystem is intended to provide support for legacy applications.
 (In fact, this filesystem in TOMOYO Linux is for kernel 2.4.30/2.6.11 and
 later.)

Legacy applications should cope with a static /dev/.

What is the advantage of your filesystem compared to a static /dev/?

 Use of a tiny daemon that communicates with udev is not sufficient.
 The udev is not the only application that modifies /dev files.

Oh, it isn't? Which other applications do modify /dev files? I'd like to
hear about a few, no matter how obscure or proprietary. And please
tell how many of those will stop working with a static /dev with all
nodes they might create already existing.

 At least, the tiny daemon should communicate with the kernel
 so that all requests are checked by the tiny daemon.

No, why should the kernel be involved? The tiny daemon would be
the only one allowed to modify /dev/, so all mknod commands will
be done by it. Of course it means that you might need to modify
the two or three apps wanting to create device nodes, or you can
make an LD_PRELOAD lib that intercepts mknod commands and
sends them to the daemon.

The amount of code will be the current parsing code + a few hundred
lines of code, including the preloaded library.

 But use of the tiny daemon (which is a process running in userland)
 causes a lot of trouble.

No, it doesn't, and most of those problems are true for all programs
that access /dev! If those are straced or whatever they can be forced
to open the wrong file, practically breaking the filename/attribute pairs.
So all security you think you need to have for the daemon process is
the same security you already need for all processes anyway to protect
them against each other.

 If an administrator wants something else than
 3 or 5, you're breaking something.
 That's the fate of white-list based access control.

 Does this filesystem sound too strict to support dynamic devices?
 Maybe this filesystem should be able to permit creation of device
 nodes that are not listed in the policy file.

Actually, I assumed that was the case, because if it's strictly white-list
based it's almost the same as a static /dev with some nodes hidden.
Without it, it has even less value, because it just complicates matters
compared to a normal static dev.

I thought it checked that if a device name was in the list, it has the
correct attributes, and was free to create nodes without restricted
names.

 From your next posting:
 But I think doing more is getting ridiculous, because if a process can
 create a device node, it can also access it and do whatever harm could
 be done by the confusion caused by unexpected name/attribute pairs.

 FYI. Being able to create a device node is different from being able to access
 it and do whatever harm. You will need read and/or write permission to open
 that device.

Yes, but as the process creates the device it can also choose the file mode and
probably also ownership. And as it creates a new file there likely aren't strict
MAC rules in place restricting the process from reading or writing to it. So
yes, you're right, but in practice it isn't as easy to close that hole,
especially not if the application isn't very clean and single purpose. If it
creates the node it probably wants to use it too, and that means read/write
access. Even if it can live without it, it could give access to the node to
another process and let the other process do the dirty work. Very tricky.

Greetings,

Indan




Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-09 Thread Indan Zupancic
Hello,

On Wed, January 9, 2008 05:39, Tetsuo Handa wrote:
 Hello.

 Indan Zupancic wrote:
 I think you focus too much on your way of enforcing filename/attributes
 pairs.
 So?

So that you miss alternatives and don't see the bigger picture.


 The same can be achieved by creating the device nodes with
 expected attributes, and preventing processes from changing those files.
 The device nodes have to be deletable if some process (including udev) needs
 to delete.
 Thus, you cannot unconditionally prevent processes from changing those files.

 This because expected combinations are known beforehand.
 Yes.

 And once those files are present, the MAC system used doesn't have to have
 special device nodes attributes support. Protecting those files is enough to
 guarantee filename/attributes pairs.
 If MAC system needn't support this filesystem's functionality,
 who creates those files with a guarantee of expected attributes? The udev does?
 If udev is exploited, who can guarantee?

The person that would write the config file for your fs, the one who wants
that guarantee.


 No, this is because rename permission was given for files that it shouldn't
 have.
 Do you think all MAC implementations have the same granularity and
 functionalities?
 I don't think so. Not all MAC implementations can control with such
 granularity.
 This filesystem is designed to be combined with any MAC,
 although the MAC used with this filesystem should be able to restrict
 namespace manipulation requests so that this filesystem can remain /dev
 and visible to userland applications.

Good point, but I assume they all have at least a directory granularity, and
then /dev/ can be static and udev and others can have free rein in e.g.
/dev/dynamic/. Just use subdirs for the dynamic stuff and this granularity
problem is, with slight inconvenience, solved.


 Either you want a process to manage device names and attributes, and then
 you give it permission to do that, or you want to enforce certain
 filename/attribute pairs and then you just do it yourself.
 If I modify udev to enforce certain filename/attribute pairs and the modified
 udev was exploited, who can guarantee?
 "Don't trust userland applications" is the basis of restricting access in
 kernel space.
 If you can trust userland applications, you don't need in-kernel access
 control.

Funny, I thought that it was in the kernel because that's the way to protect
processes against each other, the fs against processes, and for performance
reasons.

Exploits are in code, and where that code is doesn't matter that much, either
kernel or userspace, though if it's exploitable you'll rather not have it in the
kernel. So I think it's more secure if the checking would be done by udev than
in a special filesystem, even if that means that you're screwed if udev is
exploited. Of course you fully trust your own code, naturally.

A tiny daemon that communicates with udev and does the checking you have
now, and if ok it creates the node is really not much more code than your fs,
so as hard to exploit too. Then if udev is hacked you have the same guarantee
as you have now.

I can think of more alternatives that are as secure or more secure than the
current solution.



 Will your filesystem prevent the trivial case of

 rm /dev/hda1
 ln -s /dev/hda2 /dev/hda1

 Of course. To permit the above operation, the following permissions are
 needed.

   hda1 660 0   6   2   b   3   1
   hda1 777 0   0  33   l   .
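
Going by the two entries above, each policy line looks like a whitespace-separated record: name, octal mode, uid, gid, a flags value, a type letter, then major and minor for b and c nodes or a link target for l. That field order is an assumption read off this example (with the name and mode taken as separate fields), not a documented format. Entry and parse_entry are hypothetical names, and the flags value is kept as an opaque integer.

```python
from typing import NamedTuple, Optional

class Entry(NamedTuple):
    name: str
    mode: int                # permission bits, parsed as octal
    uid: int
    gid: int
    flags: int               # meaning not given in the thread; kept opaque
    kind: str                # "b", "c", "l", ...
    major: Optional[int]
    minor: Optional[int]
    target: Optional[str]    # symlink target, for kind "l"

def parse_entry(line):
    """Parse one policy line under the assumed field order."""
    fields = line.split()
    name, mode, uid, gid, flags, kind = fields[:6]
    rest = fields[6:]
    major = minor = target = None
    if kind in ("b", "c"):
        major, minor = int(rest[0]), int(rest[1])
    elif kind == "l":
        target = rest[0]
    return Entry(name, int(mode, 8), int(uid), int(gid), int(flags),
                 kind, major, minor, target)

print(parse_entry("hda1 660 0 6 2 b 3 1"))
print(parse_entry("hda1 777 0 0 33 l ."))
```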

Yes, I should've read the code before asking that, instead of the other way
round.


 Rename permission can be given for /dev in general, but prohibited for
 certain files in /dev, the ones you want to have specific attributes.
 It isn't all or nothing.
 Do you think all MAC implementation can prohibit renaming for certain files in
 /dev ?

 It's forbid modifying certain nodes that process needn't to modify
 versus forbid breaking filename/attribute pairs of certain nodes.

 Both have the same effect, except that the first one is generic and
 can be done by existing MAC systems, while the second one needs
 a special filesystem and a handful of MAC rules to make it effective.
 Do you think all MAC implementation can do?
 I think the first one is implementation specific and the second one is
 generic.

Protecting certain files from being modified seems to me more generic than
enforcing filename/attributes pairs on device nodes. And if they can't do it
surely they can do it per directory, and then using subdirs solves it.


 It doesn't matter where they are, it's that a different fs than yours could
 be mounted over it. You say a MAC can prevent that from happening, but a
 MAC can also prevent all processes except for udev from modifying /dev.
 But MAC cannot prevent udev from modifying /dev . And what if exploited?
 Not all MAC can enforce access control over all processes with the granularity
 you are talking. And what if a process that cannot be 

Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-09 Thread Serge E. Hallyn
Quoting Indan Zupancic ([EMAIL PROTECTED]):
 Hello,
 
 On Wed, January 9, 2008 05:39, Tetsuo Handa wrote:
  Hello.
 
  Indan Zupancic wrote:
  I think you focus too much on your way of enforcing filename/attributes
  pairs.
  So?
 
 So that you miss alternatives and don't see the bigger picture.

These emails again are getting really long, but I think the gist of
Indan's suggestion can be concisely summarized:

To confine process P3 to /dev/hda2 being 'b 3 2', create
/dev/p3, launch P3 in a new mounts namespace, mount --bind
/dev/p3 /dev, exec what you want p3 running, and have
MAC prevent umount /dev/p3.

This is a neat idea, but Tetsuo's rebuttal is

P3 may be legacy code needing to create or delete
/dev/floppy, where -EPERM confuses P3 and prevents
it working correctly.

Indan's idea is interesting and I like it, but is there an answer to
Tetsuo's problem with it?

thanks,
-serge

PS - Indan, you also said in essence if P3 can be trusted to create
/dev/floppy why can't it be trusted to create /dev/hda1.  I trust that,
phrased that way, the question answers itself?


Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-09 Thread Indan Zupancic
On Thu, January 10, 2008 00:08, Serge E. Hallyn wrote:
 These emails again are getting really long, but I think the gist of
 Indan's suggestion can be concisely summarized:

No worry, I wasn't planning on extending it, I've said what I've to say.

Except...


   To confine process P3 to /dev/hda2 being 'b 3 2', create
   /dev/p3, launch P3 in a new mounts namespace, mount --bind
   /dev/p3 /dev, exec what you want p3 running, and have
   MAC prevent umount /dev/p3.

 This is a neat idea, but Tetsuo's rebuttal is

   P3 may be legacy code needing to create or delete
   /dev/floppy, where -EPERM confuses P3 and prevents
   it working correctly.

 Indan's idea is interesting and I like it, but is there an answer to
 Tetsuo's problem with it?

...that I didn't mean that, but a more simple

/dev/ directory protected from any modifications by MAC,

/dev/* all the nodes that need to have guaranteed name/attribute pairs,
like /dev/null, /dev/zero, /dev/random, etc. and:

/dev/dynamic/ being a dir where apps who really need to create/modify
device nodes can do whatever they want to do. It can be multiple dirs
too, like /dev/snd/, /dev/input/ etc.
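
As a rule this layout is a prefix check, which is why directory-granularity MAC is enough for it. A minimal sketch of the decision; DYNAMIC_DIRS and may_modify are hypothetical names, and in practice the rule would be expressed in the MAC policy rather than in code.

```python
import posixpath

# Subdirectories where applications may create/modify nodes freely.
DYNAMIC_DIRS = ("/dev/dynamic", "/dev/snd", "/dev/input")

def may_modify(path, dynamic_dirs=DYNAMIC_DIRS):
    """True if `path` is outside /dev or strictly inside a dynamic subdir."""
    path = posixpath.normpath(path)
    if not (path == "/dev" or path.startswith("/dev/")):
        return True  # not covered by this rule
    return any(path.startswith(d + "/") for d in dynamic_dirs)

print(may_modify("/dev/null"))            # static node
print(may_modify("/dev/dynamic/floppy"))  # inside a dynamic subdir
```

The subdirectories themselves stay protected (may_modify("/dev/dynamic") is false), so they cannot be renamed away or replaced.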

I guess this covers about 96% of the usecases of this tamper-proof dev fs.

You can think of unlikely cases that aren't solved by this, but those can
be solved in another way if really wanted (like a checking daemon,
modified udev, shadow /dev/, to name a few).

But I think doing more is getting ridiculous, because if a process can
create a device node, it can also access it and do whatever harm could
be done by the confusion caused by unexpected name/attribute pairs.

As for information snooping, that's mostly about /dev/null or other
things that are known beforehand.

 PS - Indan, you also said in essence if P3 can be trusted to create
 /dev/floppy why can't it be trusted to create /dev/hda1.  I trust that,
 phrased that way, the question answers itself?

Not exactly. If there's a process that dynamically created certain device
nodes, and it wants to create one that doesn't fit the rules, you can't
know if it's wrong or if your rules are wrong. The process has a certain
policy of naming/creating the devices, but you also have a policy at the
kernel side with this fs. If it mismatches you don't know which one is
right.

If you trust a process to create /dev/hd*, you can also trust it to create
the proper /dev/hdXn, no need to verify if /dev/hda1 is really 3 1.

The whole thing about filename/attribute pairs is that it's about what
applications expect. There aren't many expectations about dynamically
created device nodes which might not always be there, because their
name isn't stable.

The use case for this fs is a malicious app that can create device nodes,
and we're worried about mismatching name/attribute pairs. Not about
our data, or anything else. Call me an optimist, but I think you don't
need to worry about name/attribute pairs.

Greetings,

Indan




Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-09 Thread Tetsuo Handa
Hello.

Indan Zupancic wrote:
 Good point, but I assume they all have at least a directory granularity, and
 then /dev/ can be static and udev and others can have free rein in e.g.
 /dev/dynamic/. Just use subdirs for the dynamic stuff and this granularity
 problem is, with slight inconvenience, solved.

It seems to me that the alternatives you are proposing include modification of
userland applications. But my assumption is
"Don't require modification of userland applications".
In other words, I want to implement without asking applications
to use /dev/dynamic/ or something.
This filesystem is intended to provide support for legacy applications.
(In fact, this filesystem in TOMOYO Linux is for kernel 2.4.30/2.6.11 and
later.)



 Exploits are in code, and where that code is doesn't matter that much, either
 kernel or userspace, though if it's exploitable you'll rather not have it in the
 kernel. So I think it's more secure if the checking would be done by udev than
 in a special filesystem, even if that means that you're screwed if udev is
 exploited. Of course you fully trust your own code, naturally.

I'm keeping the mechanism as simple as possible
so that there is little room (e.g. buffer overflows) for exploits.



 A tiny daemon that communicates with udev and does the checking you have
 now, and if ok it creates the node is really not much more code than your fs,
 so as hard to exploit too. Then if udev is hacked you have the same guarantee
 as you have now.

Use of a tiny daemon that communicates with udev is not sufficient.
The udev is not the only application that modifies /dev files.
At least, the tiny daemon should communicate with the kernel
so that all requests are checked by the tiny daemon.
But use of the tiny daemon (which is a process running in userland)
causes a lot of trouble.
See the block after the -- boundary -- of this posting.

My assumption is "Don't require userland process's assistance",
as written at "Why not use FUSE?".



 Protecting certain files from being modified seems to me more generic than
 enforcing filename/attributes pairs on device nodes.

OK. You are saying that from the point of view of what it can do.
I thought you were saying that enforcing filename/attributes pairs
from outside this filesystem (e.g. by MAC) is more flexible than doing it
in this filesystem.



 rm -f /dev/either-null-or-zero
 
 as said before, if this is possible then the MAC config used is wrong. Exactly
 the same as for your filesystem with
 
 mknod /dev/tmp1 c 1 X
 mount --bind /dev/tmp1 /dev/either-null-or-zero
 
 and you count on the MAC to prevent that.

An administrator asks MAC to prevent processes
(except specific processes that need to do "rm -f /dev/either-null-or-zero")
from doing "rm -f /dev/either-null-or-zero".

An administrator asks this filesystem to prevent processes from doing
"mknod /dev/tmp1 c 1 X".

An administrator asks MAC to prevent processes from doing
"mount --bind /dev/tmp1 /dev/either-null-or-zero".



 And as for that app, if you trust it to create device nodes, why don't you
 trust it to make the right nodes too?

If that app has a bug that triggers
  mknod /dev/either-null-or-zero 1$REPLY
instead of
  mknod /dev/either-null-or-zero $REPLY
under an unexpected circumstance, it will create unwanted nodes.
Thus I don't trust the app.



 If an administrator wants something else than
 3 or 5, you're breaking something.

That's the fate of whitelist-based access control.

Does this filesystem sound too strict to support dynamic devices?
Maybe this filesystem should be able to permit creation of device nodes
that are not listed in the policy file.



  Can SELinux guarantee the same result as my filesystem even if udev or
  administrative programs have to be able to modify /dev ?
 
 More, because your filesystem doesn't guarantee anything at all on its own.
 But assuming the MAC is decent enough to protect your fs from being bypassed,
 I'm sure it can do what's needed fine without your fs. I can't answer for
 SELinux because I don't know it well. But I trust it can protect files and/or
 directories, and that's all that's needed to achieve the same end result.

I don't know SELinux well, but judging from an example
(found by Googling "selinux allow mknod")

  allow udev_t self:capability { chown dac_override dac_read_search fowner
    fsetid sys_admin sys_nice mknod net_raw net_admin sys_rawio };

I can't find a place to specify filename/attributes pairs in this syntax.
So, if a process that is permitted to create device nodes misbehaves,
it will generate unexpected filename/attribute pairs.
I think SELinux can't guarantee the same result as my filesystem.



 You seem to assume that the in-kernel implementation is suddenly
 guaranteed bugfree.

I keep the implementation as simple as possible.



From your next posting:
 But I think doing more is getting ridiculous, because if a process can
 create a device node, it can also access it and do whatever harm could
 

Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-08 Thread Tetsuo Handa
Hello.


Indan Zupancic wrote:
  I want to use this filesystem in case where a process with root privilege
  was hijacked but the behavior of the hijacked process is still restricted
  by MAC.
 
 1) If the behaviour can be controlled, why can't the process be
    disallowed to change anything badly in /dev? Like disallowing anything
    from modifying existing nodes that weren't created by that process.
    That would have practically the same effect as your filesystem,
    won't it?

A MAC system can prevent hijacked processes from making harmful changes in /dev.
But a MAC system can't prevent hijacked processes from doing
  mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2
if permission to rename device nodes in /dev is given to those processes.
This is because MAC implementations don't check filename/attribute pairs.

But this filesystem can prevent hijacked processes from doing
  mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2
even if permission to rename device nodes in /dev is given to those processes.

This filesystem is not designed to
forbid modifying nodes when a process doesn't need to modify them.
This filesystem is designed to
forbid breaking filename/attribute pairs of nodes
even if a process needs to (or is permitted to) modify nodes.
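
As a rough userland illustration of this design goal (not code from the
patch), the sketch below models a rename as "delete the old name/attribute
pair, then create the new one". That model, the rule table and the names
pair_allows() and may_rename() are assumptions made up for this example; the
real checks live in the kernel and may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAY_CREATE 1 /* flag bit "allow creation" */
#define MAY_DELETE 2 /* flag bit "allow deletion" */

struct rule {
	const char *name;
	char type;                 /* 'b' = block device */
	unsigned int major, minor;
	unsigned int flags;
};

/* Hypothetical policy: both nodes may be created and deleted. */
const struct rule rules[] = {
	{ "hda1", 'b', 3, 1, MAY_CREATE | MAY_DELETE },
	{ "hda2", 'b', 3, 2, MAY_CREATE | MAY_DELETE },
};

/* Returns 1 if a rule with exactly this name/attribute pair grants `want`. */
int pair_allows(const char *name, char type, unsigned int major,
		unsigned int minor, unsigned int want)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		const struct rule *r = &rules[i];

		if (strcmp(r->name, name) == 0 && r->type == type &&
		    r->major == major && r->minor == minor &&
		    (r->flags & want) == want)
			return 1;
	}
	return 0;
}

/* Assumption: rename = delete the old pair + create the new pair. */
int may_rename(const char *old_name, const char *new_name, char type,
	       unsigned int major, unsigned int minor)
{
	return pair_allows(old_name, type, major, minor, MAY_DELETE) &&
	       pair_allows(new_name, type, major, minor, MAY_CREATE);
}
```

Under this model, every step of the swap is refused even though creation and
deletion are both permitted: "hda1.tmp" has no rule at all, and moving the
block-3-2 node onto the name "hda1" does not match the block-3-1 rule.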

    Or phrased differently, if the MAC system used can't protect /dev, it
    won't be able to protect other directories either, and if it can't
    protect e.g. my homedir, doesn't it make the whole MAC system
    ineffective? And if the MAC system used is ineffective, your
    filesystem is useless and you've bigger problems to fix.

You can use the nodev mount option to prevent attackers from opening device files.
You can use a MAC system to prevent attackers from mounting partitions (other
than the /dev partition) without the nodev option.


 2) The MAC system may not be able to guarantee certain combinations
    of device names and properties, but isn't that policy that shouldn't
    be in the kernel anyway? But if it is, shouldn't all device nodes be
    checked? That is, shouldn't it be a global check instead of a filesystem
    specific one?

I think the reasons why MAC systems don't handle filename/attributes pairs are:

  Filenames and their attributes are conventionally considered
  constant and reliable.

  Describing this attribute enforcement information in a MAC's policy
  makes the policy syntax complicated.

Thus, this should be a global check. But usually device nodes are only in /dev.



 3) Code efficiency. Thousand lines of code just to close one very specific
    attack, which can be done in lots of different other ways that all need
    to be prevented by the MAC system. (mounting over it, intercepting open
    calls, duping the fd, etc.) Is it worth it?

This filesystem is doing what MAC systems are not doing.
So, please don't complain about the inability of this filesystem to close all
attacks.
You can use a MAC system to prevent attackers from mounting another filesystem
over this filesystem.

The filename/attribute pairs are something like system call entry tables.
Applications will go wrong if __NR_read is mapped to sys_write() and
__NR_write is mapped to sys_read().
Userland applications access special functionality (e.g. /dev/zero and
/dev/random) by name, much as they invoke system calls by number.
Therefore, keeping the filename/attribute pairs tamper-proof is important.

You recognize that there is a threat that device nodes may have irregular
attributes (e.g. /dev/null existing as a regular file), don't you?
You don't object to implementing some mechanism to avoid such a threat, do you?
OK. Then the matter is the comparison of code efficiency.

This patch is less than 1100 lines in total.
A large part of this patch is for parsing and managing the policy file.
If you try to extend every MAC implementation (SELinux, SMACK, AppArmor, TOMOYO)
so that they can handle filename/attributes pairs (i.e. expand the policy file's
syntax and both in-kernel and userland data structures, manage strings with
variable length and non-printable characters, etc.), I think that modification
would exceed this patch.
I think guaranteeing filename/attribute pairs in the filesystem layer can keep
MAC system implementations simple and compact.
http://www.mail-archive.com/linux-fsdevel@vger.kernel.org/msg10653.html


Thank you.


Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-08 Thread Tetsuo Handa
Hello.

[EMAIL PROTECTED] wrote:
 Ouch.  The .c files should generally be built into their own .o files and
 then the Makefile should do something like
 
 obj-$(CONFIG_SYAORAN) += syaoran.o
 
 unless there's *really* good reasons for including .c files (such as an
 otherwise-messy variable-namespace issue or similar).

Yes. The final implementation will become so.
This is a temporary hack to keep all functions and variables static.

 Also, has this been double-checked to Do The Right Thing if you have
 *two* instances of ramfs mounted, one with Syaoran and one without?
Yes. The memory for superblock is allocated for each instance.
Thus, mounting one as syaoran and the other as tmpfs won't cause problems.

 (incidentally, all of these should probably be abstracted into a helper
 function that's 'static inline' so we have just one #ifdef in the definition
 in a .h file, and none in open .c code).
Oh, good idea.

 Similarly for other places you have #ifdef CONFIG_ in ramfs .c code - see if
 you can abstract it out.

This patch replaces the previous patch and
modifies only tmpfs (mm/shmem*) files.
I'm no longer modifying ramfs (fs/ramfs/*) files.

  +/*
  + * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.
  + * Now I'm setting the field to share tmpfs/rootfs/syaoran code.
 
 Question for the audience: *should* ramfs set that field so setattr works
 on ramfs (even if it's just a stub similar to the SELinux fscontext= mount
 stuff)?
 
 Question for Tetsuo:  What happens to this code if somebody actually does the
 above change?
Please forget this question.
I'm no longer setting ramfs_dir_inode_operations.setattr field.

  + Applications using well-known device locations under /dev
  +  get the device they want (e.g. an application that accesses
  +  /dev/null can always get a character special device
  +  with major=1 and minor=3).
 
 This should say "will always get", not "can always", as this code will
 mandate, rather than just make possible.

OK.

  + The list of possible combinations of filename and its attributes
  + that can exist on this filesystem is defined at mount time
  + using a configuration file.
 
 The format of this file needs to be documented.

Yes. It is a line-by-line processable format defined as:

  filename permission owner group flags type [ symlink_data | major minor ]

where flags is a bitwise OR of

  *  1: Allow creation of the file.
  *  2: Allow deletion of the file.
  *  4: Allow changing permissions of the file.
  *  8: Allow changing owner or group of the file.
  * 16: For internal use. Remembers whether this file is opened or not.
  * 32: Don't create this file at mount time.

and here are some example entries:

  pts      755 0 0  0 d
  shm      755 0 0  0 d
  fd       777 0 0  0 l /proc/self/fd
  stdin    777 0 0  0 l /proc/self/fd/0
  stdout   777 0 0  0 l /proc/self/fd/1
  stderr   777 0 0  0 l /proc/self/fd/2
  null     666 0 0  0 c 1 3
  zero     666 0 0  0 c 1 5
  random   644 0 0  0 c 1 8
  urandom  644 0 0  0 c 1 9
  tty      666 0 0  0 c 5 0
  tty0     600 0 0 12 c 4 0
  cdrom    777 0 0  3 l /dev/scd0
  console  600 0 0  1 c 5 1
  hda      660 0 6  0 b 3 0
  hda1     660 0 6  0 b 3 1
  initctl  600 0 0  3 p
  log      666 0 0 15 s
  rtc      644 0 0  0 c 10 135
  ptmx     666 0 0  0 c 5 2
  ram      777 0 0  3 l /dev/ram0
  ram0     660 0 6  0 b 1 0
  ram1     660 0 6  0 b 1 1
  sda      660 0 6  0 b 8 0
  initrd   660 0 6  1 b 1 250

Full documentation of this filesystem is at
http://tomoyo.sourceforge.jp/en/1.5.x/policy-syaoran.html
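
As a rough illustration of the line format above, here is a userland C sketch
that parses one entry and exposes the flags value. The struct, the function
name parse_policy_line() and the SYAORAN_* constant names are hypothetical,
chosen for this example only; they are not taken from the patch, and the
in-kernel parser may differ (for instance in how it handles escaped or
non-printable characters in filenames).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names for the flag bits listed above. */
enum {
	SYAORAN_MAY_CREATE = 1,
	SYAORAN_MAY_DELETE = 2,
	SYAORAN_MAY_CHMOD = 4,
	SYAORAN_MAY_CHOWN = 8,
	SYAORAN_NO_CREATE_AT_MOUNT = 32,
};

struct policy_entry {
	char name[64];
	unsigned int perm;         /* permission bits, written in octal */
	unsigned int uid, gid;
	unsigned int flags;
	char type;                 /* d, l, c, b, p or s */
	char symlink[128];         /* only used for type 'l' */
	unsigned int major, minor; /* only used for types 'c' and 'b' */
};

/* Parses one policy line. Returns 0 on success, -1 if it is malformed. */
int parse_policy_line(const char *line, struct policy_entry *e)
{
	int used = 0;

	assert(line != NULL && e != NULL);
	memset(e, 0, sizeof(*e));
	/* Common prefix: filename permission owner group flags type */
	if (sscanf(line, "%63s %o %u %u %u %c%n", e->name, &e->perm,
		   &e->uid, &e->gid, &e->flags, &e->type, &used) != 6)
		return -1;
	switch (e->type) {
	case 'c':
	case 'b':
		/* Device nodes carry "major minor". */
		if (sscanf(line + used, "%u %u", &e->major, &e->minor) != 2)
			return -1;
		return 0;
	case 'l':
		/* Symlinks carry their target. */
		if (sscanf(line + used, "%127s", e->symlink) != 1)
			return -1;
		return 0;
	case 'd':
	case 'p':
	case 's':
		return 0;
	default:
		return -1;
	}
}
```

For example, the tty0 entry above has flags 12 = 4 + 8, so its permissions and
ownership may be changed, but the node may be neither created nor deleted.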

 I'm not terribly thrilled by
 the idea of passing a file to be read by the kernel, but I also understand
 that if it isn't done before mount, you have a race condition between the
 mount and the load.

What race condition is possible?
Are you worried that the file gets modified while it is being read?

 Perhaps write some configfs code so that you can
 'mount /configfs; cat config.file > /configfs/syaoran; mount -t syaoran'?

If you worry that the file gets modified while it is being read in kernel space,
you will also worry that the file gets modified while doing
"cat config.file > /configfs/syaoran".

To use configfs (or whatever approach that is done before mount syscall),
some tag for 

Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-08 Thread Indan Zupancic
Hi Tetsuo,

I think you focus too much on your way of enforcing filename/attributes
pairs. The same can be achieved by creating the device nodes with
expected attributes, and preventing processes from changing those files.
This is because expected combinations are known beforehand. And once
those files are present, the MAC system used doesn't have to have special
device nodes attributes support. Protecting those files is enough to
guarantee filename/attributes pairs.

On Tue, January 8, 2008 14:50, Tetsuo Handa wrote:
 Hello.


 Indan Zupancic wrote:
  I want to use this filesystem in case where a process with root privilege
  was hijacked but the behavior of the hijacked process is still restricted
  by MAC.

 1) If the behaviour can be controlled, why can't the process be
    disallowed to change anything badly in /dev? Like disallowing anything
    from modifying existing nodes that weren't created by that process.
    That would have practically the same effect as your filesystem,
    won't it?
 MAC system can prevent hijacked processes from changing anything badly in
 /dev.
 But MAC system can't prevent hijacked processes from doing
 mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2
 if permissions to rename device nodes in /dev are given to hijacked processes.
 This is because MAC implementation doesn't check filename/attribute pairs.

No, this is because rename permission was given for files for which it
shouldn't have been.
Either you want a process to manage device names and attributes, and then you
give it permission to do that, or you want to enforce certain filename/attribute
pairs and then you just do it yourself.

Will your filesystem prevent the trivial case of

rm /dev/hda1
ln -s /dev/hda2 /dev/hda1


 But this filesystem can prevent hijacked processes from doing
 mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2
 even if permissions to rename device nodes in /dev are given to hijacked
 processes.

Rename permission can be given for /dev in general, but prohibited for
certain files in /dev, the ones you want to have specific attributes.
It isn't all or nothing.


 This filesystem is not designed to
 forbid modifying nodes if that process needn't to modify nodes.
 This filesystem is designed to
 forbid breaking filename/attribute pairs of nodes
 even if that process need to (or permitted to) modify nodes.

It's "forbid modifying certain nodes that a process needn't modify"
versus "forbid breaking filename/attribute pairs of certain nodes".

Both have the same effect, except that the first one is generic and
can be done by existing MAC systems, while the second one needs
a special filesystem and a handful of MAC rules to make it effective.


 2) The MAC system may not be able to guarantee certain combinations
    of device names and properties, but isn't that policy that shouldn't
    be in the kernel anyway? But if it is, shouldn't all device nodes be
    checked? That is, shouldn't it be a global check instead of a filesystem
    specific one?
 I think the reason why MAC system doesn't handle filename/attributes pairs is
 that:

 Filename and its attributes pairs are conventionally considered as
 constant and reliable.

 It makes the MAC's policy syntax complicated to describe this attribute
 enforcement information in MAC's policy.

 Thus, this should be a global check. But usually device nodes are only in
 /dev.

It doesn't matter where they are, it's that a different fs than yours could be
mounted over it. You say a MAC can prevent that from happening, but a
MAC can also prevent all processes except for udev from modifying /dev.
Done globally instead of as a filesystem it can actually guarantee name/attr
pairs, now it can't even do that on its own.


 3) Code efficiency. Thousand lines of code just to close one very specific
    attack, which can be done in lots of different other ways that all need
    to be prevented by the MAC system. (mounting over it, intercepting open
    calls, duping the fd, etc.) Is it worth it?
 This filesystem is doing what MAC system is not doing.
 So, please don't complain about inability of this filesystem to close all
 attacks.

I don't. What I complain about is that it's too specific and does its one chosen
job badly. It lacks abstraction. As far as I can see any decent MAC can achieve
the same end result as your filesystem, without directly enforcing name/attr
pairs.

 You can use MAC system to prevent attackers from mounting other filesystem
 over this filesystem.

 The filename/attribute pairs are something like system call entry tables.
 The application will go wrong if __NR_read is mapped to sys_write() and
 __NR_write is mapped to sys_read().
 Userland applications access special functionalities (e.g. /dev/zero and
 /dev/random) by name (i.e. syscall numbers). Therefore, keeping the
 filename/attribute pairs tamper-proof is important.

 You recognize that there is a threat that device 

Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-08 Thread Tetsuo Handa
Hello.

Indan Zupancic wrote:
 I think you focus too much on your way of enforcing filename/attributes
 pairs.

So?

 The same can be achieved by creating the device nodes with
 expected attributes, and preventing processes from changing those files.

The device nodes have to be deletable if some process (including udev) needs to
delete them.
Thus, you cannot unconditionally prevent processes from changing those files.

 This because expected combinations are known beforehand.

Yes.

 And once those files are present, the MAC system used doesn't have to have
 special device nodes attributes support. Protecting those files is enough to
 guarantee filename/attributes pairs.

If the MAC system needn't support this filesystem's functionality,
who creates those files with a guarantee of the expected attributes? Does udev?
If udev is exploited, who can guarantee them?

 No, this is because rename permission was given for files that it shouldn't
 had.

Do you think all MAC implementations have the same granularity and
functionality?
I don't think so. Not all MAC implementations can control access with such
granularity.
This filesystem is designed to be combined with any MAC,
although the MAC used with this filesystem should be able to restrict
namespace manipulation requests so that this filesystem can remain mounted
on /dev and visible to userland applications.

 Either you want a process to manage device names and attributes, and then you
 give it permission to do that, or you want to enforce certain
 filename/attribute pairs and then you just do it yourself.

If I modify udev to enforce certain filename/attribute pairs and the modified
udev is exploited, who can guarantee them?
"Don't trust userland applications" is the basis of restricting access in
kernel space.
If you can trust userland applications, you don't need in-kernel access control.


 Will your filesystem prevent the trivial case of
 
 rm /dev/hda1
 ln -s /dev/hda2 /dev/hda1
 
Of course. To permit the above operation, the following permissions are needed.

  hda1 660 0 6  2 b 3 1
  hda1 777 0 0 33 l .

 Rename permission can be given for /dev in general, but prohibited for
 certain files in /dev, the ones you want to have specific attributes.
 It isn't all or nothing.

Do you think all MAC implementations can prohibit renaming for certain files
in /dev?

 It's "forbid modifying certain nodes that process needn't to modify"
 versus "forbid breaking filename/attribute pairs of certain nodes".
 
 Both have the same effect, except that the first one is generic and
 can be done by existing MAC systems, while the second one needs
 a special filesystem and a handful of MAC rules to make it effective.

Do you think all MAC implementations can do that?
I think the first one is implementation-specific and the second one is generic.

 It doesn't matter where they are, it's that a different fs than yours could be
 mounted over it. You say a MAC can prevent that from happening, but a
 MAC can also prevent all processes except for udev from modifying /dev.

But MAC cannot prevent udev from modifying /dev. And what if udev is exploited?
Not all MACs can enforce access control over all processes with the granularity
you are talking about. And what if there is a process that cannot be controlled
with your boolean-level granularity (e.g. an administrator running his/her
administrative applications that require modification of /dev)?

A crazy example of an administrative application:
(Please don't say "Don't use such a crazy application.")

  #! /bin/sh
  rm -f /dev/either-null-or-zero
  read
  mknod /dev/either-null-or-zero c 1 $REPLY && echo "Administrative task finished successfully." | mail root

This filesystem can guarantee /dev/either-null-or-zero is either char-1-3 or
char-1-5 by using a policy

  either-null-or-zero 666 0 0  3 c 1 3
  either-null-or-zero 666 0 0 35 c 1 5
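
To make the effect of these two entries concrete, here is a small userland C
sketch of a whitelist check for mknod. The table, the struct and the function
name may_create_node() are hypothetical and only mimic the idea; the patch's
own syaoran_may_create_node() takes a dentry and its body is not shown in this
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAY_CREATE 1 /* flag bit "allow creation of the file" */

struct entry {
	const char *name;
	char type;                 /* 'c' = character device */
	unsigned int major, minor;
	unsigned int flags;        /* 3 = 1 + 2, 35 = 1 + 2 + 32 */
};

/* The two policy lines above, written as a table. */
const struct entry policy[] = {
	{ "either-null-or-zero", 'c', 1, 3, 3 },
	{ "either-null-or-zero", 'c', 1, 5, 35 },
};

/* Returns 1 if some entry permits creating exactly this node, else 0. */
int may_create_node(const char *name, char type,
		    unsigned int major, unsigned int minor)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(policy) / sizeof(policy[0]); i++) {
		const struct entry *e = &policy[i];

		if (strcmp(e->name, name) == 0 && e->type == type &&
		    e->major == major && e->minor == minor &&
		    (e->flags & MAY_CREATE))
			return 1;
	}
	return 0;
}
```

With this table, creating the node as char-1-3 or char-1-5 is accepted, while
any other minor number (say 13) or any other filename is refused, no matter
what the script read into $REPLY.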

The boolean-level granularity (e.g. forbid all processes except for udev,
and modify udev to perform name/attribute pair enforcement) is not generic.
Userland applications sometimes misbehave.
I assume kernel code doesn't misbehave.
If you doubt my assumption, you have to doubt in-kernel MAC implementations too.

 I don't. What I complain about is that it's too specific and does it one
 chosen job badly. It lacks abstraction. As far as I can see any decent MAC
 can achieve the same end result as your filesystem, without directly
 enforcing name/attr pairs.

Can SELinux guarantee the same result as my filesystem even if udev or
administrative programs have to be able to modify /dev ?

 The thing is, all special device nodes that are expected to exist by
 applications are known beforehand.

Yes.

 Thus they can be created statically and can be protected
 against any modifications with any MAC system.

But sometimes some modifications need to be permitted.
Who can guarantee that there is no application (other than udev)

Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-08 Thread Tetsuo Handa
Hello.

[EMAIL PROTECTED] wrote:
 Good summary - probably should add that to the patch, drop it into
 Documentation/syaoran-config.txt or similar...
I see.

 Modification while reading *is* an issue, but can probably be worked around
 with some clever locking.  The race condition I was thinking of was if you
 had the mount and the policy load be 2 separate events, you could see:
 
 (a) issue mount request
 (b) do something malicious in /dev while..
 (c) load the policy that would have prevented (b).
 
 This is partly why SELinux has init load the policy *very* early on, before
 any other userspace have had a chance to run and do things that would have
 been prevented by policy.

So, you are suggesting loading the policy before the mount() request, so that
this filesystem can prevent attackers from doing something malicious
by minimizing the latency (i.e. by implementing the mount as a non-blocking
operation) between the userland process's mount() call and the nodes becoming
visible to userland processes.

I didn't take such cases into account.
My assumed usage of this filesystem is to run a script like

 #!/bin/sh
 mount -t syaoran -o accept=/etc/ccs/syaoran.conf none /dev
 exec /sbin/init "$@"

by passing init=/path/to/this/script on the kernel command line
so that /sbin/init can create /dev/initctl on this filesystem.
If you mount this filesystem after /sbin/init starts,
it will shadow the /dev/initctl opened by /sbin/init.

 Which basically ends up meaning that anybody who can trick the mount into
 happening can reset the permitted list and create (for example) a mode 666
 entry for a hard drive, and go scribbling around at will.  Note that you
 don't seem to do any sanity checking on the path (for instance, that each
 component is owned by root, and not world-writable) - so anybody who finds
 a way to get the mount to happen can supply their own list in
 /home/joeuser/blat or /tmp/surprise-mount-list or wherever.

I assume that being able to reach this location means the caller of mount() is
root.
But are patches to allow mount() by non-root in progress?
http://lkml.org/lkml/2008/1/8/131
Maybe I should add some sanity checking on the path.
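
As a rough userland sketch of what such sanity checking could look like
(an illustration only, not what the patch does): check that every component of
the policy file's path is owned by root and not world-writable. The function
names component_is_safe() and path_is_safe() are made up for this example.
Two assumptions to note: the sketch uses lstat(), so symlinks are refused
because Linux reports them as mode 0777, and it is racy by nature, so an
in-kernel version would need to do the check during path lookup instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* A component is acceptable if root owns it and "others" cannot write it. */
int component_is_safe(uid_t owner, mode_t mode)
{
	return owner == 0 && !(mode & S_IWOTH);
}

/* Returns 1 if "/" and every prefix of the absolute path pass, else 0. */
int path_is_safe(const char *path)
{
	char buf[PATH_MAX];
	struct stat st;
	size_t i, len;

	assert(path != NULL);
	len = strlen(path);
	if (path[0] != '/' || len >= sizeof(buf))
		return 0;
	if (lstat("/", &st) != 0 || !component_is_safe(st.st_uid, st.st_mode))
		return 0;
	for (i = 1; i <= len; i++) {
		/* Only act at the end of each path component. */
		if (path[i] != '/' && path[i] != '\0')
			continue;
		memcpy(buf, path, i);
		buf[i] = '\0';
		if (lstat(buf, &st) != 0 ||
		    !component_is_safe(st.st_uid, st.st_mode))
			return 0;
	}
	return 1;
}
```

On a typical system this would reject both a list under a user's home
directory (a component not owned by root) and one under /tmp (a
world-writable component).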

Thank you.


Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-06 Thread Tetsuo Handa
Hello.

Changes from previous posting:

 (1) I rebased this patch using tmpfs.

 I didn't know I was making this patch using ramfs...

This patch is for 2.6.24-rc6-mm1.

Regards.
--
Subject: Simple tamper-proof device filesystem.

The goal of this filesystem is to guarantee that
applications using well-known device locations under /dev
get the device they want (e.g. an application that accesses /dev/null can
always get a character special device with major=1 and minor=3).

Does this idea sound silly? It does, if you think root can do whatever
he/she wants to do. But this filesystem makes sense when used with
access control mechanisms like MAC (mandatory access control).
I want to use this filesystem in the case where a process with root privilege
was hijacked but the behavior of the hijacked process is still restricted by MAC.

Why not use FUSE?

  Because /dev has to be available through the lifetime of the kernel.
  It is not acceptable if /dev stops working due to SIGKILL or OOM-killer.

Why not use SELinux?

  Because SELinux doesn't guarantee filenames and their attributes.
  As far as I know, no MAC implementation can handle filename/attribute pairs.
  I guess this is because

    Filenames and their attributes are conventionally considered
    constant and reliable.

    Describing this attribute enforcement information in a MAC's policy
    makes the policy syntax complicated.

  I want to add functionality that the MACs are missing.
  Instead of adding this functionality per MAC,
  I propose to add it as ground work, to be combined with any MAC.

Why not drop CAP_MKNOD?

  Dropping CAP_MKNOD is not enough for emulating this filesystem because
  a process can still rename()/unlink() to break filename/attribute pairs
  (e.g. "mv /dev/sda1 /dev/sda1.tmp; mv /dev/sda2 /dev/sda1;
  mv /dev/sda1.tmp /dev/sda2" or "unlink /dev/null; touch /dev/null").

This time, I'm implementing this filesystem as an extension to tmpfs
because this filesystem does nothing but check filenames and
their attributes in addition to what tmpfs does.

Signed-off-by: Tetsuo Handa [EMAIL PROTECTED]
---
 fs/Kconfig               |   18 +
 include/linux/shmem_fs.h |    5 
 mm/shmem.c               |  124 +++
 mm/shmem_mac.h           |   57 +
 mm/shmem_mac_debug.c     |  183 +
 mm/shmem_mac_init.c      |  486 +++
 mm/shmem_mac_main.c      |  205 +++
 7 files changed, 1077 insertions(+), 1 deletion(-)

--- linux-2.6-mm.orig/mm/shmem.c
+++ linux-2.6-mm/mm/shmem.c
@@ -736,11 +736,39 @@ static void shmem_truncate(struct inode 
 	shmem_truncate_range(inode, inode->i_size, (loff_t)-1);
 }
 
+#ifdef CONFIG_SYAORAN
+#include "shmem_mac.h"
+#include "shmem_mac_init.c"
+#include "shmem_mac_main.c"
+#include "shmem_mac_debug.c"
+
+static bool with_mac(struct super_block *sb)
+{
+	return sb->s_type == &syaoran_fs_type;
+}
+#else
+static inline bool with_mac(struct super_block *sb)
+{
+	return 0;
+}
+#endif
+
 static int shmem_notify_change(struct dentry *dentry, struct iattr *attr)
 {
 	struct inode *inode = dentry->d_inode;
 	struct page *page = NULL;
 	int error;
+#ifdef CONFIG_SYAORAN
+	if (with_mac(inode->i_sb)) {
+		unsigned int flags = 0;
+		if (attr->ia_valid & (ATTR_UID | ATTR_GID))
+			flags |= MAY_CHOWN;
+		if (attr->ia_valid & ATTR_MODE)
+			flags |= MAY_CHMOD;
+		if (syaoran_may_modify_node(dentry, flags))
+			return -EPERM;
+	}
+#endif
 
 	if (S_ISREG(inode->i_mode) && (attr->ia_valid & ATTR_SIZE)) {
 		if (attr->ia_size < inode->i_size) {
@@ -1515,6 +1543,10 @@ shmem_get_inode(struct super_block *sb, 
 		default:
 			inode->i_op = &shmem_special_inode_operations;
 			init_special_inode(inode, mode, dev);
+#ifdef CONFIG_SYAORAN
+			if (with_mac(sb))
+				init_syaoran_inode(inode, mode);
+#endif
 			break;
 		case S_IFREG:
 			inode->i_op = &shmem_inode_operations;
@@ -1739,8 +1771,15 @@ static int shmem_statfs(struct dentry *d
 static int
 shmem_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
 {
-	struct inode *inode = shmem_get_inode(dir->i_sb, mode, dev);
+	struct inode *inode;
 	int error = -ENOSPC;
+#ifdef CONFIG_SYAORAN
+	if (with_mac(dir->i_sb)) {
+		if (syaoran_may_create_node(dentry, mode, dev) < 0)
+			return -EPERM;
+	}
+#endif
+	inode = shmem_get_inode(dir->i_sb, mode, dev);
 
 	if (inode) {
 		error = security_inode_init_security(inode, dir, NULL, NULL,
@@ -1792,6 +1831,13 @@ static int shmem_link(struct dentry *old
 {
 	struct inode *inode = old_dentry->d_inode;
 	int ret;
+#ifdef 

[PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-05 Thread Tetsuo Handa
Hello.

Changes from previous posting:

 (1) Added kernel config so that users can choose
 whether to compile this filesystem or not.

 I didn't receive any ACK/NACK regarding whether I'm permitted to
 implement this filesystem as an extension to tmpfs or not.
 So, I continued implementing this filesystem as an extension to tmpfs.

 (2) Removed indirect grabbing of blkdev_open() and chrdev_open().

 The previous posting was using indirect approach to call
 blkdev_open() and chrdev_open() so that users can compile
 this filesystem as a module without exporting blkdev_open()
 from fs/block_dev.c and chrdev_open() from fs/char_dev.c .
 But since tmpfs cannot be compiled as a module,
 I changed it to call them directly.

 (3) Split the single file into three files.

 syaoran_init.c:  initialization part
 syaoran_main.c:  access control part
 syaoran_debug.c: taking snapshot part

This patch is for 2.6.24-rc6-mm1.

Regards.
--
Subject: Simple tamper-proof device filesystem.

The goal of this filesystem is to guarantee that
applications using well-known device locations under /dev
get the device they want (e.g. an application that accesses /dev/null can
always get a character special device with major=1 and minor=3).

Does this idea sound silly? It does, if you think root can do whatever
he/she wants to do. But this filesystem makes sense when used with
access control mechanisms like MAC (mandatory access control).
I want to use this filesystem in the case where a process with root privilege
was hijacked but the behavior of the hijacked process is still restricted by MAC.

Why not use FUSE?

  Because /dev has to be available through the lifetime of the kernel.
  It is not acceptable if /dev stops working due to SIGKILL or OOM-killer.

Why not use SELinux?

  Because SELinux doesn't guarantee filenames and their attributes.
  As far as I know, no MAC implementation can handle filename/attribute pairs.
  I guess this is because

    Filenames and their attributes are conventionally considered
    constant and reliable.

    Describing this attribute enforcement information in a MAC's policy
    makes the policy syntax complicated.

  I want to add functionality that the MACs are missing.
  Instead of adding this functionality per MAC,
  I propose to add it as ground work, to be combined with any MAC.

Why not drop CAP_MKNOD?

  Dropping CAP_MKNOD is not enough for emulating this filesystem because
  a process can still rename()/unlink() to break filename/attribute pairs
  (e.g. "mv /dev/sda1 /dev/sda1.tmp; mv /dev/sda2 /dev/sda1;
  mv /dev/sda1.tmp /dev/sda2" or "unlink /dev/null; touch /dev/null").

This time, I'm implementing this filesystem as an extension to tmpfs
because this filesystem does nothing but check filenames and
their attributes in addition to what tmpfs does.

Signed-off-by: Tetsuo Handa [EMAIL PROTECTED]
---
 fs/Kconfig               |   18 +
 fs/ramfs/inode.c         |  177 ++
 fs/ramfs/syaoran.h       |   75 ++
 fs/ramfs/syaoran_debug.c |  183 +++
 fs/ramfs/syaoran_init.c  |  568 +++
 fs/ramfs/syaoran_main.c  |  207 +
 6 files changed, 1222 insertions(+), 6 deletions(-)

--- linux-2.6-mm.orig/fs/ramfs/inode.c
+++ linux-2.6-mm/fs/ramfs/inode.c
@@ -36,6 +36,20 @@
 #include <asm/uaccess.h>
 #include "internal.h"
 
+static struct inode *__ramfs_get_inode(struct super_block *sb, int mode,
+				       dev_t dev, bool tmpfs_with_mac);
+
+#define TMPFS_WITH_MAC    1
+#define TMPFS_WITHOUT_MAC 0
+#include <linux/quotaops.h>
+
+#ifdef CONFIG_SYAORAN
+#include "syaoran.h"
+#include "syaoran_init.c"
+#include "syaoran_main.c"
+#include "syaoran_debug.c"
+#endif
+
 /* some random number */
 #define RAMFS_MAGIC	0x858458f6
 
@@ -51,6 +65,12 @@ static struct backing_dev_info ramfs_bac
 
 struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
 {
+	return __ramfs_get_inode(sb, mode, dev, TMPFS_WITHOUT_MAC);
+}
+
+static struct inode *__ramfs_get_inode(struct super_block *sb, int mode,
+				       dev_t dev, const bool tmpfs_with_mac)
+{
 	struct inode * inode = new_inode(sb);
 
 	if (inode) {
@@ -65,10 +85,18 @@ struct inode *ramfs_get_inode(struct sup
 		switch (mode & S_IFMT) {
 		default:
 			init_special_inode(inode, mode, dev);
+#ifdef CONFIG_SYAORAN
+			if (tmpfs_with_mac)
+				init_syaoran_inode(inode, mode);
+#endif
 			break;
 		case S_IFREG:
 			inode->i_op = &ramfs_file_inode_operations;
 			inode->i_fop = &ramfs_file_operations;
+#ifdef CONFIG_SYAORAN
+			if (tmpfs_with_mac)
+				init_syaoran_inode(inode, mode);
+#endif
 			break;
 

Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-05 Thread Willy Tarreau
Hello,

On Sun, Jan 06, 2008 at 03:20:00PM +0900, Tetsuo Handa wrote:
 Hello.
 
 Changes from previous posting:
 
  (1) Added kernel config so that users can choose
  whether to compile this filesystem or not.
 
  I didn't receive any ACK/NACK regarding whether I'm permitted to
  implement this filesystem as an extension to tmpfs or not.
  So, I continued implementing this filesystem as an extension to tmpfs.
 
  (2) Removed indirect grabbing of blkdev_open() and chrdev_open().
 
  The previous posting used an indirect approach to call
  blkdev_open() and chrdev_open() so that users could compile
  this filesystem as a module without exporting blkdev_open()
  from fs/block_dev.c and chrdev_open() from fs/char_dev.c .
  But since tmpfs cannot be compiled as a module,
  I changed it to call them directly.
 
  (3) Split the single file into three files.
 
  syaoran_init.c:  initialization part
  syaoran_main.c:  access control part
  syaoran_debug.c: taking snapshot part
 
 This patch is for 2.6.24-rc6-mm1.
 
 Regards.
 --
 Subject: Simple tamper-proof device filesystem.
 
 The goal of this filesystem is to guarantee that
 applications using well-known device locations under /dev
 get the device they want (e.g. an application that accesses /dev/null can
 always get a character special device with major=1 and minor=3).
 
 Does this idea sound silly? It does, if you think that root can do whatever
 he/she wants to do. But this filesystem makes sense when used with
 access control mechanisms like MAC (mandatory access control).
 I want to use this filesystem in the case where a process with root privilege was
 hijacked but the behavior of the hijacked process is still restricted by MAC.
 
 Why not use FUSE?
 
   Because /dev has to be available through the lifetime of the kernel.
   It is not acceptable if /dev stops working due to SIGKILL or OOM-killer.
 
 Why not use SELinux?
 
   Because SELinux doesn't guarantee the pairing of a filename and its attributes.
   As far as I know, no MAC implementation can enforce that pairing.
   I guess this is because
 
     (1) Filename/attribute pairs are conventionally considered
         constant and reliable.
 
     (2) Describing this attribute enforcement information would
         complicate the MAC's policy syntax.
 
   I want to add functionality that the MACs are missing.
   Instead of adding this functionality per MAC,
   I propose to add it as groundwork, to be combined with any MAC.
 
 Why not drop CAP_MKNOD?
 
   Dropping CAP_MKNOD is not enough to emulate this filesystem because
   a process can still use rename()/unlink() to break the pairing of a
   filename and its attributes (e.g. mv /dev/sda1 /dev/sda1.tmp;
   mv /dev/sda2 /dev/sda1; mv /dev/sda1.tmp /dev/sda2
   or unlink /dev/null; touch /dev/null ).
 
 This time, I'm implementing this filesystem as an extension to tmpfs
 because all this filesystem does, in addition to what tmpfs does,
 is check filenames and their attributes.
 
 Signed-off-by: Tetsuo Handa [EMAIL PROTECTED]
 ---
  fs/Kconfig   |   18 +
  fs/ramfs/inode.c |  177 ++
  fs/ramfs/syaoran.h   |   75 ++
  fs/ramfs/syaoran_debug.c |  183 +++
  fs/ramfs/syaoran_init.c  |  568 +++
  fs/ramfs/syaoran_main.c  |  207 +
  6 files changed, 1222 insertions(+), 6 deletions(-)

Your patch is very confusing. In your description, as well as in the
comments you talk about tmpfs, but your patch does not touch even one
line of tmpfs and only changes ramfs. Even your variables and arguments
refer to tmpfs. The Kconfig entry indicates that the feature depends
on TMPFS too.

Judging from the following comment :
  * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.

I suspect that you are confusing the two filesystems.
  - ramfs is in fs/ramfs and is always compiled in; you cannot disable it.
  - tmpfs is in mm/shmem.c and is optional. It also supports options that
    ramfs does not (e.g. size), and its data may be swapped out.
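
One way to see from userspace which of the two a mount point really is, sketched under the assumption of a Linux system with statfs(2). The ramfs magic value is the RAMFS_MAGIC from fs/ramfs/inode.c quoted in the patch, and the tmpfs value is TMPFS_MAGIC from <linux/magic.h>. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/vfs.h>

#define RAMFS_MAGIC_NUM 0x858458f6UL	/* RAMFS_MAGIC in fs/ramfs/inode.c */
#define TMPFS_MAGIC_NUM 0x01021994UL	/* TMPFS_MAGIC in <linux/magic.h> */

enum memfs_kind { MEMFS_ERROR, MEMFS_OTHER, MEMFS_RAMFS, MEMFS_TMPFS };

/* Classify a filesystem by the f_type value that statfs() reports. */
static enum memfs_kind memfs_kind_from_magic(unsigned long magic)
{
	if (magic == RAMFS_MAGIC_NUM)
		return MEMFS_RAMFS;
	if (magic == TMPFS_MAGIC_NUM)
		return MEMFS_TMPFS;
	return MEMFS_OTHER;
}

/* Classify the filesystem that `path` lives on. */
static enum memfs_kind memfs_kind_of_path(const char *path)
{
	struct statfs sfs;

	assert(path != NULL);
	if (statfs(path, &sfs) != 0)
		return MEMFS_ERROR;
	/* Mask to 32 bits so a sign-extended f_type still compares equal. */
	return memfs_kind_from_magic((unsigned long)sfs.f_type & 0xffffffffUL);
}
```

For a directory mounted with -t tmpfs, memfs_kind_of_path() should return MEMFS_TMPFS; for one mounted with -t ramfs, MEMFS_RAMFS.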

Please understand that I'm not discussing the usefulness of your patch,
I'm just trying to avoid a huge confusion.

Regards,
Willy

 --- linux-2.6-mm.orig/fs/ramfs/inode.c
 +++ linux-2.6-mm/fs/ramfs/inode.c
 @@ -36,6 +36,20 @@
  #include <asm/uaccess.h>
  #include "internal.h"
  
 +static struct inode *__ramfs_get_inode(struct super_block *sb, int mode,
 +				       dev_t dev, bool tmpfs_with_mac);
 +
 +#define TMPFS_WITH_MAC    1
 +#define TMPFS_WITHOUT_MAC 0
 +#include <linux/quotaops.h>
 +
 +#ifdef CONFIG_SYAORAN
 +#include "syaoran.h"
 +#include "syaoran_init.c"
 +#include "syaoran_main.c"
 +#include "syaoran_debug.c"
 +#endif
 +
  /* some random number */
  #define RAMFS_MAGIC	0x858458f6
  
 @@ -51,6 +65,12 @@ static struct backing_dev_info ramfs_bac
  
  struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
  {
 + return 

Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-05 Thread Tetsuo Handa
Hello.

Willy Tarreau wrote:
 Your patch is very confusing. In your description, as well as in the
 comments you talk about tmpfs, but your patch does not touch even one
 line of tmpfs and only changes ramfs. Even your variables and arguments
 refer to tmpfs. The Kconfig entry indicates that the feature depends
 on TMPFS too.
 
 Judging from the following comment :
   * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.
 
 I suspect that you confuse both filesystems.
   - ramfs is in fs/ramfs and is always compiled in, you cannot disable it
   - tmpfs is in mm/shmem.c and is optional. It also supports options that
 ramfs does not (eg: size) and data may be swapped.
 
 Please understand that I'm not discussing the usefulness of your patch,
 I'm just trying to avoid a huge confusion.

Oh, I thought the filesystem mounted by "mount -t tmpfs none /tmp" is tmpfs
and that the source code of tmpfs is located in the fs/ramfs directory.
So, I should describe this as an extension to ramfs rather than
an extension to tmpfs.
I'll fix it in the next posting.

Thank you.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-05 Thread Willy Tarreau
On Sun, Jan 06, 2008 at 04:36:06PM +0900, Tetsuo Handa wrote:
 Hello.
 
 Willy Tarreau wrote:
  Your patch is very confusing. In your description, as well as in the
  comments you talk about tmpfs, but your patch does not touch even one
  line of tmpfs and only changes ramfs. Even your variables and arguments
  refer to tmpfs. The Kconfig entry indicates that the feature depends
  on TMPFS too.
  
  Judging from the following comment :
* Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.
  
  I suspect that you confuse both filesystems.
- ramfs is in fs/ramfs and is always compiled in, you cannot disable it
- tmpfs is in mm/shmem.c and is optional. It also supports options that
  ramfs does not (eg: size) and data may be swapped.
  
  Please understand that I'm not discussing the usefulness of your patch,
  I'm just trying to avoid a huge confusion.
 
 Oh, I thought the filesystem mounted by "mount -t tmpfs none /tmp" is tmpfs

Yes, that is a tmpfs.

 and the source code of tmpfs is located in fs/ramfs directory.

No, ramfs is what you get by "mount -t ramfs none /tmp" :-)
You will notice that "df" will not report your ramfs by default because it
reports zero blocks. But "mount" or "df /tmp" will report it.

 So, I should write the description as an extension to ramfs rather than
 an extension to tmpfs.

and please also fix the comments, macros and variable names in the code, as they
are what confused me first.

 I'll fix it in next posting.

Thanks,
Willy



Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2007-12-31 Thread Serge E. Hallyn
Quoting Tetsuo Handa ([EMAIL PROTECTED]):
 Hello.
 
 Thank you for joining the discussion of the previous posting
 (starting from http://lkml.org/lkml/2007/12/16/23 ).
 
 The previous posting was a feasibility test to find out
 whether this kind of trivial filesystem is acceptable for mainline.
 
 Now it seems that there is a small chance of it being accepted.
 Therefore I rebased the patch on the -mm tree.
 
 Regards.
 --
 Subject: Simple tamper-proof device filesystem.
 
 The goal of this filesystem is to guarantee that
 applications using well-known device locations under /dev
 get the device they want (e.g. an application that accesses /dev/null can
 always get a character special device with major=1 and minor=3).
 
 Does this idea sound silly? It does, if you think that root can do whatever
 he/she wants to do. But this filesystem makes sense when used with
 access control mechanisms like MAC (mandatory access control).
 I want to use this filesystem in the case where a process with root privilege was
 hijacked but the behavior of the hijacked process is still restricted by MAC.
 
 Why not use FUSE?
 
   Because /dev has to be available through the lifetime of the kernel.
   It is not acceptable if /dev stops working due to SIGKILL or OOM-killer.
 
 Why not use SELinux?
 
   Because SELinux doesn't guarantee the pairing of a filename and its attributes.
   As far as I know, no MAC implementation can enforce that pairing.
   I guess this is because
 
     (1) Filename/attribute pairs are conventionally considered
         constant and reliable.
 
     (2) Describing this attribute enforcement information would
         complicate the MAC's policy syntax.
 
   I want to add functionality that the MACs are missing.
   Instead of adding this functionality per MAC,
   I propose to add it as groundwork, to be combined with any MAC.
 
 Why not drop CAP_MKNOD?
 
   Dropping CAP_MKNOD is not enough to emulate this filesystem because
   a process can still use rename()/unlink() to break the pairing of a
   filename and its attributes (e.g. mv /dev/sda1 /dev/sda1.tmp;
   mv /dev/sda2 /dev/sda1; mv /dev/sda1.tmp /dev/sda2
   or unlink /dev/null; touch /dev/null ).
 
 This time, I'm implementing this filesystem as an extension to tmpfs
 because all this filesystem does, in addition to what tmpfs does,
 is check filenames and their attributes.
 
 Signed-off-by: Tetsuo Handa [EMAIL PROTECTED]
 ---
  fs/ramfs/inode.c   |  101 -
  fs/ramfs/syaoran.h | 1066 +
  2 files changed, 1160 insertions(+), 7 deletions(-)
 
 --- linux-2.6-mm.orig/fs/ramfs/inode.c
 +++ linux-2.6-mm/fs/ramfs/inode.c
 @@ -35,6 +35,7 @@
  #include <linux/sched.h>
  #include <asm/uaccess.h>
  #include "internal.h"
 +#include "syaoran.h"
  
  /* some random number */
  #define RAMFS_MAGIC	0x858458f6
 @@ -49,7 +50,8 @@ static struct backing_dev_info ramfs_bac
  			  BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP,
  };
  
 -struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
 +struct inode *__ramfs_get_inode(struct super_block *sb, int mode, dev_t dev,
 +				const int mac)
  {
  	struct inode * inode = new_inode(sb);
  
 @@ -65,10 +67,19 @@ struct inode *ramfs_get_inode(struct sup
  		switch (mode & S_IFMT) {
  		default:
  			init_special_inode(inode, mode, dev);
 +			if (mac) {
 +				if (S_ISBLK(mode))
 +					inode->i_fop = &wrapped_def_blk_fops;
 +				else if (S_ISCHR(mode))
 +					inode->i_fop = &wrapped_def_chr_fops;
 +				inode->i_op = &syaoran_file_inode_operations;
 +			}
  			break;
  		case S_IFREG:
  			inode->i_op = &ramfs_file_inode_operations;
  			inode->i_fop = &ramfs_file_operations;
 +			if (mac)
 +				inode->i_op = &syaoran_file_inode_operations;
  			break;
  		case S_IFDIR:
  			inode->i_op = &ramfs_dir_inode_operations;
 @@ -79,12 +90,19 @@ struct inode *ramfs_get_inode(struct sup
  			break;
  		case S_IFLNK:
  			inode->i_op = &page_symlink_inode_operations;
 +			if (mac)
 +				inode->i_op = &syaoran_symlink_inode_operations;
  			break;
  		}
  	}
  	return inode;
  }
  
 +struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
 +{
 +	return __ramfs_get_inode(sb, mode, dev, 0);
 +}
 +
  /*
   * File creation. Allocate an inode, and we're done..
   */
 @@ -92,9 +110,17 @@ struct inode *ramfs_get_inode(struct sup
  static int
  ramfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
  {
 - 

[PATCH][RFC] Simple tamper-proof device filesystem.

2007-12-23 Thread Tetsuo Handa
Hello.

Thank you for joining the discussion of the previous posting
(starting from http://lkml.org/lkml/2007/12/16/23 ).

The previous posting was a feasibility test to find out
whether this kind of trivial filesystem is acceptable for mainline.

Now it seems that there is a small chance of it being accepted.
Therefore I rebased the patch on the -mm tree.

Regards.
--
Subject: Simple tamper-proof device filesystem.

The goal of this filesystem is to guarantee that
applications using well-known device locations under /dev
get the device they want (e.g. an application that accesses /dev/null can
always get a character special device with major=1 and minor=3).

Does this idea sound silly? It does, if you think that root can do whatever
he/she wants to do. But this filesystem makes sense when used with
access control mechanisms like MAC (mandatory access control).
I want to use this filesystem in the case where a process with root privilege was
hijacked but the behavior of the hijacked process is still restricted by MAC.

Why not use FUSE?

  Because /dev has to be available through the lifetime of the kernel.
  It is not acceptable if /dev stops working due to SIGKILL or OOM-killer.

Why not use SELinux?

  Because SELinux doesn't guarantee the pairing of a filename and its attributes.
  As far as I know, no MAC implementation can enforce that pairing.
  I guess this is because

    (1) Filename/attribute pairs are conventionally considered
        constant and reliable.

    (2) Describing this attribute enforcement information would
        complicate the MAC's policy syntax.

  I want to add functionality that the MACs are missing.
  Instead of adding this functionality per MAC,
  I propose to add it as groundwork, to be combined with any MAC.

Why not drop CAP_MKNOD?

  Dropping CAP_MKNOD is not enough to emulate this filesystem because
  a process can still use rename()/unlink() to break the pairing of a
  filename and its attributes (e.g. mv /dev/sda1 /dev/sda1.tmp;
  mv /dev/sda2 /dev/sda1; mv /dev/sda1.tmp /dev/sda2
  or unlink /dev/null; touch /dev/null ).

This time, I'm implementing this filesystem as an extension to tmpfs
because all this filesystem does, in addition to what tmpfs does,
is check filenames and their attributes.

Signed-off-by: Tetsuo Handa [EMAIL PROTECTED]
---
 fs/ramfs/inode.c   |  101 -
 fs/ramfs/syaoran.h | 1066 +
 2 files changed, 1160 insertions(+), 7 deletions(-)

--- linux-2.6-mm.orig/fs/ramfs/inode.c
+++ linux-2.6-mm/fs/ramfs/inode.c
@@ -35,6 +35,7 @@
 #include <linux/sched.h>
 #include <asm/uaccess.h>
 #include "internal.h"
+#include "syaoran.h"
 
 /* some random number */
 #define RAMFS_MAGIC	0x858458f6
@@ -49,7 +50,8 @@ static struct backing_dev_info ramfs_bac
 			  BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP,
 };
 
-struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
+struct inode *__ramfs_get_inode(struct super_block *sb, int mode, dev_t dev,
+				const int mac)
 {
 	struct inode * inode = new_inode(sb);
 
@@ -65,10 +67,19 @@ struct inode *ramfs_get_inode(struct sup
 		switch (mode & S_IFMT) {
 		default:
 			init_special_inode(inode, mode, dev);
+			if (mac) {
+				if (S_ISBLK(mode))
+					inode->i_fop = &wrapped_def_blk_fops;
+				else if (S_ISCHR(mode))
+					inode->i_fop = &wrapped_def_chr_fops;
+				inode->i_op = &syaoran_file_inode_operations;
+			}
 			break;
 		case S_IFREG:
 			inode->i_op = &ramfs_file_inode_operations;
 			inode->i_fop = &ramfs_file_operations;
+			if (mac)
+				inode->i_op = &syaoran_file_inode_operations;
 			break;
 		case S_IFDIR:
 			inode->i_op = &ramfs_dir_inode_operations;
@@ -79,12 +90,19 @@ struct inode *ramfs_get_inode(struct sup
 			break;
 		case S_IFLNK:
 			inode->i_op = &page_symlink_inode_operations;
+			if (mac)
+				inode->i_op = &syaoran_symlink_inode_operations;
 			break;
 		}
 	}
 	return inode;
 }
 
+struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
+{
+	return __ramfs_get_inode(sb, mode, dev, 0);
+}
+
 /*
  * File creation. Allocate an inode, and we're done..
  */
@@ -92,9 +110,17 @@ struct inode *ramfs_get_inode(struct sup
 static int
 ramfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
 {
-	struct inode * inode = ramfs_get_inode(dir->i_sb, mode, dev);
+	struct inode *inode;
int