I'm sponsoring the following fast-track for Tom Haynes.

The timer is set for Thursday, 4 Oct, 2007.

This case seeks patch binding (to match PSARC 2007/416).

        Rich


Template Version: @(#)sac_nextcase 1.64 07/13/07 SMI
This information is Copyright 2007 Sun Microsystems
1. Introduction
    1.1. Project/Component Working Name:
         Add S_IFTRIGGER to st_mode
    1.2. Name of Document Author/Supplier:
         Author:  Thomas Haynes
    1.3  Date of This Document:
        27 September, 2007
4. Technical Description

=== PROBLEM OVERVIEW

nftw(3C), the "new file tree walk", is a routine in libc. It
recursively calls walk() to traverse a directory tree. One of its main
consumers is find(1).

There are several flags that control how walk() behaves:

        FTW_MOUNT directs walk() not to cross mountpoints

        FTW_PHYS directs walk() not to follow symbolic links.

The walk() routine uses stat() to test each component that it
encounters to ensure that it does not violate the requested behavior.
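
As a rough illustration of how a consumer such as find(1) might drive
nftw(3C) with both flags set, here is a minimal sketch. The names
count_entry(), count_tree() and g_entries are hypothetical and exist
only for this example; the descriptor limit of 16 is an arbitrary
choice.

```c
#define _XOPEN_SOURCE 700       /* expose nftw() and the FTW_* flags */
#include <assert.h>
#include <ftw.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical counter; nftw() has no user-data argument to carry it. */
static int g_entries;

/* Callback invoked by nftw() once for each object it visits. */
static int
count_entry(const char *path, const struct stat *sb, int type,
    struct FTW *ftw)
{
        (void) path;
        (void) sb;
        (void) type;
        (void) ftw;
        g_entries++;
        return (0);             /* zero tells nftw() to keep walking */
}

/*
 * Walk the tree rooted at root without crossing mountpoints and
 * without following symbolic links. Returns the number of objects
 * visited, or -1 if nftw() reports an error.
 */
int
count_tree(const char *root)
{
        assert(root != NULL);
        g_entries = 0;
        if (nftw(root, count_entry, 16, FTW_MOUNT | FTW_PHYS) != 0)
                return (-1);
        return (g_entries);
}
```

A caller would invoke count_tree("/some/dir") and treat a return of -1
as a failed walk.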

The following code snippet succinctly captures the security test and
the window of opportunity:

        struct stat     statPre;
        struct stat     statFile;
        DIR             *pdir;

        stat(szPath, &statPre);
        pdir = opendir(szPath);
        fstat(pdir->dd_fd, &statFile);

        if (statPre.st_ino != statFile.st_ino ||
            statPre.st_dev != statFile.st_dev) {
                return(EAGAIN);
        }

There is a window between the stat() and opendir() calls when the user
might move directory contents (an innocent case we need to avoid) or
use a symlink to get outside of the directory hierarchy (a security
breach).  If the results of the stat() do not match those of the
fstat(), then assume that there is some problem and return to the
caller.
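
For readers who want to experiment with the check, the snippet above
can be approximated as a self-contained function. This is only a
sketch: open_dir_checked() is a hypothetical name, error handling is
added that the snippet omits, and the POSIX dirfd() call is assumed in
place of direct access to pdir->dd_fd.

```c
#define _XOPEN_SOURCE 700       /* expose dirfd() */
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Open the directory at path and confirm that the object opened is the
 * one examined beforehand. Returns 0 and stores the stream in *dirp on
 * success, or an errno value on failure (EAGAIN on a mismatch).
 */
int
open_dir_checked(const char *path, DIR **dirp)
{
        struct stat pre, post;
        DIR *dir;
        int err;

        assert(path != NULL && dirp != NULL);
        *dirp = NULL;

        if (stat(path, &pre) != 0)
                return (errno);
        /* The window of opportunity lies between stat() and opendir(). */
        if ((dir = opendir(path)) == NULL)
                return (errno);
        if (fstat(dirfd(dir), &post) != 0) {
                err = errno;
                (void) closedir(dir);
                return (err);
        }
        if (pre.st_ino != post.st_ino || pre.st_dev != post.st_dev) {
                (void) closedir(dir);
                return (EAGAIN);
        }
        *dirp = dir;
        return (0);
}
```

On an ordinary directory that nobody is rearranging, the function
returns 0 and the caller owns the returned stream.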

When the check fails, find(1) will report, for example:

        find: cannot open /mnt: Resource temporarily unavailable

A problem with this test occurs when the filesystem is of type "autofs"
(PSARC 1992/024). In that case, the directory entry, whose name is
given by szPath,  is a trigger mount - a mount occurs when the
directory is entered.  By definition, getting attributes on the
directory (i.e., stat()) does not constitute entering the directory,
but the opendir() does, which triggers an autofs mount.

This leads to a false positive. The code cannot detect that a trigger
mount occurred beneath it, so the st_ino and st_dev values are expected
not to match. If the user immediately retries the application, it
succeeds: the mount has been established, and the results of the
stat() now match those of the fstat().

The current code addresses this by doing a strcmp() on st_fstype to
determine if it is an autofs filesystem (see fix 6198351). If so, then
statPre is refreshed after the opendir(). This is safe in that the
kernel owns the contents of the autofs filesystem.

If we add the test from the current code for nftw()/walk(), the code
snippet would now look like this:

        struct stat     statPre;
        struct stat     statFile;
        DIR             *pdir;

        stat(szPath, &statPre);
        pdir = opendir(szPath);

        if (statPre.st_fstype[0] == 'a' &&
            strcmp(statPre.st_fstype, "autofs") == 0) {
                /*
                 * this dir is on autofs
                 */
                fstat(pdir->dd_fd, &statPre);
        }

        fstat(pdir->dd_fd, &statFile);

        if (statPre.st_ino != statFile.st_ino ||
            statPre.st_dev != statFile.st_dev) {
                return(EAGAIN);
        }

With the addition of mirror mounts for NFSv4 (see PSARC 2007/416), we
have another case where trigger mounts can cause a false positive.
Also note that other NFSv4 features, such as referrals and migration,
will employ trigger mounts as the integral interface to remote
filesystems.

We could once again try checking the st_fstype, this time for "nfs4",
to grant an exception, but this check will fail for these reasons:

    1) st_fstype for "nfs3" and "nfs4" is truncated to "nfs" for
    backwards compatibility in 3rd party applications. I.e., this would
    lead to us allowing exemptions for all directory entries on all
    versions of nfs.

    The problem is that we can only allow exemptions for directories
    which are "nfs4" and mirror mount trigger points.

    2) nfs filesystems are not strictly controlled in the kernel the
    way the autofs filesystem is. I.e., it is possible for a user
    application to mangle the directory tree.

    The point here is that an autofs filesystem is not directly
    writeable by the user. The only objects in an autofs filesystem are
    automount trigger points, and they cannot be manipulated.

    The user cannot move directory hierarchies around in an autofs
    filesystem, so walk() can be a bit relaxed. With an nfs filesystem,
    walk() does not have that luxury.


=== PROPOSED SOLUTION

The solution is to determine, from the stat() done before the
opendir(), whether the directory is a trigger mount. If so, then we
refresh statPre after the opendir().

In order to do this, we propose to add a new bit, S_IFTRIGGER, to the
st_mode field of the struct stat to identify the trigger mount.

In particular, we would add to sys/stat.h:

#define S_IFTRIGGER       0x20000 /* Operations can trigger a mount */
#define S_ISTRIGGER(mode) (((mode)&0xF0000) == 0x20000)

By keeping S_IFTRIGGER above the bits covered by S_IFMT, we avoid
conflicts with the existing file type values.  I.e., we need to be
able to detect that an entry is both S_IFTRIGGER and S_IFDIR.

Also, S_IFTRIGGER would only be set in the kernel. It would not be
stored on disk.

The code snippet would now look like this:

        struct stat     statPre;
        struct stat     statFile;
        DIR             *pdir;

        stat(szPath, &statPre);
        pdir = opendir(szPath);

        if (S_ISTRIGGER(statPre.st_mode)) {
                stat(szPath, &statPre);
        }

        fstat(pdir->dd_fd, &statFile);

        if (statPre.st_ino != statFile.st_ino ||
            statPre.st_dev != statFile.st_dev) {
                return(EAGAIN);
        }


=== EXPORTED INTERFACE TABLE

                        |Proposed       |Specified      |
                        |Stability      |in what        |
Interface Name          |Classification |Document?      | Comments
===============================================================================
S_IFTRIGGER             | Committed     | This Document | New bit value for the
                        |               |               | st_mode field in
                        |               |               | struct stat
S_ISTRIGGER()           | Committed     | This Document | Test macro for the
                        |               |               | st_mode field


=== MAN PAGE UPDATE TO stat(2)

Existing stat(2):

     st_mode       The mode of the  file  as  described  for  the
                   mknod()  function.  In  addition  to the modes
                   described on the  mknod(2)  manual  page,  the
                   mode  of  a  file  can also be S_IFSOCK if the
                   file is a socket, S_IFDOOR if the  file  is  a
                   door,  S_IFPORT  if the file is an event port,
                   or S_IFLNK if the file  is  a  symbolic  link.
                   S_IFLNK  can  be returned either by lstat() or
                   by fstat() when the  AT_SYMLINK_NOFOLLOW  flag
                   is set.

Proposed change:

     st_mode       The mode of the  file  as  described  for  the
                   mknod()  function.  In  addition  to the modes
                   described on the  mknod(2)  manual  page,  the
                   mode  of  a  file  can also be S_IFSOCK if the
                   file is a socket, S_IFDOOR if the  file  is  a
                   door,  S_IFPORT  if the file is an event port,
                   S_IFTRIGGER if the file  is  a  trigger  mount
                   point,  or  S_IFLNK  if the file is a symbolic
                   link. S_IFLNK can be returned either by lstat()
                   or by fstat() when the AT_SYMLINK_NOFOLLOW flag
                   is set.


6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
                ON
    6.5. ARC review type: FastTrack
    6.6. ARC Exposure: open

