I'm sponsoring the following fast-track for Tom Haynes.
The timer is set for Thursday, 4 Oct, 2007.
This case seeks patch binding (to match PSARC 2007/416).
Rich
Template Version: @(#)sac_nextcase 1.64 07/13/07 SMI
This information is Copyright 2007 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
Add S_IFTRIGGER to st_mode
1.2. Name of Document Author/Supplier:
Author: Thomas Haynes
1.3 Date of This Document:
27 September, 2007
4. Technical Description
=== PROBLEM OVERVIEW
nftw(3C) is a routine in libc which is the "new file tree walk". It
recursively calls walk() to traverse a directory tree. One main
consumer of it is find(1).
There are several flags that control how walk() behaves:
    FTW_MOUNT   directs walk() not to cross mount points.
    FTW_PHYS    directs walk() not to follow symbolic links.
The walk() routine uses stat() to test each component that it
encounters to ensure that it does not violate the requested behavior.
The following code snippet succinctly captures the security test and
the window of opportunity:
    struct stat statPre;
    struct stat statFile;
    DIR *pdir;

    stat(szPath, &statPre);
    pdir = opendir(szPath);
    fstat(pdir->dd_fd, &statFile);
    if (statPre.st_ino != statFile.st_ino ||
        statPre.st_dev != statFile.st_dev) {
        return (EAGAIN);
    }
There is a window between the stat() and opendir() calls when the user
might move directory contents (an innocent case we need to avoid) or
use a symlink to get outside of the directory hierarchy (a security
breach). If the results of the stat() do not match those of the
fstat(), then assume that there is some problem and return to the
caller.
For example, find(1) will report:
find: cannot open /mnt: Resource temporarily unavailable
A problem with this test occurs when the filesystem is of type "autofs"
(PSARC 1992/024). In that case, the directory entry, whose name is
given by szPath, is a trigger mount - a mount occurs when the
directory is entered. By definition, getting attributes on the
directory (i.e., stat()) does not constitute entering the directory,
but the opendir() does, which triggers an autofs mount.
This leads to a false positive. The code cannot detect that a trigger
mount occurred beneath it, so the st_ino and st_dev values are expected
not to match. If the user were to immediately retry the application, it
would now succeed: the mount has been established, and the results from
the stat() will match those of the fstat().
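The false positive can be replayed without any real mount. The sketch
below is illustrative only: same_object(), make_stat() and
false_positive_then_success() are hypothetical helpers, and the device
and inode numbers are made up. It compares only the two fields that
walk() checks.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>

/* Returns 1 when both stat results name the same object, else 0. */
static int
same_object(const struct stat *a, const struct stat *b)
{
    return a->st_ino == b->st_ino && a->st_dev == b->st_dev;
}

/* Builds a stat result carrying only the two fields the check reads. */
static struct stat
make_stat(dev_t dev, ino_t ino)
{
    struct stat st;

    memset(&st, 0, sizeof (st));
    st.st_dev = dev;
    st.st_ino = ino;
    return st;
}

/*
 * Replays the trigger mount case with made-up numbers. Returns 1 if the
 * first attempt fails the check and an immediate retry passes it.
 */
static int
false_positive_then_success(void)
{
    struct stat trigger = make_stat(1, 10); /* stat() before the mount */
    struct stat mounted = make_stat(2, 2);  /* fstat() after opendir() */
    struct stat retried = make_stat(2, 2);  /* stat() on the retry */

    return !same_object(&trigger, &mounted) &&
        same_object(&retried, &mounted);
}
```

On the first attempt the stat() result describes the trigger directory
while the fstat() result describes the root of the newly mounted
filesystem, so the comparison fails. On the retry both calls see the
mounted filesystem.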
The current code addresses this by doing a strcmp() on st_fstype to
determine if it is an autofs filesystem (see fix 6198351). If so, then
statPre is refreshed after the opendir(). This is safe in that the
kernel owns the contents of the autofs filesystem.
If we add the test from the current code for nftw()/walk(), the code
snippet would now look like this:
    struct stat statPre;
    struct stat statFile;
    DIR *pdir;

    stat(szPath, &statPre);
    pdir = opendir(szPath);
    if (statPre.st_fstype[0] == 'a' &&
        strcmp(statPre.st_fstype, "autofs") == 0) {
        /*
         * this dir is on autofs; refresh statPre now that
         * opendir() has triggered the mount
         */
        fstat(pdir->dd_fd, &statPre);
    }
    fstat(pdir->dd_fd, &statFile);
    if (statPre.st_ino != statFile.st_ino ||
        statPre.st_dev != statFile.st_dev) {
        return (EAGAIN);
    }
With the addition of mirror mounts for NFSv4 (see PSARC 2007/416), we
have another case where trigger mounts can cause a false positive.
Also note that other NFSv4 features, such as referrals and migration
will employ trigger mounts as the integral interface to remote
filesystems.
We could once again try checking st_fstype, this time for "nfs4", to
grant an exemption, but this check will fail for these reasons:
1) st_fstype for "nfs3" and "nfs4" is truncated to "nfs" for
backwards compatibility in 3rd party applications. I.e., this would
lead to us allowing exemptions for all directory entries on all
versions of nfs.
The problem is that we can only allow exemptions for directories
which are "nfs4" and mirror mount trigger points.
2) Unlike autofs filesystems, NFS filesystems are not strictly
controlled by the kernel. I.e., it is possible for a user
application to mangle the directory tree.
The point here is that an autofs filesystem is not directly
writeable by the user. The only objects in an autofs filesystem are
automount trigger points, and they cannot be manipulated.
The user cannot move directory hierarchies around in an autofs
filesystem, so walk() can be a bit relaxed. With an NFS filesystem,
walk() does not have that luxury.
=== PROPOSED SOLUTION
The solution is to determine from the initial stat() whether the
directory is a trigger mount. If so, we refresh statPre after calling
opendir().
In order to do this, we propose to add a new bit, S_IFTRIGGER, to the
st_mode field of the struct stat to identify the trigger mount.
In particular, we would add to sys/stat.h:
#define S_IFTRIGGER 0x20000 /* Operations can trigger a mount */
#define S_ISTRIGGER(mode) (((mode)&0xF0000) == 0x20000)
By placing S_IFTRIGGER above the bits covered by S_IFMT, we avoid
conflicts with the existing file type values. I.e., we need to be able
to detect that an entry is both S_IFTRIGGER and S_IFDIR.
Also, S_IFTRIGGER would only be set in the kernel. It would not be
stored on disk.
The code snippet would now look like this:
    struct stat statPre;
    struct stat statFile;
    DIR *pdir;

    stat(szPath, &statPre);
    pdir = opendir(szPath);
    if (S_ISTRIGGER(statPre.st_mode)) {
        stat(szPath, &statPre);
    }
    fstat(pdir->dd_fd, &statFile);
    if (statPre.st_ino != statFile.st_ino ||
        statPre.st_dev != statFile.st_dev) {
        return (EAGAIN);
    }
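For readers who want to try the flow outside libc, here is a
self-contained sketch of the same logic with error handling added. It
is not the walk() implementation: open_dir_checked() is a hypothetical
helper, dirfd() is used in place of pdir->dd_fd as a portability
assumption, and S_ISTRIGGER is defined locally from the proposal when
the system headers do not provide it. On such systems the kernel never
sets the bit, so the refresh branch is simply never taken.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Definition from this proposal, used only if the headers lack it. */
#ifndef S_ISTRIGGER
#define S_ISTRIGGER(mode) (((mode) & 0xF0000) == 0x20000)
#endif

/*
 * Opens path as a directory and applies the stat()/fstat() identity
 * check. Returns 0 and stores the stream in *out on success, or an
 * errno value on failure (EAGAIN when the identities differ).
 */
static int
open_dir_checked(const char *path, DIR **out)
{
    struct stat pre, file;
    DIR *dir;
    int err;

    *out = NULL;
    if (stat(path, &pre) != 0)
        return errno;
    if ((dir = opendir(path)) == NULL)
        return errno;

    /* opendir() may have triggered a mount; take a fresh baseline. */
    if (S_ISTRIGGER(pre.st_mode) && stat(path, &pre) != 0) {
        err = errno;
        closedir(dir);
        return err;
    }

    if (fstat(dirfd(dir), &file) != 0) {
        err = errno;
        closedir(dir);
        return err;
    }

    if (pre.st_ino != file.st_ino || pre.st_dev != file.st_dev) {
        closedir(dir);
        return EAGAIN;
    }

    *out = dir;
    return 0;
}
```

A caller would pass the path of each directory it is about to descend
into and treat EAGAIN as the "cannot open" case that find(1) reports.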
=== EXPORTED INTERFACE TABLE
                |Proposed       |Specified |
                |Stability      |in what   |
Interface Name  |Classification |Document? | Comments
===============================================================================
S_IFTRIGGER     |Committed      |This      | New bit value for the st_mode
                |               |Document  | field in struct stat
S_ISTRIGGER()   |Committed      |This      | Test macro for the st_mode
                |               |Document  | field in struct stat
=== MAN PAGE UPDATE TO stat(2)
Existing stat(2):
    st_mode     The mode of the file as described for the
                mknod() function. In addition to the modes
                described on the mknod(2) manual page, the
                mode of a file can also be S_IFSOCK if the
                file is a socket, S_IFDOOR if the file is a
                door, S_IFPORT if the file is an event port,
                or S_IFLNK if the file is a symbolic link.
                S_IFLNK can be returned either by lstat() or
                by fstat() when the AT_SYMLINK_NOFOLLOW flag
                is set.
Proposed change:
    st_mode     The mode of the file as described for the
                mknod() function. In addition to the modes
                described on the mknod(2) manual page, the
                mode of a file can also be S_IFSOCK if the
                file is a socket, S_IFDOOR if the file is a
                door, S_IFPORT if the file is an event port,
                S_IFTRIGGER if the file is a trigger mount
                point, or S_IFLNK if the file is a symbolic link.
                S_IFLNK can be returned either by lstat() or
                by fstat() when the AT_SYMLINK_NOFOLLOW flag
                is set.
6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open