Given that NetBSD, OpenBSD and DragonFly (as well as Solaris and maybe
others) already provide closefrom(), it'd be nice and worthwhile to
implement it on FreeBSD too.

The attached shar archive contains 4 possible implementations of it.
One is a system call (the approach used by the other BSDs), available
here as a loadable kernel module for quick testing.  The other 3 are
library versions.  One of them doesn't currently work since FreeBSD
lacks a /proc/<pid>/fd/ that I tried to emulate with /dev/fd/, both
via devfs(5) and fdescfs(5): they seem to lack some types of file
descriptors...  Another just does what a lot of programs do: try
close() on every possible file descriptor, and the last one uses
sysctl().

The implementation was inspired by the DragonFly code but the semantics
match Open/NetBSD's (EBADF vs EINVAL). Their code is available at:
http://www.dragonflybsd.org/cvsweb/~checkout~/src/sys/kern/kern_descrip.c
http://cvsweb.netbsd.org/bsdweb.cgi/~checkout~/src/sys/kern/kern_descrip.c

Also included in the archive is a timing test along with a regression
test borrowed from OpenSSH.

It was successfully built and tested on FreeBSD 6.2-STABLE.
There's code to make it work in -CURRENT.

A sample run on a Pentium 4 1.7GHz:

$ make test
Trying closefrom_syscall(3) with 58976 open file descriptors
user    0.000000        sys     0.030874        total   0.030874
Trying closefrom_syscall(3) with 58976 closed file descriptors
user    0.000000        sys     0.000008        total   0.000008

Trying closefrom_sysctl(3) with 58976 open file descriptors
user    0.050941        sys     0.045333        total   0.096274
Trying closefrom_sysctl(3) with 58976 closed file descriptors
user    0.000877        sys     0.000939        total   0.001816

Trying closefrom_brute(3) with 58976 open file descriptors
user    0.037777        sys     0.043793        total   0.081570
Trying closefrom_brute(3) with 58976 closed file descriptors
user    0.026666        sys     0.046383        total   0.073049

closefrom_sysctl() has a worst-case scenario when a lot of files
are open that may make it slower than closefrom_brute().
Implementations using /proc/<pid>/fd/ are also vulnerable to this.
With no library version guaranteed to be faster, and because of the
various reasons discussed in
http://lists.freebsd.org/pipermail/freebsd-hackers/2007-July/thread.html
I believe it'd be best to implement it as a system call (which can be
done through fcntl() anyway).
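
To make the fcntl() remark concrete, here is a rough sketch of a library
wrapper built on it.  It is not part of the attached archive, and the
name closefrom_fcntl() is made up for illustration.  F_CLOSEM is the
NetBSD command mentioned in the README; FreeBSD has no such command
today, so the #else branch is only an assumed fallback that walks the
descriptor table up to the current RLIMIT_NOFILE soft limit.

```c
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <unistd.h>
#include <sys/resource.h>

/*
 * Hypothetical wrapper (illustration only): close every descriptor
 * >= lowfd.  Returns 0 on success, -1 with errno set on failure.
 * A negative lowfd yields EBADF, matching the Open/NetBSD semantics.
 */
int
closefrom_fcntl(int lowfd)
{
	if (lowfd < 0) {
		errno = EBADF;
		return (-1);
	}
#ifdef F_CLOSEM
	/* A single call: the kernel closes everything >= lowfd. */
	return (fcntl(lowfd, F_CLOSEM, 0));
#else
	/*
	 * Assumed fallback where F_CLOSEM is not defined: try close()
	 * on every descriptor below the soft limit.  Descriptors opened
	 * before the limit was lowered could be missed.
	 */
	struct rlimit rl;
	rlim_t maxfd;
	int fd;

	if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
		return (-1);
	maxfd = rl.rlim_cur;
	if (maxfd > (rlim_t)INT_MAX)
		maxfd = INT_MAX;	/* keep the loop counter in range */
	for (fd = lowfd; fd < (int)maxfd; fd++)
		if (close(fd) < 0 && errno == EINTR)
			return (-1);
	return (0);
#endif
}
```

A caller that wants to keep only stdin, stdout and stderr would use
closefrom_fcntl(3).  With a real F_CLOSEM the cost is one trip into the
kernel; the fallback pays one close() per possible descriptor, which is
the behaviour of closefrom_brute() in the timings above.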

More info is included in the README.

Any ideas, suggestions?
Salutes,
Igh
#!/bin/sh
# This is a shell archive
echo x closefrom
mkdir -p closefrom > /dev/null 2>&1
echo x closefrom/Makefile
sed 's/^X//' > closefrom/Makefile << 'SHAR_END'
XSUBDIR = module test
X
X.include <bsd.subdir.mk>
SHAR_END
echo x closefrom/README
sed 's/^X//' > closefrom/README << 'SHAR_END'
XOVERVIEW
X
XThis tarball contains 4 possible implementations of closefrom().
XThe first, a system call, is located in ./module/syscall.c and is
Xavailable as a kernel module for quick testing.
X
XBoth NetBSD >= 3.0 and DragonFly >= 1.4 implement it as a system call.
XIn NetBSD, it uses the F_CLOSEM fcntl(), available since version 2.0.
X
XThe second, implemented with the kern.file sysctl(), is available
Xon both FreeBSD >= 5.0 and DragonFly >= 1.2.  Dynamic memory should be
Xallocated for an array of "struct xfile" structures that describes each
Xopen file descriptor _for every running process_ in
Xthe system...! (Note: the sysctl(3) manpage should be patched to reflect
Xthe current behaviour since FreeBSD 5.0: it should mention struct xfile).
XOn my system, the size of this structure is 52 bytes, so it could fail
Xon systems that set up a larger kern.maxfiles.  This function would be
Xcleaner to implement in NetBSD which has an (undocumented) kern.file2
Xthat lets you work with a specific pid instead by passing KERN_FILE_BYPID.
X
XThe third is the usual brute force approach that uses getdtablesize(),
Xused for reference on the approach most applications take.
X
XThe fourth tries to do what some implementations (including Solaris') do
Xby browsing /proc/<pid>/fd/ but using /dev/fd/.  Unfortunately, it doesn't
Xwork because neither devfs(5) nor fdescfs(5) seems to include duplicated
Xfile descriptors, sockets and maybe others.
X
X-o-
X
XIt was successfully built and tested on FreeBSD 6.2-STABLE (as of
XSept 18, 2007), though code that should work on -CURRENT is present
X(namely, the new FILEDESC_S[UN]LOCK macros).
X
XTo try the implementations, run the following commands:
X
Xcd module
Xmake
Xsudo make load
Xcd ..
Xcd test
Xmake
Xmake check
Xmake test
X
XFor repeated testing of any of the implementations you may run:
X./closefrom syscall
X./closefrom sysctl
X./closefrom brute
X
SHAR_END
echo x closefrom/module
mkdir -p closefrom/module > /dev/null 2>&1
echo x closefrom/test
mkdir -p closefrom/test > /dev/null 2>&1
echo x closefrom/test/closefrom.c
sed 's/^X//' > closefrom/test/closefrom.c << 'SHAR_END'
X/*
X * Copyright (c) 2007 by Ighighi
X * All rights reserved.
X *
X * Redistribution and use in source and binary forms, with or without
X * modification, are permitted provided that the following conditions
X * are met:
X *
X * 1. Redistributions of source code must retain the above copyright
X *    notice, this list of conditions and the following disclaimer.
X * 2. Redistributions in binary form must reproduce the above copyright
X *    notice, this list of conditions and the following disclaimer in the
X *    documentation and/or other materials provided with the distribution.
X *
X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
X * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
X * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
X * THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
X * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
X * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
X * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
X * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
X * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
X * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
X */
X
X#include <dirent.h>
X#include <err.h>
X#include <errno.h>
X#include <fcntl.h>
X#include <limits.h>
X#include <stdio.h>
X#include <stdlib.h>
X#include <string.h>
X#include <unistd.h>
X#include <sys/types.h>
X#include <sys/param.h>
X#include <sys/file.h>
X#include <sys/resource.h>
X#include <sys/time.h>
X#include <sys/sysctl.h>
X
X#include <sys/syscall.h>
X#include <sys/module.h>
X
X#define DEBUG
X
Xstatic void
Xusage(const char *argv0)
X{
X       fprintf(stderr, "Usage: %s syscall|sysctl|brute|devfd\n"
X               "Usage: %s check\n", argv0, argv0);
X       exit(1);
X}
X
Xstatic int (*closefrom)(int);  /* pointer to closefrom_xxx() */
X
X/*
X * LKM version of closefrom()
X */
X
Xstatic int syscall_num;
X
Xstatic void
Xfind_module(void)
X{
X       struct module_stat stat;
X       int modid;
X
X       modid = modfind("closefrom");
X       if (modid == -1)
X               err(1, "modfind(closefrom)");
X
X       stat.version = sizeof(stat);
X       if (modstat(modid, &stat) == -1)
X               err(1, "modstat()");
X
X       syscall_num = stat.data.intval;
X}
X
Xstatic int
Xclosefrom_syscall(int lowfd)
X{
X       return (syscall(syscall_num, lowfd));
X}
X
X/*
X * This version uses the kern.file sysctl()
X */
Xstatic int
Xclosefrom_sysctl(int lowfd)
X{
X       int mib[2] = { CTL_KERN, KERN_FILE };
X       struct xfile *files = NULL;
X       pid_t pid = getpid();
X       size_t fsize;
X       int i, nfiles;
X
X       if (lowfd < 0) {
X               errno = EBADF;
X               return (-1);
X       }
X
X       for (;;) {
X               if (sysctl(mib, 2, files, &fsize, NULL, 0) == -1) {
X                       if (errno != ENOMEM)
X                               goto bad;
X                       else if (files != NULL) {
X                               free(files);
X                               files = NULL;
X                       }
X               } else if (files == NULL) {
X                       files = (struct xfile *) malloc(fsize);
X                       if (files == NULL)
X                               return (-1);
X               } else
X                       break;
X       }
X
X        /* XXX This structure may change */
X       if (files->xf_size != sizeof(struct xfile) ||
X               fsize % sizeof(struct xfile))
X       {
X               errno = ENOSYS;
X               goto bad;
X       }
X
X       nfiles = fsize / sizeof(struct xfile);
X
X       for (i = 0; i < nfiles; i++)
X               if (files[i].xf_pid == pid && files[i].xf_fd >= lowfd)
X                       if (close(files[i].xf_fd) < 0 && errno == EINTR)
X                               goto bad;
X
X       free(files);
X       return (0);
X
Xbad:
X       if (files != NULL) {
X               int save_errno = errno;
X               free(files);
X               errno = save_errno;
X       }
X       return (-1);
X}
X
X/*
X * This version iterates over all possible file descriptors >= lowfd
X */
Xstatic int
Xclosefrom_brute(int lowfd)
X{
X       int fd;
X
X       if (lowfd < 0) {
X               errno = EBADF;
X               return (-1);
X       }
X
X       for (fd = getdtablesize() - 1; fd >= lowfd; fd--)
X               if (close(fd) < 0 && errno == EINTR)
X                       return (-1);
X
X       return (0);
X}
X
X/*
X * An example implementation using /dev/fd (other systems use /proc/<pid>/fd)
X * Unfortunately, on FreeBSD, fdescfs(5) doesn't include duplicated file
X * descriptors and sockets.
X */
Xstatic int
Xclosefrom_devfd(int lowfd)
X{
X       struct dirent *d;
X       DIR *dir;
X       int fd;
X
X       if (lowfd < 0) {
X               errno = EBADF;
X               return (-1);
X       }
X
X       /*
X        * Close lowfd so we have a spare fd to use with /dev/fd
X        */
X       close(lowfd++);
X
X       if ((dir = opendir("/dev/fd")) == NULL)
X               return (-1);
X
X       while ((d = readdir(dir)) != NULL) {
X#ifdef DEBUG
X               printf("%s\n", d->d_name);
X#endif
X               if (d->d_name[0] == '.')
X                       continue;
X               fd = atoi(d->d_name);
X               if (fd >= lowfd && fd != dirfd(dir))
X                       if (close(fd) < 0 && errno == EINTR)
X                               goto bad;
X       }
X
X       (void)closedir(dir);
X       return (0);
X
Xbad:
X       {
X               int save_errno = errno;
X               (void)closedir(dir);
X               errno = save_errno;
X               return (-1);
X       }
X}
X
Xstatic void
Xtime_closefrom(int lowfd)
X{
X       struct rusage ru, rux;
X       struct timeval tv;
X       double usecs, ssecs;
X
X       if (getrusage(RUSAGE_SELF, &ru) < 0)
X               err(1, "getrusage()");
X       if (closefrom(lowfd) < 0)
X               err(1, "closefrom()");
X       if (getrusage(RUSAGE_SELF, &rux) < 0)
X               err(1, "getrusage()");
X
X       timersub(&rux.ru_utime, &ru.ru_utime, &tv);
X       usecs = ((double)tv.tv_sec + (double)tv.tv_usec / 1000000);
X       printf("user\t%f\t", usecs);
X       timersub(&rux.ru_stime, &ru.ru_stime, &tv);
X       ssecs = ((double)tv.tv_sec + (double)tv.tv_usec / 1000000);
X       printf("sys\t%f\t", ssecs);
X       usecs += ssecs;
X       printf("total\t%f\n", usecs);
X}
X
Xstatic void
Xtry(int (*xclosefrom)(int), const char *str)
X{
X       int fd, lowfd, maxfd;
X
X       lowfd = dup(STDIN_FILENO);
X       maxfd = getdtablesize();
X       for (fd = 1; fd < maxfd; fd++)
X               if (dup(STDIN_FILENO) < 0)
X                       break;
X
X       closefrom = xclosefrom;
X       printf("Trying %s(%d) with %d open file descriptors\n", str, lowfd, fd);
X       time_closefrom(lowfd);
X
X       printf("Trying %s(%d) with %d closed file descriptors\n", str, lowfd, fd);
X       time_closefrom(lowfd);
X       printf("\n");
X}
X
Xint test(int (*)(int));
X
Xint
Xmain(int argc, char *argv[])
X{
X       if (argv[1] == NULL)
X               usage(argv[0]);
X
X       if (!strcmp(argv[1], "check")) {
X               find_module();
X               printf("testing closefrom_syscall():\t%s\n",
X                       test(&closefrom_syscall) ? "failed" : "ok");
X               printf("testing closefrom_sysctl():\t%s\n",
X                       test(&closefrom_sysctl) ? "failed" : "ok");
X               printf("testing closefrom_brute():\t%s\n",
X                       test(&closefrom_brute) ? "failed" : "ok");
X       }
X       else if (!strcmp(argv[1], "syscall")) {
X               find_module();
X               try(&closefrom_syscall, "closefrom_syscall");
X       }
X       else if (!strcmp(argv[1], "sysctl"))
X               try(&closefrom_sysctl, "closefrom_sysctl");
X       else if (!strcmp(argv[1], "devfd"))
X               try(&closefrom_devfd, "closefrom_devfd");
X       else if (!strcmp(argv[1], "brute"))
X               try(&closefrom_brute, "closefrom_brute");
X       else
X               usage(argv[0]);
X
X       return (0);
X}
X
X/*
X * NOTE:
X *   The following code was adapted from OpenSSH's
X *   openbsd-compat/regress/closefromtest.c
X */
X
X/*
X * Copyright (c) 2006 Darren Tucker
X *
X * Permission to use, copy, modify, and distribute this software for any
X * purpose with or without fee is hereby granted, provided that the above
X * copyright notice and this permission notice appear in all copies.
X *
X * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
X * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
X * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
X * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
X * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
X * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
X * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
X */
X
X#define NUM_OPENS 10
X
X#define fail(str)      \
X    do { printf("%s\n", (str));        \
X       return -1; } while(0)
X
Xint
Xtest(int (*xclosefrom)(int))
X{
X       int i, max, fds[NUM_OPENS];
X       char buf[512];
X
X       for (i = 0; i < NUM_OPENS; i++)
X               if ((fds[i] = open("/dev/null", O_RDONLY)) == -1)
X                       exit(0);        /* can't test */
X       max = i - 1;
X
X       /* should close last fd only */
X       xclosefrom(fds[max]);
X       if (close(fds[max]) != -1)
X               fail("failed to close highest fd");
X
X       /* make sure we can still use remaining descriptors */
X       for (i = 0; i < max; i++)
X               if (read(fds[i], buf, sizeof(buf)) == -1)
X                       fail("closed descriptors it should not have");
X
X       /* should close all fds */
X       xclosefrom(fds[0]);
X       for (i = 0; i < NUM_OPENS; i++)
X               if (close(fds[i]) != -1)
X                       fail("failed to close from lowest fd");
X
X       return 0;
X}
SHAR_END
echo x closefrom/test/Makefile
sed 's/^X//' > closefrom/test/Makefile << 'SHAR_END'
XPROG   = closefrom
XNO_MAN =
X
XCFLAGS = -Wall -O2
X
Xcheck: ${PROG}
X       @./${PROG} check
X
Xtest:  ${PROG}
X       @./${PROG} syscall
X       @./${PROG} sysctl
X       @./${PROG} brute
X
X.include <bsd.prog.mk>
SHAR_END
echo x closefrom/module/Makefile
mkdir -p closefrom/module > /dev/null 2>&1
sed 's/^X//' > closefrom/module/Makefile << 'SHAR_END'
XKMOD   = syscall
XSRCS   = syscall.c vnode_if.h
X
XCFLAGS += -Wall
X
Xreload:
X       @${MAKE} unload
X       @${MAKE} load
X
X.include <bsd.kmod.mk>
SHAR_END
echo x closefrom/module/syscall.c
sed 's/^X//' > closefrom/module/syscall.c << 'SHAR_END'
X/*
X * Copyright (c) 2007 by Ighighi
X * All rights reserved.
X *
X * Redistribution and use in source and binary forms, with or without
X * modification, are permitted provided that the following conditions
X * are met:
X *
X * 1. Redistributions of source code must retain the above copyright
X *    notice, this list of conditions and the following disclaimer.
X * 2. Redistributions in binary form must reproduce the above copyright
X *    notice, this list of conditions and the following disclaimer in the
X *    documentation and/or other materials provided with the distribution.
X *
X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
X * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
X * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
X * THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
X * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
X * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
X * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
X * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
X * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
X * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
X */
X
X#include <sys/param.h>
X#include <sys/file.h>
X#include <sys/filedesc.h>
X#include <sys/kernel.h>
X#include <sys/proc.h>
X#include <sys/syscallsubr.h>
X#include <sys/sysent.h>
X#include <sys/systm.h>
X#include <sys/vnode.h>
X#include <sys/module.h>
X
X/*
X * Newer code in FreeBSD > 6.2 uses shared/exclusive locks
X */
X#ifndef FILEDESC_SLOCK
X#define FILEDESC_SLOCK         FILEDESC_LOCK_FAST
X#define FILEDESC_SUNLOCK       FILEDESC_UNLOCK_FAST
X#endif
X
X/*
X * kern_closefrom()
X */
Xstatic int
Xkern_closefrom(struct thread *td, int lowfd)
X{
X       struct filedesc *fdp;
X       int fd;
X
X       /*
X        * Note: NetBSD uses EBADF and DragonFly uses (undocumented) EINVAL
X        */
X       if (lowfd < 0)
X               return (EBADF);
X
X       fdp = td->td_proc->p_fd;
X
X       FILEDESC_SLOCK(fdp);
X       while ((fd = fdp->fd_lastfile) >= lowfd) {
X               FILEDESC_SUNLOCK(fdp);
X               if (kern_close(td, fd) == EINTR)
X                       return (EINTR);
X               FILEDESC_SLOCK(fdp);
X       }
X       FILEDESC_SUNLOCK(fdp);
X
X       return (0);
X}
X
X/* closefrom() arguments */
Xstruct closefrom_args {
X       int fd;
X};
X
Xstatic int
Xclosefrom(struct thread *td, void *args)
X{
X       struct closefrom_args *uap = (struct closefrom_args *)args;
X
X       return (kern_closefrom(td, uap->fd));
X}
X
X/* closefrom() sysent[] */
Xstatic struct sysent closefrom_sysent = {
X       1,              /* number of arguments */
X       closefrom       /* implementing function */
X};
X
X/*
X * LKM stuff
X */
X
X/* offset in sysent[] where the syscall will be allocated */
Xstatic int offset = NO_SYSCALL;
X
Xstatic int
Xload(struct module *module, int cmd, void *arg)
X{
X       int error = 0;
X
X       switch (cmd) {
X       case MOD_LOAD:
X               uprintf("closefrom loaded at offset %d\n", offset);
X               break;
X
X       case MOD_UNLOAD:
X               uprintf("closefrom unloaded from offset %d\n", offset);
X               break;
X
X       default:
X               error = EOPNOTSUPP;
X               break;
X       }
X
X       return (error);
X}
X
XSYSCALL_MODULE(closefrom, &offset, &closefrom_sysent, load, NULL);
SHAR_END
exit