On Wed, Nov 04 2015, Eric Dumazet <eric.duma...@gmail.com> wrote:

> On Tue, 2015-11-03 at 10:41 +0100, Rasmus Villemoes wrote:
>
>> @@ -667,7 +667,7 @@ void do_close_on_exec(struct files_struct *files)
>>              fdt = files_fdtable(files);
>>              if (fd >= fdt->max_fds)
>>                      break;
>> -            set = fdt->close_on_exec[i];
>> +            set = fdt->close_on_exec[i] & fdt->open_fds[i];
>>              if (!set)
>>                      continue;
>>              fdt->close_on_exec[i] = 0;
>
> If you don't bother, why leaving this final fdt->close_on_exec[i] = 0 ?

Thanks, it should go, along with the mentioned memsets. Updated patch below.

Reading dup_fd() I'm even more convinced that we're not relying on any
particular value for close_on_exec bits for unused fds. After

        /*
         * The fd may be claimed in the fd bitmap but not yet
         * instantiated in the files array if a sibling thread
         * is partway through open().  So make sure that this
         * fd is available to the new process.
         */

we only __clear_open_fd(), so the close_on_exec bit may be left set in
the new process.
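
To illustrate why a stale bit like that should be harmless once
do_close_on_exec() masks with ->open_fds, here is a small userspace
model. It is only a sketch, not kernel code: struct toy_fdtable, the
toy_*_bit() helpers, toy_count_close_on_exec() and the fixed NWORDS size
are all made up for this example, and the real code walks the bitmaps
under files->file_lock and closes the files rather than counting them.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical userspace model; none of these names exist in fs/file.c. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS 2

struct toy_fdtable {
	unsigned long open_fds[NWORDS];
	unsigned long close_on_exec[NWORDS];
};

static void toy_set_bit(unsigned int nr, unsigned long *map)
{
	assert(nr < NWORDS * BITS_PER_LONG);
	map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

static void toy_clear_bit(unsigned int nr, unsigned long *map)
{
	assert(nr < NWORDS * BITS_PER_LONG);
	map[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
}

static int toy_test_bit(unsigned int nr, const unsigned long *map)
{
	assert(nr < NWORDS * BITS_PER_LONG);
	return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
}

/*
 * Count the fds an exec would close.  As in the patched
 * do_close_on_exec(), a close_on_exec bit only counts if the
 * corresponding open_fds bit is set too, so stale bits are ignored.
 */
static unsigned int toy_count_close_on_exec(const struct toy_fdtable *fdt)
{
	unsigned int i, count = 0;

	for (i = 0; i < NWORDS; i++) {
		unsigned long set = fdt->close_on_exec[i] & fdt->open_fds[i];

		for ( ; set; set >>= 1)
			count += set & 1;
	}
	return count;
}
```

With fd 3 marked both open and close-on-exec, the count is 1. Clearing
only the open_fds bit for fd 3 (what close() does after this patch)
leaves the close_on_exec bit set, but the count drops to 0 because of
the mask.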

From: Rasmus Villemoes <li...@rasmusvillemoes.dk>
Date: Tue, 3 Nov 2015 09:43:53 +0100
Subject: [PATCH] vfs: don't bother clearing close_on_exec bit for unused fds

In fc90888d07b8 (vfs: conditionally clear close-on-exec flag) a
conditional was added to __clear_close_on_exec to avoid dirtying a
cache line in the common case where the bit is already clear. However,
AFAICT, we don't rely on the close_on_exec bit being clear for unused
fds, except as an optimization in do_close_on_exec(); if I haven't
missed anything, __{set,clear}_close_on_exec is always called when a
new fd is allocated. At the expense of also reading through ->open_fds
in do_close_on_exec(), we can avoid accessing the close_on_exec bitmap
altogether in close(), which I think is a reasonable trade-off.

The conditional added in the commit above still makes sense to avoid
the dirtying on the allocation paths, but I also think it might make
sense in __set_close_on_exec: I suppose any given app handling a
non-trivial number of fds uses O_CLOEXEC for either almost none or
almost all of them, so after a while one would reach a sort of
steady-state where bits in ->close_on_exec are almost never flipped.

Signed-off-by: Rasmus Villemoes <li...@rasmusvillemoes.dk>
---
 fs/file.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index c6986dce0334..1bb74923395c 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -79,7 +79,6 @@ static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
        memcpy(nfdt->open_fds, ofdt->open_fds, cpy);
        memset((char *)(nfdt->open_fds) + cpy, 0, set);
        memcpy(nfdt->close_on_exec, ofdt->close_on_exec, cpy);
-       memset((char *)(nfdt->close_on_exec) + cpy, 0, set);
 
        cpy = BITBIT_SIZE(ofdt->max_fds);
        set = BITBIT_SIZE(nfdt->max_fds) - cpy;
@@ -231,7 +230,8 @@ repeat:
 
 static inline void __set_close_on_exec(int fd, struct fdtable *fdt)
 {
-       __set_bit(fd, fdt->close_on_exec);
+       if (!test_bit(fd, fdt->close_on_exec))
+               __set_bit(fd, fdt->close_on_exec);
 }
 
 static inline void __clear_close_on_exec(int fd, struct fdtable *fdt)
@@ -369,7 +369,6 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
                int start = open_files / BITS_PER_LONG;
 
                memset(&new_fdt->open_fds[start], 0, left);
-               memset(&new_fdt->close_on_exec[start], 0, left);
        }
 
        rcu_assign_pointer(newf->fdt, new_fdt);
@@ -644,7 +643,6 @@ int __close_fd(struct files_struct *files, unsigned fd)
        if (!file)
                goto out_unlock;
        rcu_assign_pointer(fdt->fd[fd], NULL);
-       __clear_close_on_exec(fd, fdt);
        __put_unused_fd(files, fd);
        spin_unlock(&files->file_lock);
        return filp_close(file, files);
@@ -667,10 +665,9 @@ void do_close_on_exec(struct files_struct *files)
                fdt = files_fdtable(files);
                if (fd >= fdt->max_fds)
                        break;
-               set = fdt->close_on_exec[i];
+               set = fdt->close_on_exec[i] & fdt->open_fds[i];
                if (!set)
                        continue;
-               fdt->close_on_exec[i] = 0;
                for ( ; set ; fd++, set >>= 1) {
                        struct file *file;
                        if (!(set & 1))
-- 
2.6.1
