On Fri, 14 Jun 2013 21:09:47 +0200 Oleg Nesterov <[email protected]> wrote:

> fput() assumes that it can't be called after exit_task_work() but
> this is not true, for example free_ipc_ns()->shm_destroy() can do
> this. In this case fput() silently leaks the file.
> 
> Change it to fallback to delayed_fput_work if task_work_add() fails.
> The patch looks complicated but it is not, it changes the code from
> 
>       if (PF_KTHREAD) {
>               schedule_work(...);
>               return;
>       }
>       task_work_add(...)
> 
> to
>       if (!PF_KTHREAD) {
>               if (!task_work_add(...))
>                       return;
>               /* fallback */
>       }
>       schedule_work(...);
> 
> As for shm_destroy() in particular, we could make another fix but I
> think this change makes sense anyway. There could be another similar
> user, it is not safe to assume that task_work_add() can't fail.
> 
> ...
>
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -306,17 +306,18 @@ void fput(struct file *file)
>  {
>       if (atomic_long_dec_and_test(&file->f_count)) {
>               struct task_struct *task = current;
> +             unsigned long flags;
> +
>               file_sb_list_del(file);
> -             if (unlikely(in_interrupt() || task->flags & PF_KTHREAD)) {
> -                     unsigned long flags;
> -                     spin_lock_irqsave(&delayed_fput_lock, flags);
> -                     list_add(&file->f_u.fu_list, &delayed_fput_list);
> -                     schedule_work(&delayed_fput_work);
> -                     spin_unlock_irqrestore(&delayed_fput_lock, flags);
> -                     return;
> +             if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
> +                     init_task_work(&file->f_u.fu_rcuhead, ____fput);
> +                     if (!task_work_add(task, &file->f_u.fu_rcuhead, true))
> +                             return;

A comment here would be useful, explaining the circumstances under
which we fall through to the delayed fput.  This is particularly needed
because kernel/task_work.c is such undocumented crap.

This?

--- 
a/fs/file_table.c~fput-task_work_add-can-fail-if-the-caller-has-passed-exit_task_work-fix
+++ a/fs/file_table.c
@@ -313,6 +313,12 @@ void fput(struct file *file)
                        init_task_work(&file->f_u.fu_rcuhead, ____fput);
                        if (!task_work_add(task, &file->f_u.fu_rcuhead, true))
                                return;
+                       /*
+                        * After this task has run exit_task_work(),
+                        * task_work_add() will fail.  free_ipc_ns()->
+                        * shm_destroy() can do this.  Fall through to delayed
+                        * fput to avoid leaking *file.
+                        */
                }
                spin_lock_irqsave(&delayed_fput_lock, flags);
                list_add(&file->f_u.fu_list, &delayed_fput_list);


>               }
> -             init_task_work(&file->f_u.fu_rcuhead, ____fput);
> -             task_work_add(task, &file->f_u.fu_rcuhead, true);
> +             spin_lock_irqsave(&delayed_fput_lock, flags);
> +             list_add(&file->f_u.fu_list, &delayed_fput_list);
> +             schedule_work(&delayed_fput_work);
> +             spin_unlock_irqrestore(&delayed_fput_lock, flags);

OT: I don't think that schedule_work() needs to be inside the locked
region.  Scalability improvements beckon!

>       }
>  }

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to