Hi,

On 27/09/2019 16:39, Masayoshi Mizuma wrote:
From: Masayoshi Mizuma <m.miz...@jp.fujitsu.com>

A system with the SVE feature crashed because the memory
pointed to by task->thread.sve_state was corrupted.

That is because sve_state is freed while forking the
child process. The child process has the same sve_state pointer
as the parent because the child's task_struct is copied
from the parent's. If copy_process() fails somewhere,
for example in copy_creds(), then the sve_state is freed
even though the parent is still alive.
The flow is as follows.

copy_process
         p = dup_task_struct
             => arch_dup_task_struct
                 *dst = *src;  // copy the entire region.
:
         retval = copy_creds
         if (retval < 0)
                 goto bad_fork_free;
:
bad_fork_free:
...
         delayed_free_task(p);
           => free_task
              => arch_release_task_struct
                 => fpsimd_release_task
                    => __sve_free
                       => kfree(task->thread.sve_state);
                          // free the parent's sve_state

Move setting the child's sve_state to NULL and clearing the
TIF_SVE flag to arch_dup_task_struct() so that the child
doesn't free the parent's sve_state.

Cc: sta...@vger.kernel.org
Fixes: bc0ee4760364 ("arm64/sve: Core task context handling")

Looking at the log, it looks like THREAD_INFO_IN_TASK was selected before bc0ee4760364, so it should be fine to backport to all Linux trees that contain this commit.

Signed-off-by: Masayoshi Mizuma <m.miz...@jp.fujitsu.com>
Reported-by: Hidetoshi Seto <seto.hideto...@jp.fujitsu.com>
Suggested-by: Dave Martin <dave.mar...@arm.com>

I have tested the patch and can confirm that the double-free disappears once the patch is applied:

Tested-by: Julien Grall <julien.gr...@arm.com>

See below for a few comments.

---
  arch/arm64/kernel/process.c | 21 ++++-----------------
  1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index f674f28df..6937f5935 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -323,22 +323,16 @@ void arch_release_task_struct(struct task_struct *tsk)
        fpsimd_release_task(tsk);
  }
-/*
- * src and dst may temporarily have aliased sve_state after task_struct
- * is copied.  We cannot fix this properly here, because src may have
- * live SVE state and dst's thread_info may not exist yet, so tweaking
- * either src's or dst's TIF_SVE is not safe.
- *
- * The unaliasing is done in copy_thread() instead.  This works because
- * dst is not schedulable or traceable until both of these functions
- * have been called.
- */

It would be good to explain in the commit message why tweaking "dst" in arch_dup_task_struct() is fine.

From my understanding, Arm64 used to have thread_info on the stack. So it would not be possible to clear TIF_SVE until the stack is initialized.

Now that the thread_info is part of the task, it should be valid to modify the flag from arch_dup_task_struct().

Note that technically, TIF_SVE does not need to be cleared from arch_dup_task_struct(). It could also be done from copy_thread(). But it is easier to keep both changes together.
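
To illustrate why unaliasing right after the copy is enough, here is a rough userspace sketch. It is not kernel code: toy_task, TOY_TIF_SVE, toy_dup_task() and toy_release_task() are hypothetical names that only model task_struct, TIF_SVE, arch_dup_task_struct() and the release path, and free() stands in for kfree().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for task_struct; not the real kernel layout. */
struct toy_task {
	unsigned long flags;	/* models thread_info flags embedded in the task */
	void *sve_state;	/* models thread.sve_state */
};

#define TOY_TIF_SVE (1UL << 0)

/* Models arch_dup_task_struct() with the fix applied. */
static int toy_dup_task(struct toy_task *dst, const struct toy_task *src)
{
	assert(dst && src);

	*dst = *src;			/* dst now aliases src's sve_state */

	/* Unalias immediately, so an early error path cannot free src's buffer. */
	dst->sve_state = NULL;
	dst->flags &= ~TOY_TIF_SVE;	/* safe here: the flags live in the task itself */

	return 0;
}

/* Models the release path that ends in kfree(task->thread.sve_state). */
static void toy_release_task(struct toy_task *tsk)
{
	free(tsk->sve_state);		/* free(NULL) is a no-op */
	tsk->sve_state = NULL;
	tsk->flags &= ~TOY_TIF_SVE;
}
```

With this ordering, calling toy_release_task() on a freshly duplicated child (the equivalent of the bad_fork_free path) leaves the parent's buffer untouched. Without the two unaliasing lines, the same call would free the parent's buffer.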

  int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
  {
        if (current->mm)
                fpsimd_preserve_current_state();
        *dst = *src;
+       BUILD_BUG_ON(!IS_ENABLED(CONFIG_THREAD_INFO_IN_TASK));

You may want to add a comment on top explaining why TIF_SVE is cleared here.

+       dst->thread.sve_state = NULL;
+       clear_tsk_thread_flag(dst, TIF_SVE);
+
        return 0;
  }
@@ -351,13 +345,6 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
        memset(&p->thread.cpu_context, 0, sizeof(struct cpu_context));
-       /*
-        * Unalias p->thread.sve_state (if any) from the parent task
-        * and disable discard SVE state for p:
-        */
-       clear_tsk_thread_flag(p, TIF_SVE);
-       p->thread.sve_state = NULL;
-
        /*
         * In case p was allocated the same task_struct pointer as some
         * other recently-exited task, make sure p is disassociated from


Cheers,

--
Julien Grall
