On Wed 17-05-17 10:11:39, Vlastimil Babka wrote:
> When updating task's mems_allowed and rebinding its mempolicy due to cpuset's
> mems being changed, we currently only take the seqlock for writing when either
> the task has a mempolicy, or the new mems has no intersection with the old
> mems. This should be enough to prevent a parallel allocation seeing no
> available nodes, but the optimization is IMHO unnecessary (cpuset updates
> should not be frequent), and we still potentially risk issues if the
> intersection of new and old nodes has limited amount of free/reclaimable
> memory. Let's just use the seqlock for all tasks.

Agreed

> Signed-off-by: Vlastimil Babka <vba...@suse.cz>

Acked-by: Michal Hocko <mho...@suse.com>

> ---
>  kernel/cgroup/cpuset.c | 29 ++++++++---------------------
>  1 file changed, 8 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index dfd5b420452d..26a1c360a481 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1038,38 +1038,25 @@ static void cpuset_post_attach(void)
>   * @tsk: the task to change
>   * @newmems: new nodes that the task will be set
>   *
> - * In order to avoid seeing no nodes if the old and new nodes are disjoint,
> - * we structure updates as setting all new allowed nodes, then clearing newly
> - * disallowed ones.
> + * We use the mems_allowed_seq seqlock to safely update both 
> tsk->mems_allowed
> + * and rebind an eventual tasks' mempolicy. If the task is allocating in
> + * parallel, it might temporarily see an empty intersection, which results in
> + * a seqlock check and retry before OOM or allocation failure.
>   */
>  static void cpuset_change_task_nodemask(struct task_struct *tsk,
>                                       nodemask_t *newmems)
>  {
> -     bool need_loop;
> -
>       task_lock(tsk);
> -     /*
> -      * Determine if a loop is necessary if another thread is doing
> -      * read_mems_allowed_begin().  If at least one node remains unchanged 
> and
> -      * tsk does not have a mempolicy, then an empty nodemask will not be
> -      * possible when mems_allowed is larger than a word.
> -      */
> -     need_loop = task_has_mempolicy(tsk) ||
> -                     !nodes_intersects(*newmems, tsk->mems_allowed);
>  
> -     if (need_loop) {
> -             local_irq_disable();
> -             write_seqcount_begin(&tsk->mems_allowed_seq);
> -     }
> +     local_irq_disable();
> +     write_seqcount_begin(&tsk->mems_allowed_seq);
>  
>       nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
>       mpol_rebind_task(tsk, newmems);
>       tsk->mems_allowed = *newmems;
>  
> -     if (need_loop) {
> -             write_seqcount_end(&tsk->mems_allowed_seq);
> -             local_irq_enable();
> -     }
> +     write_seqcount_end(&tsk->mems_allowed_seq);
> +     local_irq_enable();
>  
>       task_unlock(tsk);
>  }
> -- 
> 2.12.2

-- 
Michal Hocko
SUSE Labs

Reply via email to