Hi Reinette, Fenghua,

Subject nit: I think 'use IPI instead of task_work() to update PQR_ASSOC_MSR' 
conveys the
guts of this change more quickly!

On 03/12/2020 23:25, Reinette Chatre wrote:
> From: Fenghua Yu <fenghua...@intel.com>
> 
> Currently when moving a task to a resource group the PQR_ASSOC MSR
> is updated with the new closid and rmid in an added task callback.
> If the task is running the work is run as soon as possible. If the
> task is not running the work is executed later

> in the kernel exit path when the kernel returns to the task again.

kernel exit makes me thing of user-space... is it enough to just say:
"by __switch_to() when task is next run"?


> Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
> is running is the right thing to do. Queueing work for a task that is
> not running is unnecessary (the PQR_ASSOC MSR is already updated when the
> task is scheduled in) and causing system resource waste with the way in
> which it is implemented: Work to update the PQR_ASSOC register is queued
> every time the user writes a task id to the "tasks" file, even if the task
> already belongs to the resource group. This could result in multiple pending
> work items associated with a single task even if they are all identical and
> even though only a single update with most recent values is needed.
> Specifically, even if a task is moved between different resource groups
> while it is sleeping then it is only the last move that is relevant but
> yet a work item is queued during each move.
> This unnecessary queueing of work items could result in significant system
> resource waste, especially on tasks sleeping for a long time. For example,
> as demonstrated by Shakeel Butt in [1] writing the same task id to the
> "tasks" file can quickly consume significant memory. The same problem
> (wasted system resources) occurs when moving a task between different
> resource groups.
> 
> As pointed out by Valentin Schneider in [2] there is an additional issue with
> the way in which the queueing of work is done in that the task_struct update
> is currently done after the work is queued, resulting in a race with the
> register update possibly done before the data needed by the update is 
> available.
> 
> To solve these issues, the PQR_ASSOC MSR is updated in a synchronous way
> right after the new closid and rmid are ready during the task movement,
> only if the task is running. If a moved task is not running nothing is
> done since the PQR_ASSOC MSR will be updated next time the task is scheduled.
> This is the same way used to update the register when tasks are moved as
> part of resource group removal.

(as t->on_cpu is already used...)

Reviewed-by: James Morse <james.mo...@arm.com>


> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 68db7d2dec8f..9d62f1fadcc3 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -525,6 +525,16 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)


> +static void update_task_closid_rmid(struct task_struct *t)
>  {
> +     int cpu;
>  
> +     if (task_on_cpu(t, &cpu))
> +             smp_call_function_single(cpu, _update_task_closid_rmid, t, 1);


I think:
|       if (task_curr(t))
|               smp_call_function_single(task_cpu(t), _update_task_closid_rmid, 
t, 1);

here would make for an easier backport as it doesn't depend on the previous 
patch.


> +}

[...]

>  static int __rdtgroup_move_task(struct task_struct *tsk,
>                               struct rdtgroup *rdtgrp)
>  {

> +     if (rdtgrp->type == RDTCTRL_GROUP) {
> +             tsk->closid = rdtgrp->closid;
> +             tsk->rmid = rdtgrp->mon.rmid;
> +     } else if (rdtgrp->type == RDTMON_GROUP) {

[...]

> +     } else {

> +             rdt_last_cmd_puts("Invalid resource group type\n");
> +             return -EINVAL;

Wouldn't this be a kernel bug?
I'd have thought there would be a WARN_ON_ONCE() here to make it clear this 
isn't
user-space's fault!


>       }
> -     return ret;
> +
> +     /*
> +      * By now, the task's closid and rmid are set. If the task is current
> +      * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
> +      * group go into effect. If the task is not current, the MSR will be
> +      * updated when the task is scheduled in.
> +      */
> +     update_task_closid_rmid(tsk);
> +
> +     return 0;
>  }


Thanks,

James

Reply via email to