Re: Usecases for the per-task latency-nice attribute

2019-09-27 Thread Pavel Machek
Hi!

> > I don't want to start a bikeshedding session here, but I agree with Parth
> > on the interpretation of the values.
> > 
> > I've always read niceness values as
> > -20 (least nice to the system / other processes)
> > +19 (most nice to the system / other processes)
> > 
> > So following this trend I'd see for latency-nice:
> 
> 
> So jotting down separately: in case we keep the "latency-nice"
> terminology, we might need to select one of these 2 interpretations:
> 
> 1).
> > -20 (least nice to latency, i.e. sacrifice latency for throughput)
> > +19 (most nice to latency, i.e. sacrifice throughput for latency)
> > 
> 
> 2).
> -20 (least nice to other tasks in terms of sacrificing latency, i.e.
> latency-sensitive)
> +19 (most nice to other tasks in terms of sacrificing latency, i.e.
> latency-forgoing)

For the record, interpretation 2 makes sense to me.

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Usecases for the per-task latency-nice attribute

2019-09-20 Thread Parth Shah



On 9/19/19 8:13 PM, Qais Yousef wrote:
> On 09/18/19 18:11, Parth Shah wrote:
>> Hello everyone,
>>
>> As per the discussion at LPC2019, a new per-task property like latency-nice
>> can be useful in certain scenarios. The scheduler can take better decisions
>> by knowing the latency requirements of a task from the end-user itself.
>>
>> There has already been an effort from Subhra to introduce task
>> latency-nice [1] values, and several possibilities have been seen where
>> this type of interface can be used.
>>
>> From the best of my understanding of the discussion on the mail thread and
>> at LPC2019, it seems that there are two dilemmas:
>>
>> 1. Name: What should be the name of such an attr for all the possible usecases?
>> =
>> Latency-nice is the proposed name as of now, where a lower value indicates
>> that the task doesn't care much about latency and we can spend some more
>> time in the kernel to decide a better placement for the task (to save
>> time, energy, etc.)
>> But there seems to be a bit of confusion about whether we want biasing as
>> well (latency-biased) or something similar, in which case "latency-nice"
>> may confuse the end-user.
>>
>> 2. Value: What should be the range of possible values supported by this new
>> attr?
>> ==
>> The possible values of such a task attribute still need community attention.
>> Do we need a range of values, or are binary/ternary values sufficient?
>> Also, signed or unsigned, and what length of variable (u64, s32, etc.)?
> 
> IMO the main question is who is the intended user of this new knob/API?
> 
> If it's intended for system admins to optimize certain workloads on a system
> then I like the latency-nice range.
> 
> If we want to support application writers in defining the latency
> requirements of their tasks, then I think latency-nice would be very
> confusing to use. Especially when one considers that they lack
> pre-knowledge of the system they will run on, and of what else they are
> sharing the resources with.
> 

Yes, valid point.
But from my view, this will most certainly be for system admins, who can
optimize certain workloads from systemd, tuned, or similar OS daemons.

>>
>>
>>
>> This mail is to initiate the discussion regarding the possible usecases of
>> such a per-task attribute and to come up with a specific name and value for
>> the same.
>>
>> Hopefully, interested parties will lay out the usecases which this new
>> attr can potentially help solve or optimize.
>>
>>
>> Well, to start with, here is my usecase.
>>
>> ---
>> **Usecases**
>> ---
>>
>> $> TurboSched
>> =
>> TurboSched [2] tries to minimize the number of active cores in a socket by
>> packing an unimportant and low-utilization (named jitter) task on an
>> already active core, and thus refrains from waking up a new core if
>> possible. This requires tagging of tasks from userspace, hinting which
>> tasks are unimportant, so that waking up a new core to minimize latency
>> is unnecessary for such tasks.
>> As per the discussion on the posted RFC, it would be appropriate to use the
>> task latency property, where a task with the highest latency-nice value can
>> be packed.
>> But for this specific use-case, having just a binary value to know which
>> task is latency-sensitive and which is not would be sufficient, but having
>> a range is also a good way to go, where tasks above some threshold can be
>> packed.
> 
> 
> $> EAS
> 
> The new knob can help the EAS path switch to spreading behavior when
> latency-nice is set, instead of packing tasks on the most energy-efficient
> CPU; i.e., pick the most energy-efficient idle CPU.
> 

+1

Thanks,
Parth

> --
> Qais Yousef
> 



Re: Usecases for the per-task latency-nice attribute

2019-09-19 Thread Valentin Schneider
On 19/09/2019 17:41, Parth Shah wrote:
> So jotting down separately: in case we keep the "latency-nice"
> terminology, we might need to select one of these 2 interpretations:
> 
> 1).
>> -20 (least nice to latency, i.e. sacrifice latency for throughput)
>> +19 (most nice to latency, i.e. sacrifice throughput for latency)
>>
> 
> 2).
> -20 (least nice to other tasks in terms of sacrificing latency, i.e.
> latency-sensitive)
> +19 (most nice to other tasks in terms of sacrificing latency, i.e.
> latency-forgoing)
> 
> 

I'd vote for 1 (duh) but won't fight for it, if it comes to it I'd be
happy with a random draw :D

>> Aren't we missing the point about tweaking the sched domain scans (which
>> AFAIR was the original point for latency-nice)?
>>
>> Something like: the default value is the current behaviour, and
>> - Being less latency-sensitive means increasing the scans (e.g. trending
>>   towards only going through the slow wakeup-path at the extreme setting)
>> - Being more latency-sensitive means reducing the scans (e.g. trending
>>   towards a fraction of the domain scanned in the fast-path at the extreme
>>   setting).
>>
> 
> Correct. But I was pondering the values required for this case.
> Is having just a range of [-20,19] sufficient, even for larger systems?
> 

As I said in the original thread by Subhra, this range should be plenty
IMO. You get ~5% deltas in each direction after all.

>>>
>>
>> $> Load balance tuning
>> ==
>>
>> Already mentioned these in [4]:
>>
>> - Increase (reduce) nr_balance_failed threshold when trying to active
>>   balance a latency-sensitive (non-latency-sensitive) task.
>>
>> - Increase (decrease) sched_migration_cost factor in task_hot() for
>>   latency-sensitive (non-latency-sensitive) tasks.
>>
> 
> Thanks for listing down your ideas.
> 
> These are pretty useful optimizations in general. But one may wonder: if we
> reduce the search scans for an idle core in the wakeup path and by chance
> select a busy core, then one would expect the load balancer to move the
> task to an idle core.
> 
> If I got it correctly, in such cases the sched_migration_cost should be
> carefully increased, right?
> 

IIUC you're describing a scenario where we fail to find an idle core due to
a wakee being latency-sensitive (thus shorter scan), and place it on a rq
that already has runnable tasks (despite idle rqs being available).

In this case yes, we could potentially have a balance attempt trying to pull
from that rq. We'd try to pull the non-running tasks first, and if a
latency-sensitive task happens to be one of them we should be careful with
what we do - a migration could lead to unwanted latency.

It might be a bit more clear when you're balancing between busy cores - 
overall I think you should try to migrate the non-latency-sensitive
tasks first. Playing with task_hot() could be one of the ways to do that, but
it's just a suggestion at this time.

> 
>>>> References:
>>>> ===
>>>> [1]. https://lkml.org/lkml/2019/8/30/829
>>>> [2]. https://lkml.org/lkml/2019/7/25/296
>>>
>>>   [3]. Message-ID: <20190905114709.gm2...@hirez.programming.kicks-ass.net>
>>>
>>> https://lore.kernel.org/lkml/20190905114709.gm2...@hirez.programming.kicks-ass.net/
>>>
>>
>> [4]: https://lkml.kernel.org/r/3d3306e4-3a78-5322-df69-7665cf01c...@arm.com
>>
>>>
>>> Best,
>>> Patrick
>>>
> 
> Thanks,
> Parth
> 


Re: Usecases for the per-task latency-nice attribute

2019-09-19 Thread Parth Shah



On 9/18/19 9:12 PM, Valentin Schneider wrote:
> On 18/09/2019 15:18, Patrick Bellasi wrote:
>>> 1. Name: What should be the name of such an attr for all the possible
>>> usecases?
>>> =
>>> Latency-nice is the proposed name as of now, where a lower value indicates
>>> that the task doesn't care much about latency
>>
>> If by "lower value" you mean -19 (in the proposed [-20,19] range), then
>> I think the meaning should be the opposite.
>>
>> A -19 latency-nice task is a task which is not willing to give up
>> latency. For those tasks, for example, we want to reduce the wake-up
>> latency as much as possible.
>>
>> This will keep its semantics aligned with that of process niceness values,
>> which range from -20 (most favourable to the process) to 19 (least
>> favourable to the process).
>>
> 
> I don't want to start a bikeshedding session here, but I agree with Parth
> on the interpretation of the values.
> 
> I've always read niceness values as
> -20 (least nice to the system / other processes)
> +19 (most nice to the system / other processes)
> 
> So following this trend I'd see for latency-nice:


So jotting down separately: in case we keep the "latency-nice"
terminology, we might need to select one of these 2 interpretations:

1).
> -20 (least nice to latency, i.e. sacrifice latency for throughput)
> +19 (most nice to latency, i.e. sacrifice throughput for latency)
> 

2).
-20 (least nice to other tasks in terms of sacrificing latency, i.e.
latency-sensitive)
+19 (most nice to other tasks in terms of sacrificing latency, i.e.
latency-forgoing)


> However...
> 
>>> But there seems to be a bit of confusion on whether we want biasing as well
>>> (latency-biased) or something similar, in which case "latency-nice" may
>>> confuse the end-user.
>>
> AFAIU PeterZ's point was "just" that if we call it "-nice" it has to
> behave like "nice values" to avoid confusing users. But, if we come up
> with a different naming maybe we will have more freedom.
>>
> 
> ...just getting rid of the "-nice" would leave us free not to have to
> interpret the values as "nice to / not nice to" :)
> 
>> Personally, I like both "latency-nice" or "latency-tolerant", where:
>>
>>  - latency-nice:
>>should have a better understanding based on pre-existing concepts
>>
>>  - latency-tolerant:
>>decouples a bit its meaning from the niceness thus giving maybe a bit
>>more freedom in its complete definition and perhaps avoid any
>>possible interpretation confusion like the one I commented above.
>>
>> Fun fact: there was also the latency-nasty proposal from PaulMK :)
>>
> 
> [...]
> 
>>
>> $> Wakeup path tunings
>> ==
>>
>> Some additional possible use-cases were already discussed in [3]:
>>
>>  - dynamically tune the policy of a task among SCHED_{OTHER,BATCH,IDLE}
>>depending on crossing certain pre-configured threshold of latency
>>niceness.
>>   
>>  - dynamically bias the vruntime updates we do in place_entity()
>>depending on the actual latency niceness of a task.
>>   
>>PeterZ thinks this is dangerous but that we can "(carefully) fumble a
>>bit there."
>>   
>>  - bias the decisions we take in check_preempt_tick() still depending
>>on a relative comparison of the current and wakeup task latency
>>niceness values.
> 
> Aren't we missing the point about tweaking the sched domain scans (which
> AFAIR was the original point for latency-nice)?
> 
> Something like: the default value is the current behaviour, and
> - Being less latency-sensitive means increasing the scans (e.g. trending
>   towards only going through the slow wakeup-path at the extreme setting)
> - Being more latency-sensitive means reducing the scans (e.g. trending
>   towards a fraction of the domain scanned in the fast-path at the extreme
>   setting).
> 

Correct. But I was pondering the values required for this case.
Is having just a range of [-20,19] sufficient, even for larger systems?

>>
> 
> $> Load balance tuning
> ==
> 
> Already mentioned these in [4]:
> 
> - Increase (reduce) nr_balance_failed threshold when trying to active
>   balance a latency-sensitive (non-latency-sensitive) task.
> 
> - Increase (decrease) sched_migration_cost factor in task_hot() for
>   latency-sensitive (non-latency-sensitive) tasks.
> 

Thanks for listing down your ideas.

These are pretty useful optimizations in general. But one may wonder: if we
reduce the search scans for an idle core in the wakeup path and by chance
select a busy core, then one would expect the load balancer to move the
task to an idle core.

If I got it correctly, in such cases the sched_migration_cost should be
carefully increased, right?


>>> References:
>>> ===
>>> [1]. https://lkml.org/lkml/2019/8/30/829
>>> [2]. https://lkml.org/lkml/2019/7/25/296
>>
>>   [3]. Message-ID: <20190905114709.gm2...@hirez.programming.kicks-ass.net>
>>
>> https://lore.kernel.org/lkml/20190905114709.gm2...@hirez.program

Re: Usecases for the per-task latency-nice attribute

2019-09-19 Thread Tim Chen
On 9/19/19 2:06 AM, David Laight wrote:
> From: Tim Chen
>> Sent: 18 September 2019 18:16
> ...
>> Some users are running machine learning batch tasks with AVX512, and have
>> observed that these tasks affect the tasks needing a fast response.  They
>> have to rely on manual CPU affinity to separate these tasks.  With an
>> appropriate latency hint on a task, the scheduler can be taught to
>> separate them.
> 
> Will (or can) the scheduler pre-empt a low priority process that is spinning
> in userspace in order to allow a high priority (or low latency) process to
> run on that cpu?
> 
> My suspicion is that the process switch can't happen until (at least) the
> next hardware interrupt - and possibly only a timer tick into the scheduler.
> 

The issue has to do with AVX512 running on the HT sibling, which pulls down
the core frequency.  So latency-sensitive tasks are not blocked, but run
concurrently on siblings, only slower.  With a latency hint, the scheduler
can try to avoid putting them on the same core.
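
A very rough sketch of the kind of check the wakeup path could grow.
task_uses_avx512() is a hypothetical helper (x86 does keep a per-task
fpu->avx512_timestamp that something like it could be built on), and
p->latency_nice is the proposed attribute, not an existing field:

/* Sketch: refuse CPUs whose SMT sibling runs an AVX512-heavy task when
 * placing a latency-sensitive wakee. Helpers marked hypothetical. */
static bool avoid_avx512_sibling(struct task_struct *p, int cpu)
{
	int sibling;

	if (p->latency_nice > -10)	/* hypothetical cutoff */
		return false;

	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
		struct task_struct *curr = cpu_rq(sibling)->curr;

		/* task_uses_avx512(): hypothetical detection helper */
		if (sibling != cpu && task_uses_avx512(curr))
			return true;	/* core freq would be pulled down */
	}
	return false;
}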

Tim


Re: Usecases for the per-task latency-nice attribute

2019-09-19 Thread Tim Chen
On 9/19/19 1:37 AM, Parth Shah wrote:
> 
>>
>> $> Separating AVX512 tasks and latency sensitive tasks on separate cores
>> -
>> Another usecase we are considering is to segregate those workloads that
>> will pull down core cpu frequency (e.g. AVX512) from workloads that are
>> latency-sensitive.
>> There are certain tasks that need to provide a fast response time (latency
>> sensitive) and they are best scheduled on a cpu that has a lighter load
>> and does not have other tasks running on the sibling cpu that could pull
>> down the cpu core frequency.
>>
>> Some users are running machine learning batch tasks with AVX512, and have
>> observed that these tasks affect the tasks needing a fast response.  They
>> have to rely on manual CPU affinity to separate these tasks.  With an
>> appropriate latency hint on a task, the scheduler can be taught to
>> separate them.
>>
> 
> Thanks for listing out your usecase.
> 
> This is interesting. If the scheduler has knowledge of AVX512 tasks, then
> with this interface the scheduler can refrain from picking cores occupied
> by AVX512 tasks for a task with "latency-nice = -19".
> 
> So I guess for this specific use-case, the value for such a per-task
> attribute should have a range (most probably [-19,20]) and the name
> "latency-nice" also suits the need.

Yes.

> 
> Do you have any specific values in mind for such an attr?

Not really.  I assume a [-19, 20] range that the user who launches the
task will set.  Probably something towards the -19 end for
latency-sensitive tasks and something towards the 20 end for AVX512
tasks.  And 0 as the default for most tasks.
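
For instance, assuming a sched_attr extension along the lines of Subhra's
RFC (the sched_latency_nice field and the SCHED_FLAG_LATENCY_NICE flag
below follow that proposal and are not a merged ABI; the struct layout
must of course match whatever the kernel ends up defining), the user who
launches the task could do roughly:

/* Sketch: tag the current task as latency-sensitive. Field names and
 * the flag value are assumptions based on the RFC, not mainline. */
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#define SCHED_FLAG_LATENCY_NICE	0x40	/* assumed flag value */

struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime, sched_deadline, sched_period;
	int32_t  sched_latency_nice;	/* proposed extension */
};

int main(void)
{
	struct sched_attr attr = {
		.size = sizeof(attr),
		.sched_flags = SCHED_FLAG_LATENCY_NICE,
		.sched_latency_nice = -19,	/* most latency-sensitive */
	};

	if (syscall(SYS_sched_setattr, getpid(), &attr, 0))
		perror("sched_setattr");
	return 0;
}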

Tim


Re: Usecases for the per-task latency-nice attribute

2019-09-19 Thread Qais Yousef
On 09/18/19 18:11, Parth Shah wrote:
> Hello everyone,
> 
> As per the discussion at LPC2019, a new per-task property like latency-nice
> can be useful in certain scenarios. The scheduler can take better decisions
> by knowing the latency requirements of a task from the end-user itself.
> 
> There has already been an effort from Subhra to introduce task
> latency-nice [1] values, and several possibilities have been seen where
> this type of interface can be used.
> 
> From the best of my understanding of the discussion on the mail thread and
> at LPC2019, it seems that there are two dilemmas:
> 
> 1. Name: What should be the name of such an attr for all the possible usecases?
> =
> Latency-nice is the proposed name as of now, where a lower value indicates
> that the task doesn't care much about latency and we can spend some more
> time in the kernel to decide a better placement for the task (to save
> time, energy, etc.)
> But there seems to be a bit of confusion about whether we want biasing as
> well (latency-biased) or something similar, in which case "latency-nice"
> may confuse the end-user.
> 
> 2. Value: What should be the range of possible values supported by this new
> attr?
> ==
> The possible values of such a task attribute still need community attention.
> Do we need a range of values, or are binary/ternary values sufficient?
> Also, signed or unsigned, and what length of variable (u64, s32, etc.)?

IMO the main question is who is the intended user of this new knob/API?

If it's intended for system admins to optimize certain workloads on a system
then I like the latency-nice range.

If we want to support application writers in defining the latency
requirements of their tasks, then I think latency-nice would be very
confusing to use. Especially when one considers that they lack
pre-knowledge of the system they will run on, and of what else they are
sharing the resources with.

> 
> 
> 
> This mail is to initiate the discussion regarding the possible usecases of
> such a per-task attribute and to come up with a specific name and value for
> the same.
> 
> Hopefully, interested parties will lay out the usecases which this new
> attr can potentially help solve or optimize.
> 
> 
> Well, to start with, here is my usecase.
> 
> ---
> **Usecases**
> ---
> 
> $> TurboSched
> =
> TurboSched [2] tries to minimize the number of active cores in a socket by
> packing an unimportant and low-utilization (named jitter) task on an
> already active core, and thus refrains from waking up a new core if
> possible. This requires tagging of tasks from userspace, hinting which
> tasks are unimportant, so that waking up a new core to minimize latency
> is unnecessary for such tasks.
> As per the discussion on the posted RFC, it would be appropriate to use the
> task latency property, where a task with the highest latency-nice value can
> be packed.
> But for this specific use-case, having just a binary value to know which
> task is latency-sensitive and which is not would be sufficient, but having
> a range is also a good way to go, where tasks above some threshold can be
> packed.


$> EAS

The new knob can help the EAS path switch to spreading behavior when
latency-nice is set, instead of packing tasks on the most energy-efficient
CPU; i.e., pick the most energy-efficient idle CPU.
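
As a rough sketch (find_energy_efficient_cpu() is the actual fair.c entry
point, but p->latency_nice and the early bail-out below are made up for
illustration):

/* Sketch: skip the energy-packing logic for latency-sensitive tasks so
 * that select_task_rq_fair() falls back to the regular idle-CPU search
 * and spreads instead. p->latency_nice is the proposed attribute. */
static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
{
	if (p->latency_nice < 0)
		return -1;	/* caller falls back to spreading */

	/* ... existing energy-aware placement below ... */
	return prev_cpu;
}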

--
Qais Yousef


RE: Usecases for the per-task latency-nice attribute

2019-09-19 Thread David Laight
From: Tim Chen
> Sent: 18 September 2019 18:16
...
> Some users are running machine learning batch tasks with AVX512, and have
> observed that these tasks affect the tasks needing a fast response.  They
> have to rely on manual CPU affinity to separate these tasks.  With an
> appropriate latency hint on a task, the scheduler can be taught to
> separate them.

Will (or can) the scheduler pre-empt a low priority process that is spinning
in userspace in order to allow a high priority (or low latency) process to
run on that cpu?

My suspicion is that the process switch can't happen until (at least) the
next hardware interrupt - and possibly only a timer tick into the scheduler.

David



Re: Usecases for the per-task latency-nice attribute

2019-09-19 Thread Parth Shah



On 9/18/19 10:46 PM, Tim Chen wrote:
> On 9/18/19 5:41 AM, Parth Shah wrote:
>> Hello everyone,
>>
>> As per the discussion at LPC2019, a new per-task property like latency-nice
>> can be useful in certain scenarios. The scheduler can take better decisions
>> by knowing the latency requirements of a task from the end-user itself.
>>
>> There has already been an effort from Subhra to introduce task
>> latency-nice [1] values, and several possibilities have been seen where
>> this type of interface can be used.
>>
>> From the best of my understanding of the discussion on the mail thread and
>> at LPC2019, it seems that there are two dilemmas:
> 
> Thanks for starting the discussion.
> 
> 
>>
>> ---
>> **Usecases**
>> ---
>>
>> $> TurboSched
>> =
>> TurboSched [2] tries to minimize the number of active cores in a socket by
>> packing an unimportant and low-utilization (named jitter) task on an
>> already active core, and thus refrains from waking up a new core if
>> possible. This requires tagging of tasks from userspace, hinting which
>> tasks are unimportant, so that waking up a new core to minimize latency
>> is unnecessary for such tasks.
>> As per the discussion on the posted RFC, it would be appropriate to use the
>> task latency property, where a task with the highest latency-nice value can
>> be packed.
>> But for this specific use-case, having just a binary value to know which
>> task is latency-sensitive and which is not would be sufficient, but having
>> a range is also a good way to go, where tasks above some threshold can be
>> packed.
>>
>>
> 
> $> Separating AVX512 tasks and latency sensitive tasks on separate cores
> -
> Another usecase we are considering is to segregate those workloads that
> will pull down core cpu frequency (e.g. AVX512) from workloads that are
> latency-sensitive.
> There are certain tasks that need to provide a fast response time (latency
> sensitive) and they are best scheduled on a cpu that has a lighter load
> and does not have other tasks running on the sibling cpu that could pull
> down the cpu core frequency.
> 
> Some users are running machine learning batch tasks with AVX512, and have
> observed that these tasks affect the tasks needing a fast response.  They
> have to rely on manual CPU affinity to separate these tasks.  With an
> appropriate latency hint on a task, the scheduler can be taught to
> separate them.
> 

Thanks for listing out your usecase.

This is interesting. If the scheduler has knowledge of AVX512 tasks, then
with this interface the scheduler can refrain from picking cores occupied
by AVX512 tasks for a task with "latency-nice = -19".

So I guess for this specific use-case, the value for such a per-task
attribute should have a range (most probably [-19,20]) and the name
"latency-nice" also suits the need.

Do you have any specific values in mind for such an attr?


Thanks,
Parth



Re: Usecases for the per-task latency-nice attribute

2019-09-19 Thread Parth Shah



On 9/18/19 7:48 PM, Patrick Bellasi wrote:
> 
> On Wed, Sep 18, 2019 at 13:41:04 +0100, Parth Shah wrote...
> 
>> Hello everyone,
> 
> Hi Parth,
> thanks for starting this discussion.
> 
> [ + patrick.bell...@matbug.net ] my new email address, since I will not
> be reachable at @arm.com anymore starting next week.
> 

Noted. I will send a new version with a summary of all the discussion and
add more people to CC. I will change your email address there; thanks for
notifying me.

>> As per the discussion at LPC2019, a new per-task property like latency-nice
>> can be useful in certain scenarios. The scheduler can take better decisions
>> by knowing the latency requirements of a task from the end-user itself.
>>
>> There has already been an effort from Subhra to introduce task
>> latency-nice [1] values, and several possibilities have been seen where
>> this type of interface can be used.
>>
>> From the best of my understanding of the discussion on the mail thread and
>> at LPC2019, it seems that there are two dilemmas:
>>
>> 1. Name: What should be the name of such an attr for all the possible usecases?
>> =
>> Latency-nice is the proposed name as of now, where a lower value indicates
>> that the task doesn't care much about latency
> 
> If by "lower value" you mean -19 (in the proposed [-20,19] range), then
> I think the meaning should be the opposite.
> 

Oops, my bad. I wanted to say higher value but somehow missed that
latency-nice should be the opposite of latency sensitivity.

For the rest of the discussion, I take -19 to be the least value
(latency-sensitive) and +20 to be the greatest value (does not care
about latency), if the range is [-19,20].

> A -19 latency-nice task is a task which is not willing to give up
> latency. For those tasks, for example, we want to reduce the wake-up
> latency as much as possible.
> 
> This will keep its semantics aligned with that of process niceness values,
> which range from -20 (most favourable to the process) to 19 (least
> favourable to the process).

Totally agreed.

> 
>> and we can spend some more time in the kernel to decide a better
>> placement for the task (to save time, energy, etc.)
> 
> Tasks with a high latency-nice value (e.g. 19) are "less sensitive to
> latency". These are tasks we wanna optimize mainly for throughput and
> thus, for example, we can spend some more time to find a better task
> placement at wakeup time.
> 
> Does that make sense?

Correct. Task placement is one optimization which can benefit both the
server and embedded worlds by saving power without compromising much on
performance.

> 
>> But there seems to be a bit of confusion about whether we want biasing as
>> well (latency-biased) or something similar, in which case "latency-nice"
>> may confuse the end-user.
> 
> AFAIU PeterZ's point was "just" that if we call it "-nice" it has to
> behave like "nice values" to avoid confusing users. But, if we come up
> with a different naming maybe we will have more freedom.
> 
> Personally, I like both "latency-nice" or "latency-tolerant", where:
> 
>  - latency-nice:
>should have a better understanding based on pre-existing concepts
> 
>  - latency-tolerant:
>decouples a bit its meaning from the niceness thus giving maybe a bit
>more freedom in its complete definition and perhaps avoid any
>possible interpretation confusion like the one I commented above.
> 
> Fun fact: there was also the latency-nasty proposal from PaulMK :)
> 

Cool. In that sense, latency-tolerant seems more flexible, covering the
multiple functionalities that a scheduler can provide with such userspace
hints.


>> 2. Value: What should be the range of possible values supported by this new
>> attr?
>> ==
>> The possible values of such a task attribute still need community attention.
>> Do we need a range of values, or are binary/ternary values sufficient?
>> Also, signed or unsigned, and what length of variable (u64, s32,
>> etc.)?
> 
> AFAIR, the proposal on the table are essentially two:
> 
>  A) use a [-20,19] range
> 
> Which has similarities with the niceness concept and gives a minimal
> continuous range. This can come in handy for things like scaling the
> vruntime normalization [3]
> 
>  B) use some sort of "profile tagging"
> e.g. background, latency-sensible, etc...
> 
> If I correctly got what PaulT was proposing toward the end of the
> discussion at LPC.
> 

If I got it right, then for option B we could use this attr as a latency
flag, just like the per-process flags (e.g. PF_IDLE). If so, we could
piggyback on p->flags itself. Hence I would prefer the range, unless we
have multiple usecases which cannot get the best out of a range.
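
Just to illustrate the piggybacking idea (the bit value below is made up
and would clash with real PF_* definitions; PF_* bits are already a
scarce resource, which is one argument against this option):

/* Sketch: a binary latency hint carried in task_struct::flags */
#define PF_LATENCY_SENSITIVE	0x02000000	/* hypothetical bit */

static inline bool task_is_latency_sensitive(struct task_struct *p)
{
	return p->flags & PF_LATENCY_SENSITIVE;
}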

> This last option deserves better exploration.
> 
> At first glance I'm more for option A, I see a range as something that:
> 
>   - gives us a bit of flexibility in terms of the possible internal
> usages of the actual value
> 
>   - better supports some kind of linear/proportional mapping

Re: Usecases for the per-task latency-nice attribute

2019-09-18 Thread Tim Chen
On 9/18/19 5:41 AM, Parth Shah wrote:
> Hello everyone,
> 
> As per the discussion in LPC2019, new per-task property like latency-nice
> can be useful in certain scenarios. The scheduler can take proper decision
> by knowing latency requirement of a task from the end-user itself.
> 
> There has already been an effort from Subhra for introducing Task
> latency-nice [1] values and have seen several possibilities where this type of
> interface can be used.
> 
> From the best of my understanding of the discussion on the mail thread and
> in the LPC2019, it seems that there are two dilemmas;

Thanks for starting the discussion.


> 
> ---
> **Usecases**
> ---
> 
> $> TurboSched
> 
> TurboSched [2] tries to minimize the number of active cores in a socket by
> packing an un-important and low-utilization (named jitter) task on an
> already active core and thus refrains from waking up of a new core if
> possible. This requires tagging of tasks from the userspace hinting which
> tasks are un-important and thus waking-up a new core to minimize the
> latency is un-necessary for such tasks.
> As per the discussion on the posted RFC, it will be appropriate to use the
> task latency property where a task with the highest latency-nice value can
> be packed.
> But for this specific use-cases, having just a binary value to know which
> task is latency-sensitive and which not is sufficient enough, but having a
> range is also a good way to go where above some threshold the task can be
> packed.
> 
> 

$> Separating AVX512 tasks and latency sensitive tasks on separate cores
-
Another usecase we are considering is to segregate those workload that will 
pull down
core cpu frequency (e.g. AVX512) from workload that are latency sensitive.
There are certain tasks that need to provide a fast response time (latency 
sensitive)
and they are best scheduled on cpu that has a lighter load and not have other
tasks running on the sibling cpu that could pull down the cpu core frequency.

Some users are running machine learning batch tasks with AVX512, and have 
observed
that these tasks affect the tasks needing a fast response.  They have to
rely on manual CPU affinity to separate these tasks.  With appropriate
latency hint on task, the scheduler can be taught to separate them.

Tim





Re: Usecases for the per-task latency-nice attribute

2019-09-18 Thread Vincent Guittot
On Wed, 18 Sep 2019 at 17:46, Patrick Bellasi  wrote:
>
>
> On Wed, Sep 18, 2019 at 16:22:32 +0100, Vincent Guittot wrote...
>
> > On Wed, 18 Sep 2019 at 16:19, Patrick Bellasi  
> > wrote:
>
> [...]
>
> >> $> Wakeup path tunings
> >> ==
> >>
> >> Some additional possible use-cases were already discussed in [3]:
> >>
> >>  - dynamically tune the policy of a task among SCHED_{OTHER,BATCH,IDLE}
> >>depending on crossing certain pre-configured threshold of latency
> >>niceness.
> >>
> >>  - dynamically bias the vruntime updates we do in place_entity()
> >>depending on the actual latency niceness of a task.
> >>
> >>PeterZ thinks this is dangerous but that we can "(carefully) fumble a
> >>bit there."
> >
> > I agree with Peter that we can easily break the fairness if we bias vruntime
>
> Just to be more precise here and also to better understand, here I'm
> talking about turning the tweaks we already have for:
>
>  - START_DEBIT
>  - GENTLE_FAIR_SLEEPERS

OK. So extending these 2 features could make sense.

>
> a bit more parametric and proportional to the latency-nice of a task.
>
> In principle, if a task declares a positive latency niceness, could we
> not read this also as "I accept to be a bit penalised in terms of
> fairness at wakeup time"?

I would say no. Declaring a positive latency niceness doesn't mean a task
should lose some fairness and runtime. If a task accepts long latency
because it only cares about throughput, it doesn't want to lose some
running time.

>
> Whatever tweaks we do there should affect anyway only one sched_latency
> period... although I'm not yet sure if that's possible and how.
>
> >>  - bias the decisions we take in check_preempt_tick() still depending
> >>on a relative comparison of the current and wakeup task latency
> >>niceness values.
> >
> > This one seems possible, as it will mainly enable a task to preempt
> > the running task "earlier" but will not break fairness.
> > So the main impact will be the number of context switches between tasks
> > to favor (or not) the scheduling latency.
>
> Preempting before is definitely a nice-to-have feature.
>
> At the same time, it would be interesting to support a case where a low
> latency-nice task (e.g. TOP_APP) RUNNABLE on a CPU has better chances to
> be executed up to completion without being preempted by a high
> latency-nice task (e.g. BACKGROUND) waking up on its CPU.
>
> For that to happen, we need a mechanism to "delay" the execution of a
> less important RUNNABLE task up to a certain period.
>
> It's impacting the fairness, true, but latency-nice in this case will
> mean that we want to "complete faster", not just "start faster".

Your TOP_APP task will have to set both nice and latency-nice if it
wants to be (almost) sure to have time to finish before BACKGROUND.


>
> Is this definition something we can reason about?
>
> Best,
> Patrick
>
> --
> #include 
>
> Patrick Bellasi


Re: Usecases for the per-task latency-nice attribute

2019-09-18 Thread Patrick Bellasi


On Wed, Sep 18, 2019 at 16:22:32 +0100, Vincent Guittot wrote...

> On Wed, 18 Sep 2019 at 16:19, Patrick Bellasi  wrote:

[...]

>> $> Wakeup path tunings
>> ==
>>
>> Some additional possible use-cases were already discussed in [3]:
>>
>>  - dynamically tune the policy of a task among SCHED_{OTHER,BATCH,IDLE}
>>depending on crossing certain pre-configured threshold of latency
>>niceness.
>>
>>  - dynamically bias the vruntime updates we do in place_entity()
>>depending on the actual latency niceness of a task.
>>
>>PeterZ thinks this is dangerous but that we can "(carefully) fumble a
>>bit there."
>
> I agree with Peter that we can easily break the fairness if we bias vruntime

Just to be more precise here and also to better understand, here I'm
talking about turning the tweaks we already have for:

 - START_DEBIT
 - GENTLE_FAIR_SLEEPERS

a bit more parametric and proportional to the latency-nice of a task.

In principle, if a task declares a positive latency niceness, could we
not read this also as "I accept to be a bit penalised in terms of
fairness at wakeup time"?

Whatever tweaks we do there should affect anyway only one sched_latency
period... although I'm not yet sure if that's possible and how.
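
To make it a bit more concrete, the tweak I have in mind is just making
the existing sleeper credit in place_entity() proportional to the latency
niceness, something like (fragment only; se->latency_nice and the linear
scaling are purely illustrative):

if (!initial) {
	unsigned long thresh = sysctl_sched_latency;

	if (sched_feat(GENTLE_FAIR_SLEEPERS))
		thresh >>= 1;

	/* Sketch: shrink the wakeup credit as latency-nice grows;
	 * -20 keeps the full credit, +19 gets almost none. */
	thresh -= (thresh * (se->latency_nice + 20)) / 40;

	vruntime -= thresh;
}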

>>  - bias the decisions we take in check_preempt_tick() still depending
>>on a relative comparison of the current and wakeup task latency
>>niceness values.
>
> This one seems possible, as it will mainly enable a task to preempt
> the running task "earlier" but will not break fairness.
> So the main impact will be the number of context switches between tasks
> to favor (or not) the scheduling latency.

Preempting before is definitely a nice-to-have feature.

At the same time, it would be interesting to support a case where a low
latency-nice task (e.g. TOP_APP) RUNNABLE on a CPU has better chances to
be executed up to completion without being preempted by a high
latency-nice task (e.g. BACKGROUND) waking up on its CPU.

For that to happen, we need a mechanism to "delay" the execution of a
less important RUNNABLE task up to a certain period.

It's impacting the fairness, true, but latency-nice in this case will
mean that we want to "complete faster", not just "start faster".

Is this definition something we can reason about?
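
On the preemption side, the simplest variant of the check_preempt_tick()
bias would only look at the current task's value, e.g. (fragment of the
real function; curr->latency_nice and the ~2.5%-per-step scaling are
assumptions):

ideal_runtime = sched_slice(cfs_rq, curr);

/* Sketch: a latency-tolerant current task gives up its slice earlier,
 * a latency-sensitive one holds it a bit longer. */
ideal_runtime -= (ideal_runtime * curr->latency_nice) / 40;

if (delta_exec > ideal_runtime)
	resched_curr(rq_of(cfs_rq));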

Best,
Patrick

-- 
#include 

Patrick Bellasi


Re: Usecases for the per-task latency-nice attribute

2019-09-18 Thread Valentin Schneider
On 18/09/2019 15:18, Patrick Bellasi wrote:
>> 1. Name: What should be the name of such an attr for all the possible
>> usecases?
>> =
>> Latency-nice is the proposed name as of now, where a lower value indicates
>> that the task doesn't care much about latency
> 
> If by "lower value" you mean -19 (in the proposed [-20,19] range), then
> I think the meaning should be the opposite.
> 
> A -19 latency-nice task is a task which is not willing to give up
> latency. For those tasks, for example, we want to reduce the wake-up
> latency as much as possible.
> 
> This will keep its semantics aligned with that of process niceness values,
> which range from -20 (most favourable to the process) to 19 (least
> favourable to the process).
> 

I don't want to start a bikeshedding session here, but I agree with Parth
on the interpretation of the values.

I've always read niceness values as
-20 (least nice to the system / other processes)
+19 (most nice to the system / other processes)

So following this trend I'd see for latency-nice:
-20 (least nice to latency, i.e. sacrifice latency for throughput)
+19 (most nice to latency, i.e. sacrifice throughput for latency)

However...

>> But there seems to be a bit of confusion about whether we want biasing as
>> well (latency-biased) or something similar, in which case "latency-nice"
>> may confuse the end-user.
> 
> AFAIU PeterZ's point was "just" that if we call it "-nice" it has to
> behave like "nice values" to avoid confusing users. But, if we come up
> with a different naming maybe we will have more freedom.
> 

...just getting rid of the "-nice" would leave us free not to have to
interpret the values as "nice to / not nice to" :)

> Personally, I like both "latency-nice" or "latency-tolerant", where:
> 
>  - latency-nice:
>should have a better understanding based on pre-existing concepts
> 
>  - latency-tolerant:
>decouples a bit its meaning from the niceness thus giving maybe a bit
>more freedom in its complete definition and perhaps avoid any
>possible interpretation confusion like the one I commented above.
> 
> Fun fact: there was also the latency-nasty proposal from PaulMK :)
> 

[...]

> 
> $> Wakeup path tunings
> ==
> 
> Some additional possible use-cases were already discussed in [3]:
> 
>  - dynamically tune the policy of a task among SCHED_{OTHER,BATCH,IDLE}
>depending on crossing certain pre-configured threshold of latency
>niceness.
>   
>  - dynamically bias the vruntime updates we do in place_entity()
>depending on the actual latency niceness of a task.
>   
>PeterZ thinks this is dangerous but that we can "(carefully) fumble a
>bit there."
>   
>  - bias the decisions we take in check_preempt_tick() still depending
>on a relative comparison of the current and wakeup task latency
>niceness values.

Aren't we missing the point about tweaking the sched domain scans (which
AFAIR was the original point for latency-nice)?

Something like: the default value is the current behaviour, and
- Being less latency-sensitive means increasing the scans (e.g. trending
  towards only going through the slow wakeup-path at the extreme setting)
- Being more latency-sensitive means reducing the scans (e.g. trending
  towards a fraction of the domain scanned in the fast-path at the extreme
  setting).
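
Roughly, in select_idle_cpu() terms, that would mean scaling the scan
budget with the wakee's latency niceness, e.g. (fragment only; the
p->latency_nice field and the linear scaling are made up):

span_avg = sd->span_weight * avg_idle;
nr = div_u64(span_avg, avg_cost);

/* Sketch: latency-tolerant (+19) scans up to ~2x wider, while
 * latency-sensitive (-20) drops to the minimum scan. */
nr += (nr * p->latency_nice) / 20;
nr = max_t(int, nr, 2);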

> 

$> Load balance tuning
==

Already mentioned these in [4]:

- Increase (reduce) nr_balance_failed threshold when trying to active
  balance a latency-sensitive (non-latency-sensitive) task.

- Increase (decrease) sched_migration_cost factor in task_hot() for
  latency-sensitive (non-latency-sensitive) tasks.
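
For the task_hot() one, a sketch (today the cutoff is the flat
sysctl_sched_migration_cost; p->latency_nice is assumed):

s64 delta, cost = sysctl_sched_migration_cost;

/* Sketch: -20 stretches the cutoff to 1.5x, so a latency-sensitive
 * task stays "hot" (hard to migrate) longer; +19 shrinks it to ~half. */
cost -= (cost * p->latency_nice) / 40;

delta = rq_clock_task(env->src_rq) - p->se.exec_start;
return delta < cost;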

>> References:
>> ===
>> [1]. https://lkml.org/lkml/2019/8/30/829
>> [2]. https://lkml.org/lkml/2019/7/25/296
> 
>   [3]. Message-ID: <20190905114709.gm2...@hirez.programming.kicks-ass.net>
>
> https://lore.kernel.org/lkml/20190905114709.gm2...@hirez.programming.kicks-ass.net/
> 

[4]: https://lkml.kernel.org/r/3d3306e4-3a78-5322-df69-7665cf01c...@arm.com

> 
> Best,
> Patrick
> 


Re: Usecases for the per-task latency-nice attribute

2019-09-18 Thread Vincent Guittot
On Wed, 18 Sep 2019 at 16:19, Patrick Bellasi  wrote:
>
>
> On Wed, Sep 18, 2019 at 13:41:04 +0100, Parth Shah wrote...
>
> > Hello everyone,
>
> Hi Parth,
> thanks for starting this discussion.
>
> [ + patrick.bell...@matbug.net ] my new email address, since I will not
> be reachable at @arm.com anymore starting next week.
>
> > As per the discussion at LPC2019, a new per-task property like latency-nice
> > can be useful in certain scenarios. The scheduler can take better decisions
> > by knowing the latency requirements of a task from the end-user itself.
> >
> > There has already been an effort from Subhra to introduce task
> > latency-nice [1] values, and several possibilities have been seen where
> > this type of interface can be used.
> >
> > From the best of my understanding of the discussion on the mail thread and
> > at LPC2019, it seems that there are two dilemmas:
> >
> > 1. Name: What should be the name of such an attr for all the possible
> > usecases?
> > =
> > Latency-nice is the proposed name as of now, where a lower value indicates
> > that the task doesn't care much about latency
>
> If by "lower value" you mean -19 (in the proposed [-20,19] range), then
> I think the meaning should be the opposite.
>
> A -19 latency-nice task is a task which is not willing to give up
> latency. For those tasks, for example, we want to reduce the wake-up
> latency as much as possible.
>
> This will keep its semantics aligned with that of process niceness values,
> which range from -20 (most favourable to the process) to 19 (least
> favourable to the process).
>
> > and we can spend some more time in the kernel to decide a better
> > placement for the task (to save time, energy, etc.)
>
> Tasks with a high latency-nice value (e.g. 19) are "less sensitive to
> latency". These are tasks we wanna optimize mainly for throughput and
> thus, for example, we can spend some more time to find a better task
> placement at wakeup time.
>
> Does that make sense?
>
> > But there seems to be a bit of confusion about whether we want biasing as
> > well (latency-biased) or something similar, in which case "latency-nice"
> > may confuse the end-user.
>
> AFAIU PeterZ's point was "just" that if we call it "-nice" it has to
> behave like "nice values" to avoid confusing users. But, if we come up
> with a different naming maybe we will have more freedom.
>
> Personally, I like both "latency-nice" or "latency-tolerant", where:
>
>  - latency-nice:
>should have a better understanding based on pre-existing concepts
>
>  - latency-tolerant:
>decouples a bit its meaning from the niceness thus giving maybe a bit
>more freedom in its complete definition and perhaps avoid any
>possible interpretation confusion like the one I commented above.
>
> Fun fact: there was also the latency-nasty proposal from PaulMK :)
>
> > 2. Value: What should be the range of possible values supported by this new
> > attr?
> > ==
> > The possible values of such a task attribute still need community attention.
> > Do we need a range of values, or are binary/ternary values sufficient?
> > Also, signed or unsigned, and what length of variable (u64, s32,
> > etc.)?
>
> AFAIR, the proposal on the table are essentially two:
>
>  A) use a [-20,19] range
>
> Which has similarities with the niceness concept and gives a minimal
> continuous range. This can come in handy for things like scaling the
> vruntime normalization [3]
>
>  B) use some sort of "profile tagging"
> e.g. background, latency-sensible, etc...
>
> If I correctly got what PaulT was proposing toward the end of the
> discussion at LPC.
>
> This last option deserves better exploration.
>
> At first glance I'm more for option A, I see a range as something that:
>
>   - gives us a bit of flexibility in terms of the possible internal
> usages of the actual value
>
>   - better supports some kind of linear/proportional mapping
>
>   - still supports a "profile tagging" by (possibly) exposing to
> user-space some kind of system-wide knobs defining thresholds that
> map the continuous value into a "profile",
> e.g. latency-nice >= 15: use SCHED_BATCH
>
> In the following discussion I'll call "threshold based profiling"
> this approach.
>
>
> > This mail is to initiate the discussion regarding the possible usecases of
> > such a per-task attribute and to come up with a specific name and value for
> > the same.
> >
> > Hopefully, interested parties will lay out the usecases which this new
> > attr can potentially help solve or optimize.
>
> +1
>
> > Well, to start with, here is my usecase.
> >
> > ---
> > **Usecases**
> > ---
> >
> > $> TurboSched
> > 
> > TurboSched [2] tries to minimize the number of active cores in a socket by
> > packing an un-important and low-utilization (named jitter) task on an
>  ^^^
>
> We should really come up with a different name, since "jitter" clashes
> with other RT-related concepts.

Re: Usecases for the per-task latency-nice attribute

2019-09-18 Thread Patrick Bellasi


On Wed, Sep 18, 2019 at 13:41:04 +0100, Parth Shah wrote...

> Hello everyone,

Hi Parth,
thanks for starting this discussion.

[ + patrick.bell...@matbug.net ] my new email address, since I will not
be reachable at @arm.com anymore starting next week.

> As per the discussion at LPC2019, a new per-task property like latency-nice
> can be useful in certain scenarios. The scheduler can take better decisions
> by knowing the latency requirements of a task from the end-user itself.
>
> There has already been an effort from Subhra to introduce task
> latency-nice [1] values, and several possibilities have been seen where
> this type of interface can be used.
>
> From the best of my understanding of the discussion on the mail thread and
> at LPC2019, it seems that there are two dilemmas:
>
> 1. Name: What should be the name of such an attr for all the possible usecases?
> =
> Latency-nice is the proposed name as of now, where a lower value indicates
> that the task doesn't care much about latency

If by "lower value" you mean -19 (in the proposed [-20,19] range), then
I think the meaning should be the opposite.

A -19 latency-nice task is a task which is not willing to give up
latency. For those tasks, for example, we want to reduce the wake-up
latency as much as possible.

This will keep its semantics aligned with that of process niceness values,
which range from -20 (most favourable to the process) to 19 (least
favourable to the process).

> and we can spend some more time in the kernel to decide a better
> placement for the task (to save time, energy, etc.)

Tasks with a high latency-nice value (e.g. 19) are "less sensitive to
latency". These are tasks we wanna optimize mainly for throughput and
thus, for example, we can spend some more time to find a better task
placement at wakeup time.

Does that make sense?

> But there seems to be a bit of confusion about whether we want biasing as
> well (latency-biased) or something similar, in which case "latency-nice"
> may confuse the end-user.

AFAIU PeterZ's point was "just" that if we call it "-nice" it has to
behave like "nice values" to avoid confusing users. But, if we come up
with a different naming maybe we will have more freedom.

Personally, I like both "latency-nice" or "latency-tolerant", where:

 - latency-nice:
   should have a better understanding based on pre-existing concepts

 - latency-tolerant:
   decouples a bit its meaning from the niceness thus giving maybe a bit
   more freedom in its complete definition and perhaps avoid any
   possible interpretation confusion like the one I commented above.

Fun fact: there was also the latency-nasty proposal from PaulMK :)

> 2. Value: What should be the range of possible values supported by this new
> attr?
> ==
> The possible values of such a task attribute still need community attention.
> Do we need a range of values, or are binary/ternary values sufficient?
> Also, signed or unsigned, and what length of variable (u64, s32,
> etc.)?

AFAIR, the proposal on the table are essentially two:

 A) use a [-20,19] range
 
Which has similarities with the niceness concept and gives a minimal
continuous range. This can come in handy for things like scaling the
vruntime normalization [3]

 B) use some sort of "profile tagging"
e.g. background, latency-sensible, etc...

If I correctly got what PaulT was proposing toward the end of the
discussion at LPC.

This last option deserves better exploration.

At first glance I'm more for option A, I see a range as something that:

  - gives us a bit of flexibility in terms of the possible internal
usages of the actual value

  - better supports some kind of linear/proportional mapping

  - still supports a "profile tagging" by (possibly) exposing to
user-space some kind of system-wide knobs defining thresholds that
map the continuous value into a "profile",
e.g. latency-nice >= 15: use SCHED_BATCH

In the following discussion I'll call this approach "threshold based
profiling".
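
For instance, something like this in the attr-update path (everything
below, the field, the knob, and even the SCHED_BATCH mapping itself, is
illustrative only; real code would go through __sched_setscheduler()):

/* Sketch of "threshold based profiling": a system-wide knob maps the
 * continuous latency-nice value into a coarse scheduling profile. */
static void latency_nice_update_policy(struct task_struct *p)
{
	/* sysctl_sched_batch_threshold: hypothetical knob, e.g. 15 */
	if (p->latency_nice >= sysctl_sched_batch_threshold)
		p->policy = SCHED_BATCH;
	else if (p->policy == SCHED_BATCH)
		p->policy = SCHED_NORMAL;
}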


> This mail is to initiate the discussion regarding the possible usecases of
> such a per-task attribute and to come up with a specific name and value for
> the same.
>
> Hopefully, interested parties will lay out the usecases which this new
> attr can potentially help solve or optimize.

+1

> Well, to start with, here is my usecase.
>
> ---
> **Usecases**
> ---
>
> $> TurboSched
> 
> TurboSched [2] tries to minimize the number of active cores in a socket by
> packing an un-important and low-utilization (named jitter) task on an
 ^^^

We should really come up with a different name, since "jitter" clashes
with other RT-related concepts.

Maybe we don't even need a name at all, the other two attributes you
specify are good enough to identify those tasks: they are just "small
background" tasks.

  small  : because on the