Re: [RFC PATCH 00/16] Core scheduling v6

2020-08-26 Thread Vineeth Pillai
Hi Alex,

>
> As discussed during Linux Plumbers, here is a small repo with test
> scripts and applications that I've used to look at core scheduling
> unfairness:
>
>https://github.com/agraf/schedgaps
>
Thanks for sharing :).

> Please let me know if it's unclear how to use it or if you see issues in
> your environment.
>
Will give it a try soon and let you know. I went through the
README quickly and the documentation is very clear.

This is really helpful and will be useful for future
testing as well.

Thanks,
Vineeth


Re: [RFC PATCH 00/16] Core scheduling v6

2020-08-26 Thread Alexander Graf

Hi Vineeth,

On 30.06.20 23:32, Vineeth Remanan Pillai wrote:

Sixth iteration of the Core-Scheduling feature.

Core scheduling is a feature that allows only trusted tasks to run
concurrently on cpus sharing compute resources (e.g. hyperthreads on a
core). The goal is to mitigate core-level side-channel attacks
without requiring SMT to be disabled (which has a significant impact on
performance in some situations). Core scheduling (as of v6) mitigates
user-space to user-space attacks, and user-to-kernel attacks when one of
the siblings enters the kernel via interrupts. It is still possible for
a task to attack the sibling thread when it enters the kernel via
syscalls.

By default, the feature doesn't change any of the current scheduler
behavior. The user decides which tasks can run simultaneously on the
same core (for now by having them in the same tagged cgroup). When a
tag is enabled in a cgroup and a task from that cgroup is running on a
hardware thread, the scheduler ensures that only idle or trusted tasks
run on the other sibling(s). Besides security concerns, this feature
can also be beneficial for RT and performance applications where we
want to control how tasks make use of SMT dynamically.
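
For illustration, tagging goes through the cpu cgroup controller. Below is
a minimal userspace sketch, assuming the legacy hierarchy is mounted at
/sys/fs/cgroup/cpu and using the cpu.tag file this series adds; the
"trusted_vm" group name is made up:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static int write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	if (write(fd, val, strlen(val)) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}

int main(void)
{
	char pid[32];

	/* Create a cgroup for the group of mutually trusting tasks. */
	if (mkdir("/sys/fs/cgroup/cpu/trusted_vm", 0755) && errno != EEXIST)
		perror("mkdir");

	/* Tag the group: its tasks now share a core-scheduling cookie. */
	write_str("/sys/fs/cgroup/cpu/trusted_vm/cpu.tag", "1");

	/* Move the calling task into the tagged group. */
	snprintf(pid, sizeof(pid), "%d", getpid());
	write_str("/sys/fs/cgroup/cpu/trusted_vm/tasks", pid);
	return 0;
}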

This iteration is mostly a cleanup of v5, except for one major feature:
pausing the sibling when a cpu enters the kernel via nmi/irq/softirq. It
also introduces documentation and includes minor crash fixes.

One major cleanup was removing the hotplug support and related code.
The hotplug related crashes were not documented and the fixes piled up
over time, leading to complex code. We were not able to reproduce the
crashes in the limited testing done. But if they are reproducible, we
don't want to hide them. We should document them and design better
fixes if any.

In terms of performance, the results in this release are similar to
v5. On an x86 system with N hardware threads:
- if only N/2 hardware threads are busy, the performance is similar
   between baseline, corescheduling and nosmt
- if N hardware threads are busy with N different corescheduling
   groups, the impact of corescheduling is similar to nosmt
- if N hardware threads are busy and multiple active threads share the
   same corescheduling cookie, they gain a performance improvement over
   nosmt.
   The specific performance impact depends on the workload, but for a
   really busy 12-vcpu database VM (1 coresched tag) running on a
   36-hardware-thread NUMA node with 96 mostly idle neighbor VMs (each in
   their own coresched tag), the performance drops by 54% with
   corescheduling and by 90% with nosmt.

v6 is rebased on 5.7.6(a06eb423367e)
https://github.com/digitalocean/linux-coresched/tree/coresched/v6-v5.7.y


As discussed during Linux Plumbers, here is a small repo with test 
scripts and applications that I've used to look at core scheduling 
unfairness:


  https://github.com/agraf/schedgaps

Please let me know if it's unclear how to use it or if you see issues in 
your environment.


Please also make sure to only run this on idle server-class hardware.
Notebooks will most definitely have too many uncontrollable sources of
timing entropy to give sensible results.



Alex








Re: [RFC PATCH 00/16] Core scheduling v6

2020-08-20 Thread Joel Fernandes
On Thu, Aug 13, 2020 at 12:28:17PM +0800, Li, Aubrey wrote:
> On 2020/8/13 7:08, Joel Fernandes wrote:
> > On Wed, Aug 12, 2020 at 10:01:24AM +0800, Li, Aubrey wrote:
> >> Hi Joel,
> >>
> >> On 2020/8/10 0:44, Joel Fernandes wrote:
> >>> Hi Aubrey,
> >>>
> >>> Apologies for replying late as I was still looking into the details.
> >>>
> >>> On Wed, Aug 05, 2020 at 11:57:20AM +0800, Li, Aubrey wrote:
> >>> [...]
>  +/*
>  + * Core scheduling policy:
>  + * - CORE_SCHED_DISABLED: core scheduling is disabled.
>  + * - CORE_COOKIE_MATCH: tasks with same cookie can run
>  + *   on the same core concurrently.
>  + * - CORE_COOKIE_TRUST: trusted task can run with kernel
>  + *   thread on the same core concurrently.
>  + * - CORE_COOKIE_LONELY: tasks with cookie can run only
>  + *   with idle thread on the same core.
>  + */
>  +enum coresched_policy {
>  +       CORE_SCHED_DISABLED,
>  +       CORE_SCHED_COOKIE_MATCH,
>  +       CORE_SCHED_COOKIE_TRUST,
>  +       CORE_SCHED_COOKIE_LONELY,
>  +};
> 
>  We can set the policy of the uperf cgroup to CORE_COOKIE_TRUST and fix
>  this kind of performance regression. Not sure if this sounds attractive?
> >>>
> >>> Instead of this, I think it can be something simpler IMHO:
> >>>
> >>> 1. Consider all cookie-0 tasks as trusted. (Even right now, if you apply
> >>>the core-scheduling patchset, such tasks will share a core and sniff on
> >>>each other. So let us not pretend that such tasks are not trusted.)
> >>>
> >>> 2. All kernel threads and the idle task would have a cookie 0 (so that
> >>>will cover ksoftirqd reported in your original issue).
> >>>
> >>> 3. Add a config option (CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED),
> >>>enabled by default. Setting this option would tag all tasks that are
> >>>forked from a cookie-0 task with their own cookie. Later on, such tasks
> >>>can be added to a group. (This covers PeterZ's ask about having
> >>>'default untrusted'.) (Users like ChromeOS that don't want userspace
> >>>system processes to be tagged can disable this option so such tasks
> >>>will be cookie-0.)
> >>>
> >>> 4. Allow prctl/cgroup interfaces to create groups of tasks and override
> >>>the above behaviors.
> >>
> >> How does uperf in a cgroup work with ksoftirqd? Are you suggesting I set
> >> uperf's cookie to be cookie-0 via prctl?
> > 
> > Yes, but let me try to understand better. There are 2 problems here I think:
> > 
> > 1. ksoftirqd getting idled when HT is turned on, because uperf is sharing a
> > core with it: This should not be any worse than SMT OFF, because even SMT
> > OFF would also reduce ksoftirqd's CPU time, just as core sched is doing.
> > Sure, core-scheduling adds some overhead with IPIs, but such a huge drop of
> > perf is strange. Peter, any thoughts on that?
> > 
> > 2. Interface: To solve the performance problem, you are saying you want
> > uperf to share a core with ksoftirqd so that it is not forced into idle.
> > Why not just keep uperf out of the cgroup?
> 
> I guess this is unacceptable for those who run their apps in containers and VMs.

I think we can forget about #2, that's just a workaround. #1 is probably
what we should look into for your problem. I was talking to Vineeth earlier;
is it possible that the fairness issues Aaron and Peter are looking into are
causing the performance problem here?

So like, if ksoftirqd being higher prio makes the vruntime delta between
2 CFS tasks sharing a core quite high, then it causes the core-wide
min_vruntime to be high. Then if uperf gets enqueued, it will get starved by
ksoftirqd and not be able to run till ksoftirqd's vruntime catches up.
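
For intuition, here is a toy sketch of that cross-sibling comparison
(loosely modeled on the patchset's prio_less() idea, not the actual code):
each task's vruntime is normalized against its runqueue's min_vruntime
before comparing, so a task far behind the baseline keeps winning the core:

#include <stdio.h>

struct fake_task {
	unsigned long long vruntime;
};

/* Does @a deserve the core before @b? Normalize to each rq's baseline. */
static int core_prio_less(const struct fake_task *a, unsigned long long min_a,
			  const struct fake_task *b, unsigned long long min_b)
{
	long long da = (long long)(a->vruntime - min_a);
	long long db = (long long)(b->vruntime - min_b);

	return da < db;
}

int main(void)
{
	/* ksoftirqd ran at higher prio, so its vruntime advanced slowly... */
	struct fake_task ksoftirqd = { .vruntime = 1000 };
	/* ...while uperf's vruntime raced ahead on the sibling runqueue. */
	struct fake_task uperf = { .vruntime = 90000 };

	/* ksoftirqd keeps winning until its vruntime catches up to uperf's. */
	printf("pick ksoftirqd over uperf: %d\n",
	       core_prio_less(&ksoftirqd, 0, &uperf, 0));
	return 0;
}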

Other than that, the only other thing (AFAIK) is that the IPI/scheduler
overhead is giving uperf worse performance than SMT-off, and we ought to
reduce the overhead somehow. Does a kernel perf profile show you any
smoking guns?

thanks,

 - Joel

> 
> Thanks,
> -Aubrey
> 
> > Then it will have cookie 0 and be able to
> > share a core with kernel threads. About the user-user isolation that you
> > need: if you tag any "untrusted" threads by adding them to a CGroup, then
> > they will automatically be isolated from uperf while allowing uperf to
> > share a CPU with kernel threads.
> > 
> > Please let me know your thoughts and thanks,
> > 
> >  - Joel
> > 
> >>
> >> Thanks,
> >> -Aubrey
> >>>
> >>> 5. Document everything clearly so the semantics are clear both to the
> >>>developers of core scheduling and to system administrators.
> >>>
> >>> Note that, with the concept of "system trusted cookie", we can also do
> >>> optimizations like:
> >>> 1. Disable STIBP when switching into trusted tasks.
> >>> 2. Disable L1D flushing / verw stuff for L1TF/MDS issues, when switching
> >>>into trusted tasks.
> >>>
> >>> At least #1 seems to be biting enabling HT on ChromeOS right now, and one
> >>> other engineer requested I do something like #2 already.

Re: [RFC PATCH 00/16] Core scheduling v6(Internet mail)

2020-08-14 Thread 蒋彪
Hi,

> On Aug 14, 2020, at 1:18 PM, Li, Aubrey  wrote:
> 
> On 2020/8/14 12:04, benbjiang(蒋彪) wrote:
>> 
>> 
>>> On Aug 14, 2020, at 9:36 AM, Li, Aubrey  wrote:
>>> 
>>> On 2020/8/14 8:26, benbjiang(蒋彪) wrote:
 
 
> On Aug 13, 2020, at 12:28 PM, Li, Aubrey  
> wrote:
> 
> On 2020/8/13 7:08, Joel Fernandes wrote:
>> On Wed, Aug 12, 2020 at 10:01:24AM +0800, Li, Aubrey wrote:
>>> Hi Joel,
>>> 
>>> On 2020/8/10 0:44, Joel Fernandes wrote:
 Hi Aubrey,
 
 Apologies for replying late as I was still looking into the details.
 
 On Wed, Aug 05, 2020 at 11:57:20AM +0800, Li, Aubrey wrote:
 [...]
> +/*
> + * Core scheduling policy:
> + * - CORE_SCHED_DISABLED: core scheduling is disabled.
> + * - CORE_COOKIE_MATCH: tasks with same cookie can run
> + *   on the same core concurrently.
> + * - CORE_COOKIE_TRUST: trusted task can run with kernel
> + *   thread on the same core concurrently.
> + * - CORE_COOKIE_LONELY: tasks with cookie can run only
> + *   with idle thread on the same core.
> + */
> +enum coresched_policy {
> +       CORE_SCHED_DISABLED,
> +       CORE_SCHED_COOKIE_MATCH,
> +       CORE_SCHED_COOKIE_TRUST,
> +       CORE_SCHED_COOKIE_LONELY,
> +};
> 
> We can set the policy of the uperf cgroup to CORE_COOKIE_TRUST and fix
> this kind of performance regression. Not sure if this sounds attractive?
 
 Instead of this, I think it can be something simpler IMHO:
 
 1. Consider all cookie-0 tasks as trusted. (Even right now, if you apply
 the core-scheduling patchset, such tasks will share a core and sniff on
 each other. So let us not pretend that such tasks are not trusted.)
 
 2. All kernel threads and the idle task would have a cookie 0 (so that
 will cover ksoftirqd reported in your original issue).
 
 3. Add a config option (CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED),
 enabled by default. Setting this option would tag all tasks that are
 forked from a cookie-0 task with their own cookie. Later on, such tasks
 can be added to a group. (This covers PeterZ's ask about having
 'default untrusted'.) (Users like ChromeOS that don't want userspace
 system processes to be tagged can disable this option so such tasks
 will be cookie-0.)
 
 4. Allow prctl/cgroup interfaces to create groups of tasks and override
 the above behaviors.
>>> 
>>> How does uperf in a cgroup work with ksoftirqd? Are you suggesting I set
>>> uperf's cookie to be cookie-0 via prctl?
>> 
>> Yes, but let me try to understand better. There are 2 problems here I think:
>> 
>> 1. ksoftirqd getting idled when HT is turned on, because uperf is sharing a
>> core with it: This should not be any worse than SMT OFF, because even SMT
>> OFF would also reduce ksoftirqd's CPU time, just as core sched is doing.
>> Sure, core-scheduling adds some overhead with IPIs, but such a huge drop of
>> perf is strange. Peter, any thoughts on that?
>> 
>> 2. Interface: To solve the performance problem, you are saying you want
>> uperf to share a core with ksoftirqd so that it is not forced into idle.
>> Why not just keep uperf out of the cgroup?
> 
> I guess this is unacceptable for those who run their apps in containers and VMs.
 IMHO, just as Joel proposed:
 1. Consider all cookie-0 tasks as trusted.
 2. All kernel threads and the idle task would have a cookie 0.
 In that way, all tasks with cookies (including uperf in a cgroup) could run
 concurrently with kernel threads.
 That could be a good solution for the issue. :)
>>> 
>>> From uperf's point of view, it can trust cookie-0 (I assume we still need
>>> some modifications to change cookie-match to cookie-compatible to allow
>>> ZERO and NONZERO to run together).
>>> 
>>> But from the kernel thread's point of view, it can NOT trust uperf, unless
>>> we set uperf's cookie to 0.
>> That’s right. :)
>> Could we set the cookie of the cgroup where uperf lies to 0?
>> 
> IMHO the disadvantage is that if there are two or more cgroups set to
> cookie-0, then the user applications in these cgroups could run concurrently
> on a core; though all of them are set as trusted, this makes a hole in
> user->user isolation.
For that case, how about:
- use a special cookie (cookie-trust) instead of cookie-0 for kernel threads
- implement cookie_partial_match() to match part of the cookie
- cookie-normal (used by normal tasks) could trust cookie-trust
- tasks that tend to be trusted by cookie-trust could use cookies inc

Re: [RFC PATCH 00/16] Core scheduling v6(Internet mail)

2020-08-13 Thread Li, Aubrey
On 2020/8/14 12:04, benbjiang(蒋彪) wrote:
> 
> 
>> On Aug 14, 2020, at 9:36 AM, Li, Aubrey  wrote:
>>
>> On 2020/8/14 8:26, benbjiang(蒋彪) wrote:
>>>
>>>
 On Aug 13, 2020, at 12:28 PM, Li, Aubrey  wrote:

 On 2020/8/13 7:08, Joel Fernandes wrote:
> On Wed, Aug 12, 2020 at 10:01:24AM +0800, Li, Aubrey wrote:
>> Hi Joel,
>>
>> On 2020/8/10 0:44, Joel Fernandes wrote:
>>> Hi Aubrey,
>>>
>>> Apologies for replying late as I was still looking into the details.
>>>
>>> On Wed, Aug 05, 2020 at 11:57:20AM +0800, Li, Aubrey wrote:
>>> [...]
 +/*
 + * Core scheduling policy:
 + * - CORE_SCHED_DISABLED: core scheduling is disabled.
 + * - CORE_COOKIE_MATCH: tasks with same cookie can run
 + *   on the same core concurrently.
 + * - CORE_COOKIE_TRUST: trusted task can run with kernel
 + *   thread on the same core concurrently.
 + * - CORE_COOKIE_LONELY: tasks with cookie can run only
 + *   with idle thread on the same core.
 + */
 +enum coresched_policy {
 +       CORE_SCHED_DISABLED,
 +       CORE_SCHED_COOKIE_MATCH,
 +       CORE_SCHED_COOKIE_TRUST,
 +       CORE_SCHED_COOKIE_LONELY,
 +};
 
 We can set the policy of the uperf cgroup to CORE_COOKIE_TRUST and fix
 this kind of performance regression. Not sure if this sounds attractive?
>>>
>>> Instead of this, I think it can be something simpler IMHO:
>>>
>>> 1. Consider all cookie-0 tasks as trusted. (Even right now, if you apply
>>>  the core-scheduling patchset, such tasks will share a core and sniff on
>>>  each other. So let us not pretend that such tasks are not trusted.)
>>>
>>> 2. All kernel threads and the idle task would have a cookie 0 (so that
>>>  will cover ksoftirqd reported in your original issue).
>>>
>>> 3. Add a config option (CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED),
>>>  enabled by default. Setting this option would tag all tasks that are
>>>  forked from a cookie-0 task with their own cookie. Later on, such tasks
>>>  can be added to a group. (This covers PeterZ's ask about having
>>>  'default untrusted'.) (Users like ChromeOS that don't want userspace
>>>  system processes to be tagged can disable this option so such tasks
>>>  will be cookie-0.)
>>>
>>> 4. Allow prctl/cgroup interfaces to create groups of tasks and override
>>>  the above behaviors.
>>
>> How does uperf in a cgroup work with ksoftirqd? Are you suggesting I set
>> uperf's cookie to be cookie-0 via prctl?
>
> Yes, but let me try to understand better. There are 2 problems here I think:
>
> 1. ksoftirqd getting idled when HT is turned on, because uperf is sharing a
> core with it: This should not be any worse than SMT OFF, because even SMT
> OFF would also reduce ksoftirqd's CPU time, just as core sched is doing.
> Sure, core-scheduling adds some overhead with IPIs, but such a huge drop of
> perf is strange. Peter, any thoughts on that?
>
> 2. Interface: To solve the performance problem, you are saying you want
> uperf to share a core with ksoftirqd so that it is not forced into idle.
> Why not just keep uperf out of the cgroup?

 I guess this is unacceptable for those who run their apps in containers and VMs.
>>> IMHO, just as Joel proposed:
>>> 1. Consider all cookie-0 tasks as trusted.
>>> 2. All kernel threads and the idle task would have a cookie 0.
>>> In that way, all tasks with cookies (including uperf in a cgroup) could run
>>> concurrently with kernel threads.
>>> That could be a good solution for the issue. :)
>>
>> From uperf's point of view, it can trust cookie-0 (I assume we still need
>> some modifications to change cookie-match to cookie-compatible to allow
>> ZERO and NONZERO to run together).
>>
>> But from the kernel thread's point of view, it can NOT trust uperf, unless
>> we set uperf's cookie to 0.
> That’s right. :)
> Could we set the cookie of the cgroup where uperf lies to 0?
> 
IMHO the disadvantage is that if there are two or more cgroups set to
cookie-0, then the user applications in these cgroups could run concurrently
on a core; though all of them are set as trusted, this makes a hole in
user->user isolation.

Thanks,
-Aubrey


Re: [RFC PATCH 00/16] Core scheduling v6(Internet mail)

2020-08-13 Thread 蒋彪


> On Aug 14, 2020, at 9:36 AM, Li, Aubrey  wrote:
> 
> On 2020/8/14 8:26, benbjiang(蒋彪) wrote:
>> 
>> 
>>> On Aug 13, 2020, at 12:28 PM, Li, Aubrey  wrote:
>>> 
>>> On 2020/8/13 7:08, Joel Fernandes wrote:
 On Wed, Aug 12, 2020 at 10:01:24AM +0800, Li, Aubrey wrote:
> Hi Joel,
> 
> On 2020/8/10 0:44, Joel Fernandes wrote:
>> Hi Aubrey,
>> 
>> Apologies for replying late as I was still looking into the details.
>> 
>> On Wed, Aug 05, 2020 at 11:57:20AM +0800, Li, Aubrey wrote:
>> [...]
>>> +/*
>>> + * Core scheduling policy:
>>> + * - CORE_SCHED_DISABLED: core scheduling is disabled.
>>> + * - CORE_COOKIE_MATCH: tasks with same cookie can run
>>> + *   on the same core concurrently.
>>> + * - CORE_COOKIE_TRUST: trusted task can run with kernel
>>> + *   thread on the same core concurrently.
>>> + * - CORE_COOKIE_LONELY: tasks with cookie can run only
>>> + *   with idle thread on the same core.
>>> + */
>>> +enum coresched_policy {
>>> +       CORE_SCHED_DISABLED,
>>> +       CORE_SCHED_COOKIE_MATCH,
>>> +       CORE_SCHED_COOKIE_TRUST,
>>> +       CORE_SCHED_COOKIE_LONELY,
>>> +};
>>> 
>>> We can set the policy of the uperf cgroup to CORE_COOKIE_TRUST and fix
>>> this kind of performance regression. Not sure if this sounds attractive?
>> 
>> Instead of this, I think it can be something simpler IMHO:
>> 
>> 1. Consider all cookie-0 tasks as trusted. (Even right now, if you apply
>>  the core-scheduling patchset, such tasks will share a core and sniff on
>>  each other. So let us not pretend that such tasks are not trusted.)
>> 
>> 2. All kernel threads and the idle task would have a cookie 0 (so that
>>  will cover ksoftirqd reported in your original issue).
>> 
>> 3. Add a config option (CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED),
>>  enabled by default. Setting this option would tag all tasks that are
>>  forked from a cookie-0 task with their own cookie. Later on, such tasks
>>  can be added to a group. (This covers PeterZ's ask about having
>>  'default untrusted'.) (Users like ChromeOS that don't want userspace
>>  system processes to be tagged can disable this option so such tasks
>>  will be cookie-0.)
>> 
>> 4. Allow prctl/cgroup interfaces to create groups of tasks and override
>>  the above behaviors.
> 
> How does uperf in a cgroup work with ksoftirqd? Are you suggesting I set
> uperf's cookie to be cookie-0 via prctl?
 
 Yes, but let me try to understand better. There are 2 problems here I think:
 
 1. ksoftirqd getting idled when HT is turned on, because uperf is sharing a
 core with it: This should not be any worse than SMT OFF, because even SMT
 OFF would also reduce ksoftirqd's CPU time, just as core sched is doing.
 Sure, core-scheduling adds some overhead with IPIs, but such a huge drop of
 perf is strange. Peter, any thoughts on that?
 
 2. Interface: To solve the performance problem, you are saying you want
 uperf to share a core with ksoftirqd so that it is not forced into idle.
 Why not just keep uperf out of the cgroup?
>>> 
>>> I guess this is unacceptable for those who run their apps in containers and VMs.
>> IMHO, just as Joel proposed:
>> 1. Consider all cookie-0 tasks as trusted.
>> 2. All kernel threads and the idle task would have a cookie 0.
>> In that way, all tasks with cookies (including uperf in a cgroup) could run
>> concurrently with kernel threads.
>> That could be a good solution for the issue. :)
> 
> From uperf's point of view, it can trust cookie-0 (I assume we still need
> some modifications to change cookie-match to cookie-compatible to allow
> ZERO and NONZERO to run together).
> 
> But from the kernel thread's point of view, it can NOT trust uperf, unless
> we set uperf's cookie to 0.
That’s right. :)
Could we set the cookie of the cgroup where uperf lies to 0?

Thx.
Regards,
Jiang

> 
> Thanks,
> -Aubrey
> 



Re: [RFC PATCH 00/16] Core scheduling v6(Internet mail)

2020-08-13 Thread Li, Aubrey
On 2020/8/14 8:26, benbjiang(蒋彪) wrote:
> 
> 
>> On Aug 13, 2020, at 12:28 PM, Li, Aubrey  wrote:
>>
>> On 2020/8/13 7:08, Joel Fernandes wrote:
>>> On Wed, Aug 12, 2020 at 10:01:24AM +0800, Li, Aubrey wrote:
 Hi Joel,

 On 2020/8/10 0:44, Joel Fernandes wrote:
> Hi Aubrey,
>
> Apologies for replying late as I was still looking into the details.
>
> On Wed, Aug 05, 2020 at 11:57:20AM +0800, Li, Aubrey wrote:
> [...]
>> +/*
>> + * Core scheduling policy:
>> + * - CORE_SCHED_DISABLED: core scheduling is disabled.
>> + * - CORE_COOKIE_MATCH: tasks with same cookie can run
>> + *   on the same core concurrently.
>> + * - CORE_COOKIE_TRUST: trusted task can run with kernel
>> + *   thread on the same core concurrently.
>> + * - CORE_COOKIE_LONELY: tasks with cookie can run only
>> + *   with idle thread on the same core.
>> + */
>> +enum coresched_policy {
>> +       CORE_SCHED_DISABLED,
>> +       CORE_SCHED_COOKIE_MATCH,
>> +       CORE_SCHED_COOKIE_TRUST,
>> +       CORE_SCHED_COOKIE_LONELY,
>> +};
>>
>> We can set the policy of the uperf cgroup to CORE_COOKIE_TRUST and fix
>> this kind of performance regression. Not sure if this sounds attractive?
>
> Instead of this, I think it can be something simpler IMHO:
>
> 1. Consider all cookie-0 tasks as trusted. (Even right now, if you apply
>   the core-scheduling patchset, such tasks will share a core and sniff on
>   each other. So let us not pretend that such tasks are not trusted.)
>
> 2. All kernel threads and the idle task would have a cookie 0 (so that
>   will cover ksoftirqd reported in your original issue).
>
> 3. Add a config option (CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED),
>   enabled by default. Setting this option would tag all tasks that are
>   forked from a cookie-0 task with their own cookie. Later on, such tasks
>   can be added to a group. (This covers PeterZ's ask about having
>   'default untrusted'.) (Users like ChromeOS that don't want userspace
>   system processes to be tagged can disable this option so such tasks
>   will be cookie-0.)
>
> 4. Allow prctl/cgroup interfaces to create groups of tasks and override
>   the above behaviors.

 How does uperf in a cgroup work with ksoftirqd? Are you suggesting I set
 uperf's cookie to be cookie-0 via prctl?
>>>
>>> Yes, but let me try to understand better. There are 2 problems here I think:
>>>
>>> 1. ksoftirqd getting idled when HT is turned on, because uperf is sharing a
>>> core with it: This should not be any worse than SMT OFF, because even SMT
>>> OFF would also reduce ksoftirqd's CPU time, just as core sched is doing.
>>> Sure, core-scheduling adds some overhead with IPIs, but such a huge drop of
>>> perf is strange. Peter, any thoughts on that?
>>>
>>> 2. Interface: To solve the performance problem, you are saying you want
>>> uperf to share a core with ksoftirqd so that it is not forced into idle.
>>> Why not just keep uperf out of the cgroup?
>>
>> I guess this is unacceptable for those who run their apps in containers and VMs.
> IMHO, just as Joel proposed:
> 1. Consider all cookie-0 tasks as trusted.
> 2. All kernel threads and the idle task would have a cookie 0.
> In that way, all tasks with cookies (including uperf in a cgroup) could run
> concurrently with kernel threads.
> That could be a good solution for the issue. :)

From uperf's point of view, it can trust cookie-0 (I assume we still need
some modifications to change cookie-match to cookie-compatible to allow
ZERO and NONZERO to run together).

But from the kernel thread's point of view, it can NOT trust uperf, unless
we set uperf's cookie to 0.

Thanks,
-Aubrey


Re: [RFC PATCH 00/16] Core scheduling v6(Internet mail)

2020-08-13 Thread 蒋彪


> On Aug 13, 2020, at 12:28 PM, Li, Aubrey  wrote:
> 
> On 2020/8/13 7:08, Joel Fernandes wrote:
>> On Wed, Aug 12, 2020 at 10:01:24AM +0800, Li, Aubrey wrote:
>>> Hi Joel,
>>> 
>>> On 2020/8/10 0:44, Joel Fernandes wrote:
 Hi Aubrey,
 
 Apologies for replying late as I was still looking into the details.
 
 On Wed, Aug 05, 2020 at 11:57:20AM +0800, Li, Aubrey wrote:
 [...]
> +/*
> + * Core scheduling policy:
> + * - CORE_SCHED_DISABLED: core scheduling is disabled.
> + * - CORE_COOKIE_MATCH: tasks with same cookie can run
> + *   on the same core concurrently.
> + * - CORE_COOKIE_TRUST: trusted task can run with kernel
> + *   thread on the same core concurrently.
> + * - CORE_COOKIE_LONELY: tasks with cookie can run only
> + *   with idle thread on the same core.
> + */
> +enum coresched_policy {
> +       CORE_SCHED_DISABLED,
> +       CORE_SCHED_COOKIE_MATCH,
> +       CORE_SCHED_COOKIE_TRUST,
> +       CORE_SCHED_COOKIE_LONELY,
> +};
> 
> We can set the policy of the uperf cgroup to CORE_COOKIE_TRUST and fix
> this kind of performance regression. Not sure if this sounds attractive?
 
 Instead of this, I think it can be something simpler IMHO:
 
 1. Consider all cookie-0 tasks as trusted. (Even right now, if you apply the
   core-scheduling patchset, such tasks will share a core and sniff on each
   other. So let us not pretend that such tasks are not trusted.)
 
 2. All kernel threads and the idle task would have a cookie 0 (so that will
   cover ksoftirqd reported in your original issue).
 
 3. Add a config option (CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED), enabled
   by default. Setting this option would tag all tasks that are forked from a
   cookie-0 task with their own cookie. Later on, such tasks can be added to
   a group. (This covers PeterZ's ask about having 'default untrusted'.)
   (Users like ChromeOS that don't want userspace system processes to be
   tagged can disable this option so such tasks will be cookie-0.)
 
 4. Allow prctl/cgroup interfaces to create groups of tasks and override the
   above behaviors.
>>> 
>>> How does uperf in a cgroup work with ksoftirqd? Are you suggesting I set
>>> uperf's cookie to be cookie-0 via prctl?
>> 
>> Yes, but let me try to understand better. There are 2 problems here I think:
>> 
>> 1. ksoftirqd getting idled when HT is turned on, because uperf is sharing a
>> core with it: This should not be any worse than SMT OFF, because even SMT
>> OFF would also reduce ksoftirqd's CPU time, just as core sched is doing.
>> Sure, core-scheduling adds some overhead with IPIs, but such a huge drop of
>> perf is strange. Peter, any thoughts on that?
>> 
>> 2. Interface: To solve the performance problem, you are saying you want
>> uperf to share a core with ksoftirqd so that it is not forced into idle.
>> Why not just keep uperf out of the cgroup?
> 
> I guess this is unacceptable for those who run their apps in containers and VMs.
IMHO, just as Joel proposed:
1. Consider all cookie-0 tasks as trusted.
2. All kernel threads and the idle task would have a cookie 0.
In that way, all tasks with cookies (including uperf in a cgroup) could run
concurrently with kernel threads.
That could be a good solution for the issue. :)

If CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED is enabled,
maybe we should set ksoftirqd's cookie to cookie-0 to solve the issue.

Thx.
Regards,
Jiang
> 
> Thanks,
> -Aubrey
> 
>> Then it will have cookie 0 and be able to
>> share a core with kernel threads. About the user-user isolation that you
>> need: if you tag any "untrusted" threads by adding them to a CGroup, then
>> they will automatically be isolated from uperf while allowing uperf to
>> share a CPU with kernel threads.
>> 
>> Please let me know your thoughts and thanks,
>> 
>> - Joel
>> 
>>> 
>>> Thanks,
>>> -Aubrey
 
 5. Document everything clearly so the semantics are clear both to the
   developers of core scheduling and to system administrators.
 
 Note that, with the concept of "system trusted cookie", we can also do
 optimizations like:
 1. Disable STIBP when switching into trusted tasks.
 2. Disable L1D flushing / verw stuff for L1TF/MDS issues, when switching
   into trusted tasks.
 
 At least #1 seems to be biting enabling HT on ChromeOS right now, and one
 other engineer requested I do something like #2 already.
 
 Once we get full-syscall isolation working, threads belonging to a process
 can also share a core so those can just share a core with the task-group
 leader.
 
>> Is the uperf throughput worse with SMT+core-scheduling versus no-SMT ?
> 
> This is a good question, from the data we measured by uperf,
> SMT+core-scheduling is 28.2% worse than no-SMT, :(
 
 This is worrying for sure.

Re: [RFC PATCH 00/16] Core scheduling v6

2020-08-12 Thread Li, Aubrey
On 2020/8/13 7:08, Joel Fernandes wrote:
> On Wed, Aug 12, 2020 at 10:01:24AM +0800, Li, Aubrey wrote:
>> Hi Joel,
>>
>> On 2020/8/10 0:44, Joel Fernandes wrote:
>>> Hi Aubrey,
>>>
>>> Apologies for replying late as I was still looking into the details.
>>>
>>> On Wed, Aug 05, 2020 at 11:57:20AM +0800, Li, Aubrey wrote:
>>> [...]
 +/*
 + * Core scheduling policy:
 + * - CORE_SCHED_DISABLED: core scheduling is disabled.
 + * - CORE_COOKIE_MATCH: tasks with same cookie can run
 + *   on the same core concurrently.
 + * - CORE_COOKIE_TRUST: trusted task can run with kernel
 + *   thread on the same core concurrently.
 + * - CORE_COOKIE_LONELY: tasks with cookie can run only
 + *   with idle thread on the same core.
 + */
 +enum coresched_policy {
 +       CORE_SCHED_DISABLED,
 +       CORE_SCHED_COOKIE_MATCH,
 +       CORE_SCHED_COOKIE_TRUST,
 +       CORE_SCHED_COOKIE_LONELY,
 +};
 
 We can set the policy of the uperf cgroup to CORE_COOKIE_TRUST and fix
 this kind of performance regression. Not sure if this sounds attractive?
>>>
>>> Instead of this, I think it can be something simpler IMHO:
>>>
>>> 1. Consider all cookie-0 tasks as trusted. (Even right now, if you apply the
>>>core-scheduling patchset, such tasks will share a core and sniff on each
>>>other. So let us not pretend that such tasks are not trusted.)
>>>
>>> 2. All kernel threads and the idle task would have a cookie 0 (so that will
>>>cover ksoftirqd reported in your original issue).
>>>
>>> 3. Add a config option (CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED), enabled
>>>by default. Setting this option would tag all tasks that are forked from a
>>>cookie-0 task with their own cookie. Later on, such tasks can be added to
>>>a group. (This covers PeterZ's ask about having 'default untrusted'.)
>>>(Users like ChromeOS that don't want userspace system processes to be
>>>tagged can disable this option so such tasks will be cookie-0.)
>>>
>>> 4. Allow prctl/cgroup interfaces to create groups of tasks and override the
>>>above behaviors.
>>
>> How does uperf in a cgroup work with ksoftirqd? Are you suggesting I set
>> uperf's cookie to be cookie-0 via prctl?
> 
> Yes, but let me try to understand better. There are 2 problems here I think:
> 
> 1. ksoftirqd getting idled when HT is turned on, because uperf is sharing a
> core with it: This should not be any worse than SMT OFF, because even SMT
> OFF would also reduce ksoftirqd's CPU time, just as core sched is doing.
> Sure, core-scheduling adds some overhead with IPIs, but such a huge drop of
> perf is strange. Peter, any thoughts on that?
> 
> 2. Interface: To solve the performance problem, you are saying you want
> uperf to share a core with ksoftirqd so that it is not forced into idle.
> Why not just keep uperf out of the cgroup?

I guess this is unacceptable for those who run their apps in containers and VMs.

Thanks,
-Aubrey

> Then it will have cookie 0 and be able to
> share a core with kernel threads. About the user-user isolation that you
> need: if you tag any "untrusted" threads by adding them to a CGroup, then
> they will automatically be isolated from uperf while allowing uperf to
> share a CPU with kernel threads.
> 
> Please let me know your thoughts and thanks,
> 
>  - Joel
> 
>>
>> Thanks,
>> -Aubrey
>>>
>>> 5. Document everything clearly so the semantics are clear both to the
>>>developers of core scheduling and to system administrators.
>>>
>>> Note that, with the concept of "system trusted cookie", we can also do
>>> optimizations like:
>>> 1. Disable STIBP when switching into trusted tasks.
>>> 2. Disable L1D flushing / verw stuff for L1TF/MDS issues, when switching
>>>into trusted tasks.
>>>
>>> At least #1 seems to be biting enabling HT on ChromeOS right now, and one
>>> other engineer requested I do something like #2 already.
>>>
>>> Once we get full-syscall isolation working, threads belonging to a process
>>> can also share a core so those can just share a core with the task-group
>>> leader.
>>>
> Is the uperf throughput worse with SMT+core-scheduling versus no-SMT ?

 This is a good question, from the data we measured by uperf,
 SMT+core-scheduling is 28.2% worse than no-SMT, :(
>>>
>>> This is worrying for sure. :-(. We ought to debug/profile it more to see
>>> what is causing the overhead. Me/Vineeth added it as a topic for LPC as well.
>>>
>>> Any other thoughts from others on this?
>>>
>>> thanks,
>>>
>>>  - Joel
>>>
>>>
> thanks,
>
>  - Joel
> PS: I am planning to write a patch behind a CONFIG option that tags
> all processes (default untrusted) so everything gets a cookie which
> some folks said was how they wanted (have a whitelist instead of
> blacklist).
>

>>



Re: [RFC PATCH 00/16] Core scheduling v6

2020-08-12 Thread Joel Fernandes
On Wed, Aug 12, 2020 at 10:01:24AM +0800, Li, Aubrey wrote:
> Hi Joel,
> 
> On 2020/8/10 0:44, Joel Fernandes wrote:
> > Hi Aubrey,
> > 
> > Apologies for replying late as I was still looking into the details.
> > 
> > On Wed, Aug 05, 2020 at 11:57:20AM +0800, Li, Aubrey wrote:
> > [...]
> >> +/*
> >> + * Core scheduling policy:
> >> + * - CORE_SCHED_DISABLED: core scheduling is disabled.
> >> + * - CORE_COOKIE_MATCH: tasks with same cookie can run
> >> + *   on the same core concurrently.
> >> + * - CORE_COOKIE_TRUST: trusted task can run with kernel
> >> + *   thread on the same core concurrently.
> >> + * - CORE_COOKIE_LONELY: tasks with cookie can run only
> >> + *   with idle thread on the same core.
> >> + */
> >> +enum coresched_policy {
> >> +       CORE_SCHED_DISABLED,
> >> +       CORE_SCHED_COOKIE_MATCH,
> >> +       CORE_SCHED_COOKIE_TRUST,
> >> +       CORE_SCHED_COOKIE_LONELY,
> >> +};
> >>
> >> We can set the policy of the uperf cgroup to CORE_COOKIE_TRUST and fix
> >> this kind of performance regression. Not sure if this sounds attractive?
> > 
> > Instead of this, I think it can be something simpler IMHO:
> > 
> > 1. Consider all cookie-0 tasks as trusted. (Even right now, if you apply the
> >core-scheduling patchset, such tasks will share a core and sniff on each
> >other. So let us not pretend that such tasks are not trusted.)
> > 
> > 2. All kernel threads and the idle task would have a cookie 0 (so that will
> >cover ksoftirqd reported in your original issue).
> > 
> > 3. Add a config option (CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED), enabled
> >by default. Setting this option would tag all tasks that are forked from a
> >cookie-0 task with their own cookie. Later on, such tasks can be added to
> >a group. (This covers PeterZ's ask about having 'default untrusted'.)
> >(Users like ChromeOS that don't want userspace system processes to be
> >tagged can disable this option so such tasks will be cookie-0.)
> > 
> > 4. Allow prctl/cgroup interfaces to create groups of tasks and override the
> >above behaviors.
> 
> How does uperf in a cgroup work with ksoftirqd? Are you suggesting I set
> uperf's cookie to be cookie-0 via prctl?

Yes, but let me try to understand better. There are 2 problems here I think:

1. ksoftirqd getting idled when HT is turned on, because uperf is sharing a
core with it: This should not be any worse than SMT OFF, because even SMT OFF
would also reduce ksoftirqd's CPU time, just as core sched is doing. Sure,
core-scheduling adds some overhead with IPIs, but such a huge drop of perf is
strange. Peter, any thoughts on that?

2. Interface: To solve the performance problem, you are saying you want uperf
to share a core with ksoftirqd so that it is not forced into idle. Why not
just keep uperf out of the cgroup? Then it will have cookie 0 and be able to
share a core with kernel threads. About the user-user isolation that you
need: if you tag any "untrusted" threads by adding them to a CGroup, then
they will automatically be isolated from uperf while allowing uperf to share
a CPU with kernel threads.

Please let me know your thoughts and thanks,

 - Joel

> 
> Thanks,
> -Aubrey
> > 
> > 5. Document everything clearly so the semantics are clear both to the
> >developers of core scheduling and to system administrators.
> > 
> > Note that, with the concept of "system trusted cookie", we can also do
> > optimizations like:
> > 1. Disable STIBP when switching into trusted tasks.
> > 2. Disable L1D flushing / verw stuff for L1TF/MDS issues, when switching
> >into trusted tasks.
> > 
> > At least #1 seems to be biting enabling HT on ChromeOS right now, and one
> > other engineer requested I do something like #2 already.
> > 
> > Once we get full-syscall isolation working, threads belonging to a process
> > can also share a core so those can just share a core with the task-group
> > leader.
> > 
> >>> Is the uperf throughput worse with SMT+core-scheduling versus no-SMT ?
> >>
> >> This is a good question, from the data we measured by uperf,
> >> SMT+core-scheduling is 28.2% worse than no-SMT, :(
> > 
> > This is worrying for sure. :-(. We ought to debug/profile it more to see
> > what is causing the overhead. Me/Vineeth added it as a topic for LPC as well.
> > 
> > Any other thoughts from others on this?
> > 
> > thanks,
> > 
> >  - Joel
> > 
> > 
> >>> thanks,
> >>>
> >>>  - Joel
> >>> PS: I am planning to write a patch behind a CONFIG option that tags
> >>> all processes (default untrusted) so everything gets a cookie which
> >>> some folks said was how they wanted (have a whitelist instead of
> >>> blacklist).
> >>>
> >>
> 


Re: [RFC PATCH 00/16] Core scheduling v6

2020-08-11 Thread Li, Aubrey
Hi Joel,

On 2020/8/10 0:44, Joel Fernandes wrote:
> Hi Aubrey,
> 
> Apologies for replying late as I was still looking into the details.
> 
> On Wed, Aug 05, 2020 at 11:57:20AM +0800, Li, Aubrey wrote:
> [...]
>> +/*
>> + * Core scheduling policy:
>> + * - CORE_SCHED_DISABLED: core scheduling is disabled.
>> + * - CORE_COOKIE_MATCH: tasks with same cookie can run
>> + *   on the same core concurrently.
>> + * - CORE_COOKIE_TRUST: trusted task can run with kernel
>> + *   thread on the same core concurrently.
>> + * - CORE_COOKIE_LONELY: tasks with cookie can run only
>> + *   with idle thread on the same core.
>> + */
>> +enum coresched_policy {
>> +       CORE_SCHED_DISABLED,
>> +       CORE_SCHED_COOKIE_MATCH,
>> +       CORE_SCHED_COOKIE_TRUST,
>> +       CORE_SCHED_COOKIE_LONELY,
>> +};
>>
>> We can set the policy of the uperf cgroup to CORE_COOKIE_TRUST and fix
>> this kind of performance regression. Not sure if this sounds attractive?
> 
> Instead of this, I think it can be something simpler IMHO:
> 
> 1. Consider all cookie-0 tasks as trusted. (Even right now, if you apply the
>core-scheduling patchset, such tasks will share a core and sniff on each
>other. So let us not pretend that such tasks are not trusted.)
> 
> 2. All kernel threads and the idle task would have a cookie 0 (so that will
>cover ksoftirqd reported in your original issue).
> 
> 3. Add a config option (CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED), enabled
>by default. Setting this option would tag all tasks that are forked from a
>cookie-0 task with their own cookie. Later on, such tasks can be added to
>a group. (This covers PeterZ's ask about having 'default untrusted'.)
>(Users like ChromeOS that don't want userspace system processes to be
>tagged can disable this option so such tasks will be cookie-0.)
> 
> 4. Allow prctl/cgroup interfaces to create groups of tasks and override the
>above behaviors.

How does uperf in a cgroup work with ksoftirqd? Are you suggesting I set uperf's
cookie to be cookie-0 via prctl?

Thanks,
-Aubrey
> 
> 5. Document everything clearly so the semantics are clear both to the
>developers of core scheduling and to system administrators.
> 
> Note that, with the concept of "system trusted cookie", we can also do
> optimizations like:
> 1. Disable STIBP when switching into trusted tasks.
> 2. Disable L1D flushing / verw stuff for L1TF/MDS issues, when switching into
>trusted tasks.
> 
> At least #1 seems to be biting enabling HT on ChromeOS right now, and one
> other engineer requested I do something like #2 already.
> 
> Once we get full-syscall isolation working, threads belonging to a process
> can also share a core so those can just share a core with the task-group
> leader.
> 
>>> Is the uperf throughput worse with SMT+core-scheduling versus no-SMT ?
>>
>> This is a good question, from the data we measured by uperf,
>> SMT+core-scheduling is 28.2% worse than no-SMT, :(
> 
> This is worrying for sure. :-(. We ought to debug/profile it more to see what
> is causing the overhead. Me/Vineeth added it as a topic for LPC as well.
> 
> Any other thoughts from others on this?
> 
> thanks,
> 
>  - Joel
> 
> 
>>> thanks,
>>>
>>>  - Joel
>>> PS: I am planning to write a patch behind a CONFIG option that tags
>>> all processes (default untrusted) so everything gets a cookie which
>>> some folks said was how they wanted (have a whitelist instead of
>>> blacklist).
>>>
>>



Re: [RFC PATCH 00/16] Core scheduling v6

2020-08-09 Thread Joel Fernandes
Hi Aubrey,

Apologies for replying late as I was still looking into the details.

On Wed, Aug 05, 2020 at 11:57:20AM +0800, Li, Aubrey wrote:
[...]
> +/*
> + * Core scheduling policy:
> + * - CORE_SCHED_DISABLED: core scheduling is disabled.
> + * - CORE_COOKIE_MATCH: tasks with same cookie can run
> + *   on the same core concurrently.
> + * - CORE_COOKIE_TRUST: trusted task can run with kernel
> + *   thread on the same core concurrently.
> + * - CORE_COOKIE_LONELY: tasks with cookie can run only
> + *   with idle thread on the same core.
> + */
> +enum coresched_policy {
> +       CORE_SCHED_DISABLED,
> +       CORE_SCHED_COOKIE_MATCH,
> +       CORE_SCHED_COOKIE_TRUST,
> +       CORE_SCHED_COOKIE_LONELY,
> +};
> 
> We can set the policy of the uperf cgroup to CORE_COOKIE_TRUST and fix
> this kind of performance regression. Not sure if this sounds attractive?

Instead of this, I think it can be something simpler IMHO:

1. Consider all cookie-0 tasks as trusted. (Even right now, if you apply the
   core-scheduling patchset, such tasks will share a core and sniff on each
   other. So let us not pretend that such tasks are not trusted.)

2. All kernel threads and the idle task would have a cookie 0 (so that will
   cover ksoftirqd reported in your original issue).

3. Add a config option (CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED), enabled
   by default. Setting this option would tag all tasks that are forked from a
   cookie-0 task with their own cookie (see the sketch after this list).
   Later on, such tasks can be added to a group. (This covers PeterZ's ask
   about having 'default untrusted'.) (Users like ChromeOS that don't want
   userspace system processes to be tagged can disable this option so such
   tasks will be cookie-0.)

4. Allow prctl/cgroup interfaces to create groups of tasks and override the
   above behaviors.

5. Document everything clearly so the semantics are clear both to the
   developers of core scheduling and to system administrators.
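
For concreteness, here is a rough kernel-side sketch of item 3 (illustrative
only, not the actual patchset code; it reuses the core_cookie field from
this series, and sched_core_fork_tag() is a hypothetical fork-path hook):

#ifdef CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED
/* Called from the fork path for a new task @p (hypothetical hook). */
static void sched_core_fork_tag(struct task_struct *p)
{
	/* Kernel threads and already-tagged tasks keep their cookie. */
	if ((p->flags & PF_KTHREAD) || p->core_cookie)
		return;

	/*
	 * Give each child of a cookie-0 task its own private cookie;
	 * the task pointer is a convenient unique value. A later
	 * prctl/cgroup operation can still move it into a group.
	 */
	p->core_cookie = (unsigned long)p;
}
#endif /* CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED */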

Note that, with the concept of "system trusted cookie", we can also do
optimizations like:
1. Disable STIBP when switching into trusted tasks.
2. Disable L1D flushing / verw stuff for L1TF/MDS issues, when switching into
   trusted tasks.

At least #1 seems to be biting enabling HT on ChromeOS right now, and one
other engineer requested I do something like #2 already.

Once we get full-syscall isolation working, threads belonging to a process
can also share a core so those can just share a core with the task-group
leader.

> > Is the uperf throughput worse with SMT+core-scheduling versus no-SMT ?
> 
> This is a good question, from the data we measured by uperf,
> SMT+core-scheduling is 28.2% worse than no-SMT, :(

This is worrying for sure. :-(. We ought to debug/profile it more to see what
is causing the overhead. Me/Vineeth added it as a topic for LPC as well.

Any other thoughts from others on this?

thanks,

 - Joel


> > thanks,
> > 
> >  - Joel
> > PS: I am planning to write a patch behind a CONFIG option that tags
> > all processes (default untrusted) so everything gets a cookie which
> > some folks said was how they wanted (have a whitelist instead of
> > blacklist).
> > 
> 


Re: [RFC PATCH 00/16] Core scheduling v6(Internet mail)

2020-08-04 Thread 蒋彪
Hi,

> On Aug 5, 2020, at 11:57 AM, Li, Aubrey  wrote:
> 
> On 2020/8/4 0:53, Joel Fernandes wrote:
>> Hi Aubrey,
>> 
>> On Mon, Aug 3, 2020 at 4:23 AM Li, Aubrey  wrote:
>>> 
>>> On 2020/7/1 5:32, Vineeth Remanan Pillai wrote:
 Sixth iteration of the Core-Scheduling feature.
 
 Core scheduling is a feature that allows only trusted tasks to run
 concurrently on cpus sharing compute resources (eg: hyperthreads on a
 core). The goal is to mitigate the core-level side-channel attacks
 without requiring to disable SMT (which has a significant impact on
 performance in some situations). Core scheduling (as of v6) mitigates
 user-space to user-space attacks and user to kernel attack when one of
 the siblings enters the kernel via interrupts. It is still possible to
 have a task attack the sibling thread when it enters the kernel via
 syscalls.
 
 By default, the feature doesn't change any of the current scheduler
 behavior. The user decides which tasks can run simultaneously on the
 same core (for now by having them in the same tagged cgroup). When a
 tag is enabled in a cgroup and a task from that cgroup is running on a
 hardware thread, the scheduler ensures that only idle or trusted tasks
 run on the other sibling(s). Besides security concerns, this feature
 can also be beneficial for RT and performance applications where we
 want to control how tasks make use of SMT dynamically.
 
 This iteration is mostly a cleanup of v5 except for a major feature of
 pausing sibling when a cpu enters kernel via nmi/irq/softirq. Also
 introducing documentation and includes minor crash fixes.
 
 One major cleanup was removing the hotplug support and related code.
 The hotplug related crashes were not documented and the fixes piled up
 over time, leading to complex code. We were not able to reproduce the
 crashes in the limited testing done. But if they are reproducible, we
 don't want to hide them. We should document them and design better
 fixes if any.
 
 In terms of performance, the results in this release are similar to
 v5. On a x86 system with N hardware threads:
 - if only N/2 hardware threads are busy, the performance is similar
  between baseline, corescheduling and nosmt
 - if N hardware threads are busy with N different corescheduling
  groups, the impact of corescheduling is similar to nosmt
 - if N hardware threads are busy and multiple active threads share the
  same corescheduling cookie, they gain a performance improvement over
  nosmt.
  The specific performance impact depends on the workload, but for a
  really busy database 12-vcpu VM (1 coresched tag) running on a 36
  hardware threads NUMA node with 96 mostly idle neighbor VMs (each in
  their own coresched tag), the performance drops by 54% with
  corescheduling and drops by 90% with nosmt.
 
>>> 
>>> We found uperf (in a cgroup) throughput drops by ~50% with corescheduling.
>>> 
>>> The problem is, uperf triggers a lot of softirq and offloads the softirq
>>> service to the *ksoftirqd* thread.
>>> 
>>> - by default, the ksoftirqd thread can run with uperf on the same core;
>>>  we saw 100% CPU utilization.
>>> - with coresched enabled, ksoftirqd's core cookie is different from
>>>  uperf's, so they can't run concurrently on the same core; we saw ~15%
>>>  forced idle.
>>> 
>>> I guess this kind of performance drop can be replicated by other similar
>>> (a lot of softirq activities) workloads.
>>> 
>>> Currently the core scheduler picks cookie-matched tasks for all SMT
>>> siblings; does it make sense to add a policy that allows cookie-compatible
>>> tasks to run together? For example, if a task is trusted (set by admin),
>>> it can work with kernel threads. The difference from corescheduling
>>> disabled is that we still have user-to-user isolation.
>> 
>> In ChromeOS we are considering all cookie-0 tasks as trusted.
>> Basically if you don't trust a task, then that is when you assign the
>> task a tag. We do this for the sandboxed processes.
> 
> I have a proposal of this, by changing cpu.tag to cpu.coresched_policy,
> something like the following:
> 
> +/*
> + * Core scheduling policy:
> + * - CORE_SCHED_DISABLED: core scheduling is disabled.
> + * - CORE_COOKIE_MATCH: tasks with same cookie can run
> + *   on the same core concurrently.
> + * - CORE_COOKIE_TRUST: trusted task can run with kernel
> + *   thread on the same core concurrently.
How about other OS tasks (like systemd) besides kernel threads? :)

Thx.
Regards,
Jiang
> + * - CORE_COOKIE_LONELY: tasks with cookie can run only
> + *   with idle thread on the same core.
> + */
> +enum coresched_policy {
> +       CORE_SCHED_DISABLED,
> +       CORE_SCHED_COOKIE_MATCH,
> +       CORE_SCHED_COOKIE_TRUST,
> +       CORE_SCHED_COOKIE_LONELY,
> +};
> 
> We can set the policy of the uperf cgroup to CORE_COOKIE_TRUST and fix
> this kind of performance regression. Not sure if this sounds attractive?

Re: [RFC PATCH 00/16] Core scheduling v6

2020-08-04 Thread Li, Aubrey
On 2020/8/4 0:53, Joel Fernandes wrote:
> Hi Aubrey,
> 
> On Mon, Aug 3, 2020 at 4:23 AM Li, Aubrey  wrote:
>>
>> On 2020/7/1 5:32, Vineeth Remanan Pillai wrote:
>>> Sixth iteration of the Core-Scheduling feature.
>>>
>>> Core scheduling is a feature that allows only trusted tasks to run
>>> concurrently on cpus sharing compute resources (eg: hyperthreads on a
>>> core). The goal is to mitigate the core-level side-channel attacks
>>> without requiring to disable SMT (which has a significant impact on
>>> performance in some situations). Core scheduling (as of v6) mitigates
>>> user-space to user-space attacks and user to kernel attack when one of
>>> the siblings enters the kernel via interrupts. It is still possible to
>>> have a task attack the sibling thread when it enters the kernel via
>>> syscalls.
>>>
>>> By default, the feature doesn't change any of the current scheduler
>>> behavior. The user decides which tasks can run simultaneously on the
>>> same core (for now by having them in the same tagged cgroup). When a
>>> tag is enabled in a cgroup and a task from that cgroup is running on a
>>> hardware thread, the scheduler ensures that only idle or trusted tasks
>>> run on the other sibling(s). Besides security concerns, this feature
>>> can also be beneficial for RT and performance applications where we
>>> want to control how tasks make use of SMT dynamically.
>>>
>>> This iteration is mostly a cleanup of v5 except for a major feature of
>>> pausing sibling when a cpu enters kernel via nmi/irq/softirq. Also
>>> introducing documentation and includes minor crash fixes.
>>>
>>> One major cleanup was removing the hotplug support and related code.
>>> The hotplug related crashes were not documented and the fixes piled up
>>> over time, leading to complex code. We were not able to reproduce the
>>> crashes in the limited testing done. But if they are reproducible, we
>>> don't want to hide them. We should document them and design better
>>> fixes if any.
>>>
>>> In terms of performance, the results in this release are similar to
>>> v5. On a x86 system with N hardware threads:
>>> - if only N/2 hardware threads are busy, the performance is similar
>>>   between baseline, corescheduling and nosmt
>>> - if N hardware threads are busy with N different corescheduling
>>>   groups, the impact of corescheduling is similar to nosmt
>>> - if N hardware threads are busy and multiple active threads share the
>>>   same corescheduling cookie, they gain a performance improvement over
>>>   nosmt.
>>>   The specific performance impact depends on the workload, but for a
>>>   really busy database 12-vcpu VM (1 coresched tag) running on a 36
>>>   hardware threads NUMA node with 96 mostly idle neighbor VMs (each in
>>>   their own coresched tag), the performance drops by 54% with
>>>   corescheduling and drops by 90% with nosmt.
>>>
>>
>> We found uperf (in a cgroup) throughput drops by ~50% with corescheduling.
>>
>> The problem is, uperf triggers a lot of softirq and offloads the softirq
>> service to the *ksoftirqd* thread.
>>
>> - by default, the ksoftirqd thread can run with uperf on the same core;
>>   we saw 100% CPU utilization.
>> - with coresched enabled, ksoftirqd's core cookie is different from
>>   uperf's, so they can't run concurrently on the same core; we saw ~15%
>>   forced idle.
>>
>> I guess this kind of performance drop can be replicated by other similar
>> (a lot of softirq activities) workloads.
>>
>> Currently the core scheduler picks cookie-matched tasks for all SMT
>> siblings; does it make sense to add a policy that allows cookie-compatible
>> tasks to run together? For example, if a task is trusted (set by admin),
>> it can work with kernel threads. The difference from corescheduling
>> disabled is that we still have user-to-user isolation.
> 
> In ChromeOS we are considering all cookie-0 tasks as trusted.
> Basically if you don't trust a task, then that is when you assign the
> task a tag. We do this for the sandboxed processes.

I have a proposal of this, by changing cpu.tag to cpu.coresched_policy,
something like the following:

+/*
+ * Core scheduling policy:
+ * - CORE_SCHED_DISABLED: core scheduling is disabled.
+ * - CORE_COOKIE_MATCH: tasks with same cookie can run
+ *   on the same core concurrently.
+ * - CORE_COOKIE_TRUST: trusted task can run with kernel
+ *   thread on the same core concurrently.
+ * - CORE_COOKIE_LONELY: tasks with cookie can run only
+ *   with idle thread on the same core.
+ */
+enum coresched_policy {
+       CORE_SCHED_DISABLED,
+       CORE_SCHED_COOKIE_MATCH,
+       CORE_SCHED_COOKIE_TRUST,
+       CORE_SCHED_COOKIE_LONELY,
+};

We can set the policy of the uperf cgroup to CORE_COOKIE_TRUST and fix
this kind of performance regression. Not sure if this sounds attractive?
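
As an illustration of how the picker could honor such a per-group policy,
here is a minimal sketch (not part of the posted patch; cookies_compatible()
and its parameters are hypothetical):

#include <stdbool.h>

enum coresched_policy {
	CORE_SCHED_DISABLED,
	CORE_SCHED_COOKIE_MATCH,
	CORE_SCHED_COOKIE_TRUST,
	CORE_SCHED_COOKIE_LONELY,
};

/* May a task with @cookie/@policy share a core with the sibling's task? */
static bool cookies_compatible(unsigned long cookie,
			       enum coresched_policy policy,
			       unsigned long sibling_cookie,
			       bool sibling_is_kthread, bool sibling_is_idle)
{
	if (sibling_is_idle)
		return true;		/* an idle sibling is always fine */

	switch (policy) {
	case CORE_SCHED_DISABLED:
		return true;		/* no core-level restriction */
	case CORE_SCHED_COOKIE_MATCH:
		return cookie == sibling_cookie;
	case CORE_SCHED_COOKIE_TRUST:
		/* strict match, plus (cookie-0) kernel threads */
		return cookie == sibling_cookie || sibling_is_kthread;
	case CORE_SCHED_COOKIE_LONELY:
		return false;		/* only the idle thread may pair */
	}
	return false;
}

With the uperf cgroup set to CORE_SCHED_COOKIE_TRUST, the ksoftirqd pairing
discussed above would be allowed while user-to-user isolation is kept.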

> 
> Is the uperf throughput worse with SMT+core-scheduling versus no-SMT ?

This is a good question, from the data we measured by uperf,
SMT+core-scheduling is 28.2% worse than no-SMT, :(

Re: [RFC PATCH 00/16] Core scheduling v6

2020-08-03 Thread Joel Fernandes
Hi Aubrey,

On Mon, Aug 3, 2020 at 4:23 AM Li, Aubrey  wrote:
>
> On 2020/7/1 5:32, Vineeth Remanan Pillai wrote:
> > Sixth iteration of the Core-Scheduling feature.
> >
> > Core scheduling is a feature that allows only trusted tasks to run
> > concurrently on cpus sharing compute resources (eg: hyperthreads on a
> > core). The goal is to mitigate the core-level side-channel attacks
> > without requiring to disable SMT (which has a significant impact on
> > performance in some situations). Core scheduling (as of v6) mitigates
> > user-space to user-space attacks and user to kernel attack when one of
> > the siblings enters the kernel via interrupts. It is still possible to
> > have a task attack the sibling thread when it enters the kernel via
> > syscalls.
> >
> > By default, the feature doesn't change any of the current scheduler
> > behavior. The user decides which tasks can run simultaneously on the
> > same core (for now by having them in the same tagged cgroup). When a
> > tag is enabled in a cgroup and a task from that cgroup is running on a
> > hardware thread, the scheduler ensures that only idle or trusted tasks
> > run on the other sibling(s). Besides security concerns, this feature
> > can also be beneficial for RT and performance applications where we
> > want to control how tasks make use of SMT dynamically.
> >
> > This iteration is mostly a cleanup of v5 except for a major feature of
> > pausing sibling when a cpu enters kernel via nmi/irq/softirq. Also
> > introducing documentation and includes minor crash fixes.
> >
> > One major cleanup was removing the hotplug support and related code.
> > The hotplug related crashes were not documented and the fixes piled up
> > over time, leading to complex code. We were not able to reproduce the
> > crashes in the limited testing done. But if they are reproducible, we
> > don't want to hide them. We should document them and design better
> > fixes if any.
> >
> > In terms of performance, the results in this release are similar to
> > v5. On a x86 system with N hardware threads:
> > - if only N/2 hardware threads are busy, the performance is similar
> >   between baseline, corescheduling and nosmt
> > - if N hardware threads are busy with N different corescheduling
> >   groups, the impact of corescheduling is similar to nosmt
> > - if N hardware threads are busy and multiple active threads share the
> >   same corescheduling cookie, they gain a performance improvement over
> >   nosmt.
> >   The specific performance impact depends on the workload, but for a
> >   really busy database 12-vcpu VM (1 coresched tag) running on a 36
> >   hardware threads NUMA node with 96 mostly idle neighbor VMs (each in
> >   their own coresched tag), the performance drops by 54% with
> >   corescheduling and drops by 90% with nosmt.
> >
>
> We found uperf (in a cgroup) throughput drops by ~50% with corescheduling.
>
> The problem is, uperf triggers a lot of softirq and offloads the softirq
> service to the *ksoftirqd* thread.
>
> - by default, the ksoftirqd thread can run with uperf on the same core;
>   we saw 100% CPU utilization.
> - with coresched enabled, ksoftirqd's core cookie is different from
>   uperf's, so they can't run concurrently on the same core; we saw ~15%
>   forced idle.
>
> I guess this kind of performance drop can be replicated by other similar
> (a lot of softirq activities) workloads.
>
> Currently the core scheduler picks cookie-matched tasks for all SMT
> siblings; does it make sense to add a policy that allows cookie-compatible
> tasks to run together? For example, if a task is trusted (set by admin),
> it can work with kernel threads. The difference from corescheduling
> disabled is that we still have user-to-user isolation.

In ChromeOS we are considering all cookie-0 tasks as trusted.
Basically if you don't trust a task, then that is when you assign the
task a tag. We do this for the sandboxed processes.

Is the uperf throughput worse with SMT+core-scheduling versus no-SMT ?

thanks,

 - Joel
PS: I am planning to write a patch behind a CONFIG option that tags
all processes (default untrusted) so everything gets a cookie which
some folks said was how they wanted (have a whitelist instead of
blacklist).


Re: [RFC PATCH 00/16] Core scheduling v6

2020-08-03 Thread Li, Aubrey
On 2020/7/1 5:32, Vineeth Remanan Pillai wrote:
> Sixth iteration of the Core-Scheduling feature.
> 
> Core scheduling is a feature that allows only trusted tasks to run
> concurrently on cpus sharing compute resources (eg: hyperthreads on a
> core). The goal is to mitigate the core-level side-channel attacks
> without requiring to disable SMT (which has a significant impact on
> performance in some situations). Core scheduling (as of v6) mitigates
> user-space to user-space attacks and user to kernel attack when one of
> the siblings enters the kernel via interrupts. It is still possible to
> have a task attack the sibling thread when it enters the kernel via
> syscalls.
> 
> By default, the feature doesn't change any of the current scheduler
> behavior. The user decides which tasks can run simultaneously on the
> same core (for now by having them in the same tagged cgroup). When a
> tag is enabled in a cgroup and a task from that cgroup is running on a
> hardware thread, the scheduler ensures that only idle or trusted tasks
> run on the other sibling(s). Besides security concerns, this feature
> can also be beneficial for RT and performance applications where we
> want to control how tasks make use of SMT dynamically.
> 
> This iteration is mostly a cleanup of v5 except for a major feature of
> pausing sibling when a cpu enters kernel via nmi/irq/softirq. Also
> introducing documentation and includes minor crash fixes.
> 
> One major cleanup was removing the hotplug support and related code.
> The hotplug related crashes were not documented and the fixes piled up
> over time, leading to complex code. We were not able to reproduce the
> crashes in the limited testing done. But if they are reproducible, we
> don't want to hide them. We should document them and design better
> fixes if any.
> 
> In terms of performance, the results in this release are similar to
> v5. On a x86 system with N hardware threads:
> - if only N/2 hardware threads are busy, the performance is similar
>   between baseline, corescheduling and nosmt
> - if N hardware threads are busy with N different corescheduling
>   groups, the impact of corescheduling is similar to nosmt
> - if N hardware threads are busy and multiple active threads share the
>   same corescheduling cookie, they gain a performance improvement over
>   nosmt.
>   The specific performance impact depends on the workload, but for a
>   really busy database 12-vcpu VM (1 coresched tag) running on a 36
>   hardware threads NUMA node with 96 mostly idle neighbor VMs (each in
>   their own coresched tag), the performance drops by 54% with
>   corescheduling and drops by 90% with nosmt.
> 

We found uperf (in a cgroup) throughput drops by ~50% with corescheduling.

The problem is, uperf triggers a lot of softirq and offloads the softirq
service to the *ksoftirqd* thread.

- by default, the ksoftirqd thread can run with uperf on the same core;
  we saw 100% CPU utilization.
- with coresched enabled, ksoftirqd's core cookie is different from uperf's,
  so they can't run concurrently on the same core; we saw ~15% forced idle.

I guess this kind of performance drop can be replicated by other similar
(a lot of softirq activities) workloads.

Currently the core scheduler picks cookie-matched tasks for all SMT siblings;
does it make sense to add a policy that allows cookie-compatible tasks to run
together? For example, if a task is trusted (set by admin), it can work with
kernel threads. The difference from corescheduling disabled is that we still
have user-to-user isolation.

Thanks,
-Aubrey

Re: [RFC PATCH 00/16] Core scheduling v6

2020-07-31 Thread Vineeth Pillai
On 20/07/26 06:49AM, Vineeth Pillai wrote:
> 
> 
> Sixth iteration of the Core-Scheduling feature.
>
I am no longer with DigitalOcean. Kindly use this email address for all
future responses.

Thanks,
Vineeth