Re: [RFT for v9] (Was Re: [PATCH v8 -tip 00/26] Core scheduling)
On 2020/11/13 17:22, Ning, Hongyu wrote:
> On 2020/11/7 4:55, Joel Fernandes wrote:
>> All,
>>
>> I am getting ready to send the next v9 series based on the tip/master
>> branch. Could you please give the below tree a try and report any
>> results from your testing?
>>
>> git tree:
>> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (branch coresched)
>>
>> git log:
>> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/log/?h=coresched
>>
>> The major changes in this series are the improvements:
>>
>> (1) "sched: Make snapshotting of min_vruntime more CGroup-friendly"
>> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=coresched-v9-for-test&id=9a20a6652b3c50fd51faa829f7947004239a04eb
>>
>> (2) "sched: Simplify the core pick loop for optimized case"
>> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=coresched-v9-for-test&id=0370117b4fd418cdaaa6b1489bfc14f305691152
>>
>> And a bug fix:
>>
>> (1) "sched: Enqueue task into core queue only after vruntime is updated"
>> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=coresched-v9-for-test&id=401dad5536e7e05d1299d0864e6fc5072029f492
>>
>> There are also 2 more bug fixes that I squashed in, related to kernel
>> protection and a crash seen on the tip/master branch.
>>
>> Hoping to send the series out to the list next week.
>>
>> Have a great weekend, and thanks!
>>
>> - Joel
>>
>> On Mon, Oct 19, 2020 at 09:43:10PM -0400, Joel Fernandes (Google) wrote:
>
> Adding 4 workloads test results for core scheduling v9 candidate:
>
> - kernel under test:
>   -- coresched community v9 candidate from
>      https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (branch coresched)
>   -- latest commit: 2e8591a330ff (HEAD -> coresched, origin/coresched)
>      NEW: sched: Add a coresched command line option
>   -- coresched=on kernel parameter applied
> - workloads:
>   -- A. sysbench cpu (192 threads) + sysbench cpu (192 threads)
>   -- B. sysbench cpu (192 threads) + sysbench mysql (192 threads,
>         mysqld forced into the same cgroup)
>   -- C. uperf netperf.xml (192 threads over TCP or UDP protocol separately)
>   -- D. will-it-scale context_switch via pipe (192 threads)
> - test machine setup:
>   CPU(s):              192
>   On-line CPU(s) list: 0-191
>   Thread(s) per core:  2
>   Core(s) per socket:  48
>   Socket(s):           2
>   NUMA node(s):        4
> - test results, no obvious performance drop compared to community v8 build:
>
>   -- workload A:
>
>   +----------------------+----+---------------------+---------------------+
>   |                      | ** | sysbench cpu * 192  | sysbench cpu * 192  |
>   +======================+====+=====================+=====================+
>   | cgroup               | ** | cg_sysbench_cpu_0   | cg_sysbench_cpu_1   |
>   +----------------------+----+---------------------+---------------------+
>   | record_item          | ** | Tput_avg (events/s) | Tput_avg (events/s) |
>   +----------------------+----+---------------------+---------------------+
>   | coresched_normalized | ** | 0.98                | 1.01                |
>   +----------------------+----+---------------------+---------------------+
>   | default_normalized   | ** | 1                   | 1                   |
>   +----------------------+----+---------------------+---------------------+
>   | smtoff_normalized    | ** | 0.59                | 0.6                 |
>   +----------------------+----+---------------------+---------------------+
>
>   -- workload B:
>
>   +----------------------+----+---------------------+----------------------+
>   |                      | ** | sysbench cpu * 192  | sysbench mysql * 192 |
>   +======================+====+=====================+======================+
>   | cgroup               | ** | cg_sysbench_cpu_0   | cg_sysbench_mysql_0  |
>   +----------------------+----+---------------------+----------------------+
>   | record_item          | ** | Tput_avg (events/s) | Tput_avg (events/s)  |
>   +----------------------+----+---------------------+----------------------+
>   | coresched_normalized | ** | 1.02                | 0.78                 |
>   +----------------------+----+---------------------+----------------------+
>   | default_normalized   | ** | 1                   | 1                    |
>   +----------------------+----+---------------------+----------------------+
>   | smtoff_normalized    | ** | 0.59                | 0.75                 |
>   +----------------------+----+---------------------+----------------------+
Re: [RFT for v9] (Was Re: [PATCH v8 -tip 00/26] Core scheduling)
On 2020/11/7 4:55, Joel Fernandes wrote:
> All,
>
> I am getting ready to send the next v9 series based on the tip/master
> branch. Could you please give the below tree a try and report any
> results from your testing?
>
> git tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (branch coresched)
>
> git log:
> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/log/?h=coresched
>
> The major changes in this series are the improvements:
>
> (1) "sched: Make snapshotting of min_vruntime more CGroup-friendly"
> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=coresched-v9-for-test&id=9a20a6652b3c50fd51faa829f7947004239a04eb
>
> (2) "sched: Simplify the core pick loop for optimized case"
> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=coresched-v9-for-test&id=0370117b4fd418cdaaa6b1489bfc14f305691152
>
> And a bug fix:
>
> (1) "sched: Enqueue task into core queue only after vruntime is updated"
> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=coresched-v9-for-test&id=401dad5536e7e05d1299d0864e6fc5072029f492
>
> There are also 2 more bug fixes that I squashed in, related to kernel
> protection and a crash seen on the tip/master branch.
>
> Hoping to send the series out to the list next week.
>
> Have a great weekend, and thanks!
>
> - Joel
>
> On Mon, Oct 19, 2020 at 09:43:10PM -0400, Joel Fernandes (Google) wrote:

Adding 4 workloads test results for core scheduling v9 candidate:

- kernel under test:
  -- coresched community v9 candidate from
     https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (branch coresched)
  -- latest commit: 2e8591a330ff (HEAD -> coresched, origin/coresched)
     NEW: sched: Add a coresched command line option
  -- coresched=on kernel parameter applied
- workloads:
  -- A. sysbench cpu (192 threads) + sysbench cpu (192 threads)
  -- B. sysbench cpu (192 threads) + sysbench mysql (192 threads,
        mysqld forced into the same cgroup)
  -- C. uperf netperf.xml (192 threads over TCP or UDP protocol separately)
  -- D. will-it-scale context_switch via pipe (192 threads)
- test machine setup:
  CPU(s):              192
  On-line CPU(s) list: 0-191
  Thread(s) per core:  2
  Core(s) per socket:  48
  Socket(s):           2
  NUMA node(s):        4
- test results, no obvious performance drop compared to community v8 build:

  -- workload A:

  +----------------------+----+---------------------+---------------------+
  |                      | ** | sysbench cpu * 192  | sysbench cpu * 192  |
  +======================+====+=====================+=====================+
  | cgroup               | ** | cg_sysbench_cpu_0   | cg_sysbench_cpu_1   |
  +----------------------+----+---------------------+---------------------+
  | record_item          | ** | Tput_avg (events/s) | Tput_avg (events/s) |
  +----------------------+----+---------------------+---------------------+
  | coresched_normalized | ** | 0.98                | 1.01                |
  +----------------------+----+---------------------+---------------------+
  | default_normalized   | ** | 1                   | 1                   |
  +----------------------+----+---------------------+---------------------+
  | smtoff_normalized    | ** | 0.59                | 0.6                 |
  +----------------------+----+---------------------+---------------------+

  -- workload B:

  +----------------------+----+---------------------+----------------------+
  |                      | ** | sysbench cpu * 192  | sysbench mysql * 192 |
  +======================+====+=====================+======================+
  | cgroup               | ** | cg_sysbench_cpu_0   | cg_sysbench_mysql_0  |
  +----------------------+----+---------------------+----------------------+
  | record_item          | ** | Tput_avg (events/s) | Tput_avg (events/s)  |
  +----------------------+----+---------------------+----------------------+
  | coresched_normalized | ** | 1.02                | 0.78                 |
  +----------------------+----+---------------------+----------------------+
  | default_normalized   | ** | 1                   | 1                    |
  +----------------------+----+---------------------+----------------------+
  | smtoff_normalized    | ** | 0.59                | 0.75                 |
  +----------------------+----+---------------------+----------------------+

  -- workload C:
Re: [PATCH v8 -tip 00/26] Core scheduling
On 2020/11/7 1:54, Joel Fernandes wrote:
> On Fri, Nov 06, 2020 at 10:58:58AM +0800, Li, Aubrey wrote:
>
>>> -- workload D, new added syscall workload, performance drop in cs_on:
>>>
>>> +----------------------+----+-----------------------------+
>>> |                      | ** | will-it-scale * 192         |
>>> |                      |    | (pipe based context_switch) |
>>> +======================+====+=============================+
>>> | cgroup               | ** | cg_will-it-scale            |
>>> +----------------------+----+-----------------------------+
>>> | record_item          | ** | threads_avg                 |
>>> +----------------------+----+-----------------------------+
>>> | coresched_normalized | ** | 0.2                         |
>>> +----------------------+----+-----------------------------+
>>> | default_normalized   | ** | 1                           |
>>> +----------------------+----+-----------------------------+
>>> | smtoff_normalized    | ** | 0.89                        |
>>> +----------------------+----+-----------------------------+
>>
>> will-it-scale may be a very extreme case. The story here is:
>> - On one sibling, a reader/writer gets blocked and tries to schedule
>>   another reader/writer in.
>> - The other sibling tries to wake up a reader/writer.
>>
>> Both CPUs are acquiring rq->__lock.
>>
>> So when coresched is off, they are two different locks; lock stat
>> (1 second delta) below:
>>
>> class name    con-bounces  contentions   waittime-min  waittime-max  waittime-total  waittime-avg
>> &rq->__lock:  210          210           0.10          3.04          180.87          0.86
>>
>>               acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total  holdtime-avg
>>               797          79165021      0.03          20.69         60650198.34     0.77
>>
>> But when coresched is on, they are actually one and the same lock;
>> lock stat (1 second delta) below:
>>
>> class name    con-bounces  contentions   waittime-min  waittime-max  waittime-total  waittime-avg
>> &rq->__lock:  6479459      6484857       0.05          216.46        60829776.85     9.38
>>
>>               acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total  holdtime-avg
>>               8346319      15399739      0.03          95.56         81119515.38     5.27
>>
>> This nature of core scheduling may degrade the performance of similar
>> workloads with frequent context switching.
>
> When core sched is off, is SMT off as well? From the above table, it
> seems to be. So even for core sched off, there will be a single lock per
> physical CPU core (assuming SMT is also off), right? Or did I miss
> something?
The table includes 3 cases:

- default:   SMT on,  coresched off
- coresched: SMT on,  coresched on
- smtoff:    SMT off, coresched off

I was comparing the default (coresched off & SMT on) case with the
(coresched on & SMT on) case. If SMT is off, then the reader and writer
on different cores have different rq->__lock instances, so the lock
contention is not that serious.

class name    con-bounces  contentions   waittime-min  waittime-max  waittime-total  waittime-avg
&rq->__lock:  60           60            0.11          1.92          41.33           0.69

              acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total  holdtime-avg
              127          67184172      0.03          22.95         33160428.37     0.49

Does this address your concern?

Thanks,
-Aubrey
[RFT for v9] (Was Re: [PATCH v8 -tip 00/26] Core scheduling)
All,

I am getting ready to send the next v9 series based on the tip/master
branch. Could you please give the below tree a try and report any results
from your testing?

git tree:
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (branch coresched)

git log:
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/log/?h=coresched

The major changes in this series are the improvements:

(1) "sched: Make snapshotting of min_vruntime more CGroup-friendly"
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=coresched-v9-for-test&id=9a20a6652b3c50fd51faa829f7947004239a04eb

(2) "sched: Simplify the core pick loop for optimized case"
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=coresched-v9-for-test&id=0370117b4fd418cdaaa6b1489bfc14f305691152

And a bug fix:

(1) "sched: Enqueue task into core queue only after vruntime is updated"
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=coresched-v9-for-test&id=401dad5536e7e05d1299d0864e6fc5072029f492

There are also 2 more bug fixes that I squashed in, related to kernel
protection and a crash seen on the tip/master branch.

Hoping to send the series out to the list next week.

Have a great weekend, and thanks!

- Joel

On Mon, Oct 19, 2020 at 09:43:10PM -0400, Joel Fernandes (Google) wrote:
> Eighth iteration of the Core-Scheduling feature.
>
> Core scheduling is a feature that allows only trusted tasks to run
> concurrently on CPUs sharing compute resources (e.g., hyperthreads on a
> core). The goal is to mitigate core-level side-channel attacks without
> requiring SMT to be disabled (which has a significant impact on
> performance in some situations). Core scheduling (as of v7) mitigates
> user-space-to-user-space attacks, and user-to-kernel attacks when one
> of the siblings enters the kernel via an interrupt or system call.
>
> By default, the feature doesn't change any of the current scheduler
> behavior. The user decides which tasks can run simultaneously on the
> same core (for now by having them in the same tagged cgroup). When a tag
> is enabled in a cgroup and a task from that cgroup is running on a
> hardware thread, the scheduler ensures that only idle or trusted tasks
> run on the other sibling(s). Besides security concerns, this feature can
> also be beneficial for RT and performance applications where we want to
> control how tasks make use of SMT dynamically.
>
> This iteration focuses on the following:
> - Redesigned API.
> - Rework of the kernel protection feature based on Thomas's entry work.
> - Rework of hotplug fixes.
> - Addressed review comments in v7.
>
> Joel: Both a cgroup and a per-task interface via prctl(2) are provided
> for configuring core sharing. More details are provided in the
> documentation patch. Kselftests are provided to verify the
> correctness/rules of the interface.
>
> Julien: TPCC tests showed improvements with core scheduling. With kernel
> protection enabled, it does not show any regression. Possibly ASI will
> improve the performance for those who choose kernel protection (can be
> toggled through the sched_core_protect_kernel sysctl). Results:
>
>                                 average   stdev        diff
> baseline (SMT on)               1197.272  44.78312824
> core sched (kernel protect)     412.9895  45.42734343  -65.51%
> core sched (no kernel protect)  686.6515  71.77756931  -42.65%
> nosmt                           408.667   39.39042872  -65.87%
>
> v8 is rebased on tip/master.
>
> Future work
> ===========
> - Load balancing/migration fixes for core scheduling.
>   With v6, load balancing is partially coresched aware, but has some
>   issues w.r.t. process/taskgroup weights:
>   https://lwn.net/ml/linux-kernel/20200225034438.GA617271@z...
> - Core scheduling test framework: kselftests, torture tests, etc.
>
> Changes in v8
> =============
> - New interface/API implementation
>   - Joel
> - Revised kernel protection patch
>   - Joel
> - Revised hotplug fixes
>   - Joel
> - Minor bug fixes and addressed review comments
>   - Vineeth
>
> Changes in v7
> =============
> - Kernel protection from untrusted usermode tasks
>   - Joel, Vineeth
> - Fix for hotplug crashes and hangs
>   - Joel, Vineeth
>
> Changes in v6
> =============
> - Documentation
>   - Joel
> - Pause siblings on entering nmi/irq/softirq
>   - Joel, Vineeth
> - Fix for RCU crash
>   - Joel
> - Fix for a crash in pick_next_task
>   - Yu Chen, Vineeth
> - Minor re-write of core-wide vruntime comparison
>   - Aaron Lu
> - Cleanup: addressed review comments
> - Cleanup: removed hotplug support (for now)
> - Build fixes: 32 bit, SMT=n, AUTOGROUP=n, etc.
>   - Joel, Vineeth
>
> Changes in v5
> =============
> - Fixes for cgroup/process tagging during corner cases like cgroup
>   destroy, task moving across cgroups, etc.
>   - Tim Chen
> - Coresched-aware task migrations
>   - Aubrey Li
> - Other minor stability fixes.
>
> Changes i
Re: [PATCH v8 -tip 00/26] Core scheduling
On Fri, Nov 06, 2020 at 10:58:58AM +0800, Li, Aubrey wrote:
>> -- workload D, new added syscall workload, performance drop in cs_on:
>>
>> +----------------------+----+-----------------------------+
>> |                      | ** | will-it-scale * 192         |
>> |                      |    | (pipe based context_switch) |
>> +======================+====+=============================+
>> | cgroup               | ** | cg_will-it-scale            |
>> +----------------------+----+-----------------------------+
>> | record_item          | ** | threads_avg                 |
>> +----------------------+----+-----------------------------+
>> | coresched_normalized | ** | 0.2                         |
>> +----------------------+----+-----------------------------+
>> | default_normalized   | ** | 1                           |
>> +----------------------+----+-----------------------------+
>> | smtoff_normalized    | ** | 0.89                        |
>> +----------------------+----+-----------------------------+
>
> will-it-scale may be a very extreme case. The story here is:
> - On one sibling, a reader/writer gets blocked and tries to schedule
>   another reader/writer in.
> - The other sibling tries to wake up a reader/writer.
>
> Both CPUs are acquiring rq->__lock.
>
> So when coresched is off, they are two different locks; lock stat
> (1 second delta) below:
>
> class name    con-bounces  contentions   waittime-min  waittime-max  waittime-total  waittime-avg
> &rq->__lock:  210          210           0.10          3.04          180.87          0.86
>
>               acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total  holdtime-avg
>               797          79165021      0.03          20.69         60650198.34     0.77
>
> But when coresched is on, they are actually one and the same lock;
> lock stat (1 second delta) below:
>
> class name    con-bounces  contentions   waittime-min  waittime-max  waittime-total  waittime-avg
> &rq->__lock:  6479459      6484857       0.05          216.46        60829776.85     9.38
>
>               acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total  holdtime-avg
>               8346319      15399739      0.03          95.56         81119515.38     5.27
>
> This nature of core scheduling may degrade the performance of similar
> workloads with frequent context switching.

When core sched is off, is SMT off as well? From the above table, it
seems to be. So even for core sched off, there will be a single lock per
physical CPU core (assuming SMT is also off), right? Or did I miss
something?

thanks,

- Joel
Re: [PATCH v8 -tip 00/26] Core scheduling
On 2020/10/30 21:26, Ning, Hongyu wrote:
> On 2020/10/20 9:43, Joel Fernandes (Google) wrote:
>> Eighth iteration of the Core-Scheduling feature.
>>
>> Core scheduling is a feature that allows only trusted tasks to run
>> concurrently on CPUs sharing compute resources (e.g., hyperthreads on
>> a core). The goal is to mitigate core-level side-channel attacks
>> without requiring SMT to be disabled (which has a significant impact
>> on performance in some situations). Core scheduling (as of v7)
>> mitigates user-space-to-user-space attacks, and user-to-kernel attacks
>> when one of the siblings enters the kernel via an interrupt or system
>> call.
>>
>> By default, the feature doesn't change any of the current scheduler
>> behavior. The user decides which tasks can run simultaneously on the
>> same core (for now by having them in the same tagged cgroup). When a
>> tag is enabled in a cgroup and a task from that cgroup is running on a
>> hardware thread, the scheduler ensures that only idle or trusted tasks
>> run on the other sibling(s). Besides security concerns, this feature
>> can also be beneficial for RT and performance applications where we
>> want to control how tasks make use of SMT dynamically.
>>
>> This iteration focuses on the following:
>> - Redesigned API.
>> - Rework of the kernel protection feature based on Thomas's entry work.
>> - Rework of hotplug fixes.
>> - Addressed review comments in v7.
>>
>> Joel: Both a cgroup and a per-task interface via prctl(2) are provided
>> for configuring core sharing. More details are provided in the
>> documentation patch. Kselftests are provided to verify the
>> correctness/rules of the interface.
>>
>> Julien: TPCC tests showed improvements with core scheduling. With
>> kernel protection enabled, it does not show any regression. Possibly
>> ASI will improve the performance for those who choose kernel
>> protection (can be toggled through the sched_core_protect_kernel
>> sysctl). Results:
>>
>>                                 average   stdev        diff
>> baseline (SMT on)               1197.272  44.78312824
>> core sched (kernel protect)     412.9895  45.42734343  -65.51%
>> core sched (no kernel protect)  686.6515  71.77756931  -42.65%
>> nosmt                           408.667   39.39042872  -65.87%
>>
>> v8 is rebased on tip/master.
>>
>> Future work
>> ===========
>> - Load balancing/migration fixes for core scheduling.
>>   With v6, load balancing is partially coresched aware, but has some
>>   issues w.r.t. process/taskgroup weights:
>>   https://lwn.net/ml/linux-kernel/20200225034438.GA617271@z...
>> - Core scheduling test framework: kselftests, torture tests, etc.
>>
>> Changes in v8
>> =============
>> - New interface/API implementation
>>   - Joel
>> - Revised kernel protection patch
>>   - Joel
>> - Revised hotplug fixes
>>   - Joel
>> - Minor bug fixes and addressed review comments
>>   - Vineeth
>>
>> create mode 100644 tools/testing/selftests/sched/config
>> create mode 100644 tools/testing/selftests/sched/test_coresched.c
>
> Adding 4 workloads test results for Core Scheduling v8:
>
> - kernel under test: coresched community v8 from
>   https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/log/?h=coresched-v5.9
> - workloads:
>   -- A. sysbench cpu (192 threads) + sysbench cpu (192 threads)
>   -- B. sysbench cpu (192 threads) + sysbench mysql (192 threads,
>         mysqld forced into the same cgroup)
>   -- C. uperf netperf.xml (192 threads over TCP or UDP protocol separately)
>   -- D. will-it-scale context_switch via pipe (192 threads)
> - test machine setup:
>   CPU(s):              192
>   On-line CPU(s) list: 0-191
>   Thread(s) per core:  2
>   Core(s) per socket:  48
>   Socket(s):           2
>   NUMA node(s):        4
> - test results:
>   -- workload A, no obvious performance drop in cs_on:
>
>   +----------------------+----+---------------------+----------------------+
>   |                      | ** | sysbench cpu * 192  | sysbench mysql * 192 |
>   +======================+====+=====================+======================+
>   | cgroup               | ** | cg_sysbench_cpu_0   | cg_sysbench_mysql_0  |
>   +----------------------+----+---------------------+----------------------+
>   | record_item          | ** | Tput_avg (events/s) | Tput_avg (events/s)  |
>   +----------------------+----+---------------------+----------------------+
>   | coresched_normalized | ** | 1.01                | 0.87                 |
>   +----------------------+----+---------------------+----------------------+
>   | default_normalized   | ** | 1                   | 1                    |
>   +----------------------+----+---------------------+----------------------+
>   | smtoff_normalized    | ** | 0.59                | 0
Re: [PATCH v8 -tip 00/26] Core scheduling
On 2020/10/20 9:43, Joel Fernandes (Google) wrote:
> Eighth iteration of the Core-Scheduling feature.
>
> Core scheduling is a feature that allows only trusted tasks to run
> concurrently on CPUs sharing compute resources (e.g., hyperthreads on a
> core). The goal is to mitigate core-level side-channel attacks without
> requiring SMT to be disabled (which has a significant impact on
> performance in some situations). Core scheduling (as of v7) mitigates
> user-space-to-user-space attacks, and user-to-kernel attacks when one
> of the siblings enters the kernel via an interrupt or system call.
>
> By default, the feature doesn't change any of the current scheduler
> behavior. The user decides which tasks can run simultaneously on the
> same core (for now by having them in the same tagged cgroup). When a tag
> is enabled in a cgroup and a task from that cgroup is running on a
> hardware thread, the scheduler ensures that only idle or trusted tasks
> run on the other sibling(s). Besides security concerns, this feature can
> also be beneficial for RT and performance applications where we want to
> control how tasks make use of SMT dynamically.
>
> This iteration focuses on the following:
> - Redesigned API.
> - Rework of the kernel protection feature based on Thomas's entry work.
> - Rework of hotplug fixes.
> - Addressed review comments in v7.
>
> Joel: Both a cgroup and a per-task interface via prctl(2) are provided
> for configuring core sharing. More details are provided in the
> documentation patch. Kselftests are provided to verify the
> correctness/rules of the interface.
>
> Julien: TPCC tests showed improvements with core scheduling. With kernel
> protection enabled, it does not show any regression. Possibly ASI will
> improve the performance for those who choose kernel protection (can be
> toggled through the sched_core_protect_kernel sysctl). Results:
>
>                                 average   stdev        diff
> baseline (SMT on)               1197.272  44.78312824
> core sched (kernel protect)     412.9895  45.42734343  -65.51%
> core sched (no kernel protect)  686.6515  71.77756931  -42.65%
> nosmt                           408.667   39.39042872  -65.87%
>
> v8 is rebased on tip/master.
>
> Future work
> ===========
> - Load balancing/migration fixes for core scheduling.
>   With v6, load balancing is partially coresched aware, but has some
>   issues w.r.t. process/taskgroup weights:
>   https://lwn.net/ml/linux-kernel/20200225034438.GA617271@z...
> - Core scheduling test framework: kselftests, torture tests, etc.
>
> Changes in v8
> =============
> - New interface/API implementation
>   - Joel
> - Revised kernel protection patch
>   - Joel
> - Revised hotplug fixes
>   - Joel
> - Minor bug fixes and addressed review comments
>   - Vineeth
>
> create mode 100644 tools/testing/selftests/sched/config
> create mode 100644 tools/testing/selftests/sched/test_coresched.c

Adding 4 workloads test results for Core Scheduling v8:

- kernel under test: coresched community v8 from
  https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/log/?h=coresched-v5.9
- workloads:
  -- A. sysbench cpu (192 threads) + sysbench cpu (192 threads)
  -- B. sysbench cpu (192 threads) + sysbench mysql (192 threads,
        mysqld forced into the same cgroup)
  -- C. uperf netperf.xml (192 threads over TCP or UDP protocol separately)
  -- D. will-it-scale context_switch via pipe (192 threads)
- test machine setup:
  CPU(s):              192
  On-line CPU(s) list: 0-191
  Thread(s) per core:  2
  Core(s) per socket:  48
  Socket(s):           2
  NUMA node(s):        4
- test results:
  -- workload A, no obvious performance drop in cs_on:

  +----------------------+----+---------------------+----------------------+
  |                      | ** | sysbench cpu * 192  | sysbench mysql * 192 |
  +======================+====+=====================+======================+
  | cgroup               | ** | cg_sysbench_cpu_0   | cg_sysbench_mysql_0  |
  +----------------------+----+---------------------+----------------------+
  | record_item          | ** | Tput_avg (events/s) | Tput_avg (events/s)  |
  +----------------------+----+---------------------+----------------------+
  | coresched_normalized | ** | 1.01                | 0.87                 |
  +----------------------+----+---------------------+----------------------+
  | default_normalized   | ** | 1                   | 1                    |
  +----------------------+----+---------------------+----------------------+
  | smtoff_normalized    | ** | 0.59                | 0.82                 |
  +----------------------+----+---------------------+----------------------+

  -- workload B, no obvious pe