Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
On Mon, 18 May 2015, Thomas Gleixner wrote:
> On Mon, 18 May 2015, Vikas Shivappa wrote:
> > On Fri, 15 May 2015, Thomas Gleixner wrote:
> > > On Mon, 11 May 2015, Vikas Shivappa wrote:
> > > > +	/*
> > > > +	 * This needs to be fixed
> > > > +	 * to cache the whole PQR instead of just CLOSid.
> > > > +	 * PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
> > > > +	 * Should not write a 0 to the low 10 bits of PQR
> > > > +	 * and corrupt RMID.
> > >
> > > And why is this not fixed __BEFORE__ this patch? You can do the
> > > changes to struct intel_cqm_state in a separate patch and then do the
> > > proper implementation from the beginning instead of providing a half
> > > broken variant which gets replaced in the next patch.
> >
> > Ok, can fix both items in your comments. The reason I had it separately
> > is that the cache affects both CMT and cache allocation patches.
>
> And that's the wrong reason. Sure it affects both, but we first prepare
> the changes to the existing code and then build new stuff on top of it,
> not the other way round. Building the roof before the basement is almost
> never a good idea.

Ok, will merge all scheduling changes into one patch if you think that's
better.

Thanks,
Vikas

> Thanks,
> 	tglx

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
On Mon, 18 May 2015, Vikas Shivappa wrote:
> On Fri, 15 May 2015, Thomas Gleixner wrote:
> > On Mon, 11 May 2015, Vikas Shivappa wrote:
> > > +	/*
> > > +	 * This needs to be fixed
> > > +	 * to cache the whole PQR instead of just CLOSid.
> > > +	 * PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
> > > +	 * Should not write a 0 to the low 10 bits of PQR
> > > +	 * and corrupt RMID.
> >
> > And why is this not fixed __BEFORE__ this patch? You can do the
> > changes to struct intel_cqm_state in a separate patch and then do the
> > proper implementation from the beginning instead of providing a half
> > broken variant which gets replaced in the next patch.
>
> Ok, can fix both items in your comments. The reason I had it separately
> is that the cache affects both CMT and cache allocation patches.

And that's the wrong reason. Sure it affects both, but we first prepare
the changes to the existing code and then build new stuff on top of it,
not the other way round. Building the roof before the basement is almost
never a good idea.

Thanks,
	tglx
Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
On Fri, 15 May 2015, Thomas Gleixner wrote:
> On Mon, 11 May 2015, Vikas Shivappa wrote:
> > struct rdt_subsys_info {
> > 	/* Clos Bitmap to keep track of available CLOSids.*/
> > @@ -24,6 +30,11 @@ struct clos_cbm_map {
> > 	unsigned int clos_refcnt;
> > };
> >
> > +static inline bool rdt_enabled(void)
> > +{
> > +	return static_key_false(&rdt_enable_key);
>
> So again, why this useless helper function for a single call site?
>
> > +static inline void intel_rdt_sched_in(void)
> > +{
> > +	if (rdt_enabled())
> > +		__rdt_sched_in();
> > +}
> > +void __rdt_sched_in(void)
> > +{
> > +	struct task_struct *task = current;
> > +	struct intel_rdt *ir;
> > +	unsigned int clos;
> > +
> > +	/*
> > +	 * This needs to be fixed
> > +	 * to cache the whole PQR instead of just CLOSid.
> > +	 * PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
> > +	 * Should not write a 0 to the low 10 bits of PQR
> > +	 * and corrupt RMID.
>
> And why is this not fixed __BEFORE__ this patch? You can do the
> changes to struct intel_cqm_state in a separate patch and then do the
> proper implementation from the beginning instead of providing a half
> broken variant which gets replaced in the next patch.

Ok, can fix both items in your comments. The reason I had it separately
is that the cache affects both CMT and cache allocation patches.

> Thanks,
> 	tglx
Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
On Mon, 11 May 2015, Vikas Shivappa wrote:
> struct rdt_subsys_info {
> 	/* Clos Bitmap to keep track of available CLOSids.*/
> @@ -24,6 +30,11 @@ struct clos_cbm_map {
> 	unsigned int clos_refcnt;
> };
>
> +static inline bool rdt_enabled(void)
> +{
> +	return static_key_false(&rdt_enable_key);

So again, why this useless helper function for a single call site?

> +static inline void intel_rdt_sched_in(void)
> +{
> +	if (rdt_enabled())
> +		__rdt_sched_in();
> +}
> +void __rdt_sched_in(void)
> +{
> +	struct task_struct *task = current;
> +	struct intel_rdt *ir;
> +	unsigned int clos;
> +
> +	/*
> +	 * This needs to be fixed
> +	 * to cache the whole PQR instead of just CLOSid.
> +	 * PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
> +	 * Should not write a 0 to the low 10 bits of PQR
> +	 * and corrupt RMID.

And why is this not fixed __BEFORE__ this patch? You can do the
changes to struct intel_cqm_state in a separate patch and then do the
proper implementation from the beginning instead of providing a half
broken variant which gets replaced in the next patch.

Thanks,
	tglx
[PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
Adds support for IA32_PQR_ASSOC MSR writes during task scheduling. For
Cache Allocation, the MSR write lets the task fill in the cache 'subset'
represented by the cgroup's cache_mask.

The high 32 bits in the per processor MSR IA32_PQR_ASSOC represent the
CLOSid. During context switch the kernel implements this by writing the
CLOSid of the cgroup to which the task belongs to the CPU's
IA32_PQR_ASSOC MSR.

The following considerations are made for the PQR MSR write so that it
minimally impacts the scheduler hot path:
- This path does not exist on any non-Intel platforms.
- On Intel platforms, it does not exist by default unless CGROUP_RDT is
  enabled.
- It remains a no-op when CGROUP_RDT is enabled and the Intel SKU does
  not support the feature.
- When the feature is available and enabled, no MSR write is done until
  the user manually creates a cgroup directory *and* assigns it a
  cache_mask different from the root cgroup's. Since a child node
  inherits its parent's cache_mask, cgroup creation alone adds no
  scheduling hot path impact.
- The MSR write is only done when a task with a different CLOSid is
  scheduled on the CPU. Typically, if task groups are bound to be
  scheduled on a set of CPUs, the number of MSR writes is greatly
  reduced.
- For cgroup directories having the same cache_mask, CLOSids are reused.
  This minimizes the number of CLOSids used and hence reduces the MSR
  write frequency.
Signed-off-by: Vikas Shivappa
---
 arch/x86/include/asm/intel_rdt.h | 44
 arch/x86/include/asm/switch_to.h |  3 +++
 arch/x86/kernel/cpu/intel_rdt.c  | 30
 3 files changed, 77 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 9e9dbbe..589394b 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,9 +4,15 @@
 #ifdef CONFIG_CGROUP_RDT

 #include
+
+#define MSR_IA32_PQR_ASSOC	0xc8f
 #define MAX_CBM_LENGTH		32
 #define IA32_L3_CBM_BASE	0xc90
 #define CBM_FROM_INDEX(x)	(IA32_L3_CBM_BASE + x)
+DECLARE_PER_CPU(unsigned int, x86_cpu_clos);
+extern struct static_key rdt_enable_key;
+extern void __rdt_sched_in(void);
+

 struct rdt_subsys_info {
 	/* Clos Bitmap to keep track of available CLOSids.*/
@@ -24,6 +30,11 @@ struct clos_cbm_map {
 	unsigned int clos_refcnt;
 };

+static inline bool rdt_enabled(void)
+{
+	return static_key_false(&rdt_enable_key);
+}
+
 /*
  * Return rdt group corresponding to this container.
  */
@@ -37,5 +48,38 @@ static inline struct intel_rdt *parent_rdt(struct intel_rdt *ir)
 	return css_rdt(ir->css.parent);
 }

+/*
+ * Return rdt group to which this task belongs.
+ */
+static inline struct intel_rdt *task_rdt(struct task_struct *task)
+{
+	return css_rdt(task_css(task, intel_rdt_cgrp_id));
+}
+
+/*
+ * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
+ *
+ * Following considerations are made so that this has minimal impact
+ * on scheduler hot path:
+ * - This will stay as no-op unless we are running on an Intel SKU
+ *   which supports L3 cache allocation.
+ * - When support is present and enabled, does not do any
+ *   IA32_PQR_MSR writes until the user starts really using the feature
+ *   ie creates a rdt cgroup directory and assigns a cache_mask thats
+ *   different from the root cgroup's cache_mask.
+ * - Closids are allocated so that different cgroup directories
+ *   with same cache_mask gets the same CLOSid. This minimizes CLOSids
+ *   used and reduces MSR write frequency.
+ */
+static inline void intel_rdt_sched_in(void)
+{
+	if (rdt_enabled())
+		__rdt_sched_in();
+}
+
+#else
+
+static inline void intel_rdt_sched_in(struct task_struct *task) {}
+
 #endif
 #endif
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 751bf4b..9149577 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -8,6 +8,9 @@ struct tss_struct;
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		      struct tss_struct *tss);

+#include
+#define finish_arch_switch(prev)	intel_rdt_sched_in()
+
 #ifdef CONFIG_X86_32

 #ifdef CONFIG_CC_STACKPROTECTOR
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 125318d..fe3ce4e 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -35,6 +35,8 @@ static struct clos_cbm_map *ccmap;
 static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;
+struct static_key __read_mostly rdt_enable_key = STATIC_KEY_INIT_FALSE;
+DEFINE_PER_CPU(unsigned int, x86_cpu_clos);

 /*
  * Mask of CPUs for writing CBM values. We only need one per-socket.
@@ -79,6 +81,33 @@ static void intel_rdt_free_closid(unsigned int clos)
 	clear_bit(clos, rdtss_info.closmap);
 }

+void __rdt_sched
Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
On Fri, 8 May 2015, Peter Zijlstra wrote:
> On Thu, May 07, 2015 at 04:15:41PM -0700, Vikas Shivappa wrote:
> > > No.
> > >
> > > 1) two arch hooks right after one another is FAIL
> > > 1a) just 'fix' the existing hook
> > > 2) current is cheap and easily obtainable without passing it as
> > >    an argument
> >
> > will fix to just use an existing hook in finish_task_switch and
> > current(get_current) since the stack would already be changed ..
> >
> > Thanks,
> > Vikas
> >
> > > 3) why do you need the hook in the first place?
> > > 3a) why can't you put this in __switch_to()? This is very much x86 only
> > > code.
>
> ^ please also answer 3, why can't this go in __switch_to()?

perf uses a similar (#1a) hook to update its MSRs (including for cache
monitoring). Also, since switch_to deals with register state and stack,
it may be a safer option to use it in finish_arch_switch. That's kind of
why we had it there at first, but as a separate hook.

#define finish_arch_switch(prev)	\
do {					\
	intel_rdt_sched_in();		\
} while (0)

Hpa, any comments?
Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
On Thu, May 07, 2015 at 04:15:41PM -0700, Vikas Shivappa wrote:
> > No.
> >
> > 1) two arch hooks right after one another is FAIL
> > 1a) just 'fix' the existing hook
> > 2) current is cheap and easily obtainable without passing it as
> >    an argument
>
> will fix to just use an existing hook in finish_task_switch and
> current(get_current) since the stack would already be changed ..
>
> Thanks,
> Vikas
>
> > 3) why do you need the hook in the first place?
> > 3a) why can't you put this in __switch_to()? This is very much x86 only
> > code.

^ please also answer 3, why can't this go in __switch_to()?
Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
On Wed, 6 May 2015, Peter Zijlstra wrote:
> On Mon, May 04, 2015 at 11:39:21AM -0700, Vikas Shivappa wrote:
> > > --- a/arch/x86/include/asm/switch_to.h
> > > +++ b/arch/x86/include/asm/switch_to.h
> > > @@ -8,6 +8,9 @@ struct tss_struct;
> > >  void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
> > > 		      struct tss_struct *tss);
> > >
> > > +#include
> > > +#define post_arch_switch(current)	rdt_sched_in(current)
> > > +
> > >  #ifdef CONFIG_X86_32
> > >
> > >  #ifdef CONFIG_CC_STACKPROTECTOR
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index f9123a8..cacb490 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -2241,6 +2241,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> > > 	prev_state = prev->state;
> > > 	vtime_task_switch(prev);
> > > 	finish_arch_switch(prev);
> > > +	post_arch_switch(current);
> > > 	perf_event_task_sched_in(prev, current);
> > > 	finish_lock_switch(rq, prev);
> > > 	finish_arch_post_lock_switch();
> >
> > Not a word in the Changelog on this hook; that's double fail.
>
> will add the changelog. we want the current task which no other existing
> hook provides.

No.

1) two arch hooks right after one another is FAIL
1a) just 'fix' the existing hook
2) current is cheap and easily obtainable without passing it as
   an argument

will fix to just use an existing hook in finish_task_switch and
current(get_current) since the stack would already be changed ..

Thanks,
Vikas

3) why do you need the hook in the first place?
3a) why can't you put this in __switch_to()? This is very much x86 only
code.
Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
On Tue, May 05, 2015 at 05:19:40PM -0700, Vikas Shivappa wrote:
> However Matt pointed out I could improve this to
> 	if (static_key_false)
> 	{ rdt_sched_in(); }
> instead of a static inline which i will update. Will update the commit
> message to include these details.

Indeed, that causes minimal I$ bloat / impact for people not using this
feature.
Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
On Mon, May 04, 2015 at 11:39:21AM -0700, Vikas Shivappa wrote:
> >> --- a/arch/x86/include/asm/switch_to.h
> >> +++ b/arch/x86/include/asm/switch_to.h
> >> @@ -8,6 +8,9 @@ struct tss_struct;
> >>  void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
> >> 		      struct tss_struct *tss);
> >>
> >> +#include
> >> +#define post_arch_switch(current)	rdt_sched_in(current)
> >> +
> >>  #ifdef CONFIG_X86_32
> >>
> >>  #ifdef CONFIG_CC_STACKPROTECTOR
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index f9123a8..cacb490 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -2241,6 +2241,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> >> 	prev_state = prev->state;
> >> 	vtime_task_switch(prev);
> >> 	finish_arch_switch(prev);
> >> +	post_arch_switch(current);
> >> 	perf_event_task_sched_in(prev, current);
> >> 	finish_lock_switch(rq, prev);
> >> 	finish_arch_post_lock_switch();
> >
> > Not a word in the Changelog on this hook; that's double fail.
>
> will add the changelog. we want the current task which no other existing
> hook provides.

No.

1) two arch hooks right after one another is FAIL
1a) just 'fix' the existing hook
2) current is cheap and easily obtainable without passing it as
   an argument
3) why do you need the hook in the first place?
3a) why can't you put this in __switch_to()? This is very much x86 only
code.
Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
On Sat, 2 May 2015, Peter Zijlstra wrote:
> On Fri, May 01, 2015 at 06:36:38PM -0700, Vikas Shivappa wrote:
> > Adds support for IA32_PQR_ASSOC MSR writes during task scheduling.
> >
> > The high 32 bits in the per processor MSR IA32_PQR_ASSOC represents the
> > CLOSid. During context switch kernel implements this by writing the
> > CLOSid of the cgroup to which the task belongs to the CPU's
> > IA32_PQR_ASSOC MSR.
> >
> > For Cache Allocation, this would let the task fill in the cache 'subset'
> > represented by the cgroup's Cache bit mask(CBM).
>
> Are you guys for real? Have you even looked at the trainwreck this makes?

The h/w tags the cache lines with the CLOSid, and hence we associate the
id with the tasks scheduled.

The following considerations are done for the PQR MSR write to have
minimal effect on the scheduling hot path:
- This would not exist on any non-Intel platforms.
- On Intel platforms, this would not exist by default unless CGROUP_RDT
  is enabled.
- Remains a no-op when CGROUP_RDT is enabled and Intel hardware does not
  support the feature.
- When the feature is available and RDT is enabled, does not do the MSR
  write till the user manually creates a cgroup *and* assigns a new cache
  mask. Since the child node inherits the parent's cache mask, by cgroup
  creation there is no scheduling hot path impact from the new cgroup.
- Per-cpu PQR values are cached and the MSR write is only done when a
  task with a different PQR is scheduled on the CPU. Typically if the
  task groups are bound to be scheduled on a set of CPUs, the number of
  MSR writes is greatly reduced.

However Matt pointed out I could improve this to

	if (static_key_false)
	{ rdt_sched_in(); }

instead of a static inline, which I will update. Will update the commit
message to include these details. And can improve to not enable the
feature till a user creates a new cgroup and assigns a new bitmask.

Thanks,
Vikas

> > +static inline bool rdt_enabled(void)
> > +{
> > +	return static_key_false(&rdt_enable_key);
> > +}
> > +/*
> > + * rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
> > + * if the current Closid is different than the new one.
> > + */
> > +static inline void rdt_sched_in(struct task_struct *task)
> > +{
> > +	struct intel_rdt *ir;
> > +	unsigned int clos;
> > +
> > +	if (!rdt_enabled())
> > +		return;
> > +
> > +	/*
> > +	 * This needs to be fixed
> > +	 * to cache the whole PQR instead of just CLOSid.
> > +	 * PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
> > +	 * Should not write a 0 to the low 10 bits of PQR
> > +	 * and corrupt RMID.
> > +	 */
> > +	clos = this_cpu_read(x86_cpu_clos);
> > +
> > +	rcu_read_lock();
> > +	ir = task_rdt(task);
> > +	if (ir->clos == clos) {
> > +		rcu_read_unlock();
> > +		return;
> > +	}
> > +
> > +	wrmsr(MSR_IA32_PQR_ASSOC, 0, ir->clos);
> > +	this_cpu_write(x86_cpu_clos, ir->clos);
> > +	rcu_read_unlock();
> > +}
>
> You inject _ALL_ that into the scheduler hot path. Insane much?
>
> > +
> > +#else
> > +
> > +static inline void rdt_sched_in(struct task_struct *task) {}
> > +
> >  #endif
> >  #endif
> > diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
> > index 751bf4b..82ef4b3 100644
> > --- a/arch/x86/include/asm/switch_to.h
> > +++ b/arch/x86/include/asm/switch_to.h
> > @@ -8,6 +8,9 @@ struct tss_struct;
> >  void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
> > 		      struct tss_struct *tss);
> >
> > +#include
> > +#define post_arch_switch(current)	rdt_sched_in(current)
> > +
> >  #ifdef CONFIG_X86_32
> >
> >  #ifdef CONFIG_CC_STACKPROTECTOR
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index f9123a8..cacb490 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2241,6 +2241,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> > 	prev_state = prev->state;
> > 	vtime_task_switch(prev);
> > 	finish_arch_switch(prev);
> > +	post_arch_switch(current);
> > 	perf_event_task_sched_in(prev, current);
> > 	finish_lock_switch(rq, prev);
> > 	finish_arch_post_lock_switch();
>
> Not a word in the Changelog on this hook; that's double fail.
>
> Please _THINK_ before writing code.
Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
On Sat, 2 May 2015, Peter Zijlstra wrote:
> On Fri, May 01, 2015 at 06:36:38PM -0700, Vikas Shivappa wrote:
> > Adds support for IA32_PQR_ASSOC MSR writes during task scheduling.
> >
> > The high 32 bits in the per processor MSR IA32_PQR_ASSOC represents the
> > CLOSid. During context switch kernel implements this by writing the
> > CLOSid of the cgroup to which the task belongs to the CPU's
> > IA32_PQR_ASSOC MSR.
> >
> > For Cache Allocation, this would let the task fill in the cache 'subset'
> > represented by the cgroup's Cache bit mask(CBM).
>
> Are you guys for real? Have you even looked at the trainwreck this makes?
>
> > +static inline bool rdt_enabled(void)
> > +{
> > +	return static_key_false(&rdt_enable_key);
> > +}
> > +/*
> > + * rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
> > + * if the current Closid is different than the new one.
> > + */
> > +static inline void rdt_sched_in(struct task_struct *task)
> > +{
> > +	struct intel_rdt *ir;
> > +	unsigned int clos;
> > +
> > +	if (!rdt_enabled())
> > +		return;
> > +
> > +	/*
> > +	 * This needs to be fixed
> > +	 * to cache the whole PQR instead of just CLOSid.
> > +	 * PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
> > +	 * Should not write a 0 to the low 10 bits of PQR
> > +	 * and corrupt RMID.
> > +	 */
> > +	clos = this_cpu_read(x86_cpu_clos);
> > +
> > +	rcu_read_lock();
> > +	ir = task_rdt(task);
> > +	if (ir->clos == clos) {
> > +		rcu_read_unlock();
> > +		return;
> > +	}
> > +
> > +	wrmsr(MSR_IA32_PQR_ASSOC, 0, ir->clos);
> > +	this_cpu_write(x86_cpu_clos, ir->clos);
> > +	rcu_read_unlock();
> > +}
>
> You inject _ALL_ that into the scheduler hot path. Insane much?

At some point I had a #ifdef for the rdt_sched_in, will fix this.

> > +
> > +#else
> > +
> > +static inline void rdt_sched_in(struct task_struct *task) {}
> > +
> >  #endif
> >  #endif
> > diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
> > index 751bf4b..82ef4b3 100644
> > --- a/arch/x86/include/asm/switch_to.h
> > +++ b/arch/x86/include/asm/switch_to.h
> > @@ -8,6 +8,9 @@ struct tss_struct;
> >  void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
> > 		      struct tss_struct *tss);
> >
> > +#include
> > +#define post_arch_switch(current)	rdt_sched_in(current)
> > +
> >  #ifdef CONFIG_X86_32
> >
> >  #ifdef CONFIG_CC_STACKPROTECTOR
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index f9123a8..cacb490 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2241,6 +2241,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> > 	prev_state = prev->state;
> > 	vtime_task_switch(prev);
> > 	finish_arch_switch(prev);
> > +	post_arch_switch(current);
> > 	perf_event_task_sched_in(prev, current);
> > 	finish_lock_switch(rq, prev);
> > 	finish_arch_post_lock_switch();
>
> Not a word in the Changelog on this hook; that's double fail.

will add the changelog. We want the current task, which no other existing
hook provides.

Thanks,
Vikas

> Please _THINK_ before writing code.
Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
On Fri, May 01, 2015 at 06:36:38PM -0700, Vikas Shivappa wrote:
> Adds support for IA32_PQR_ASSOC MSR writes during task scheduling.
>
> The high 32 bits in the per processor MSR IA32_PQR_ASSOC represents the
> CLOSid. During context switch kernel implements this by writing the
> CLOSid of the cgroup to which the task belongs to the CPU's
> IA32_PQR_ASSOC MSR.
>
> For Cache Allocation, this would let the task fill in the cache 'subset'
> represented by the cgroup's Cache bit mask(CBM).

Are you guys for real? Have you even looked at the trainwreck this makes?

> +static inline bool rdt_enabled(void)
> +{
> +	return static_key_false(&rdt_enable_key);
> +}
> +/*
> + * rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
> + * if the current Closid is different than the new one.
> + */
> +static inline void rdt_sched_in(struct task_struct *task)
> +{
> +	struct intel_rdt *ir;
> +	unsigned int clos;
> +
> +	if (!rdt_enabled())
> +		return;
> +
> +	/*
> +	 * This needs to be fixed
> +	 * to cache the whole PQR instead of just CLOSid.
> +	 * PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
> +	 * Should not write a 0 to the low 10 bits of PQR
> +	 * and corrupt RMID.
> +	 */
> +	clos = this_cpu_read(x86_cpu_clos);
> +
> +	rcu_read_lock();
> +	ir = task_rdt(task);
> +	if (ir->clos == clos) {
> +		rcu_read_unlock();
> +		return;
> +	}
> +
> +	wrmsr(MSR_IA32_PQR_ASSOC, 0, ir->clos);
> +	this_cpu_write(x86_cpu_clos, ir->clos);
> +	rcu_read_unlock();
> +}

You inject _ALL_ that into the scheduler hot path. Insane much?

> +
> +#else
> +
> +static inline void rdt_sched_in(struct task_struct *task) {}
> +
>  #endif
>  #endif
> diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
> index 751bf4b..82ef4b3 100644
> --- a/arch/x86/include/asm/switch_to.h
> +++ b/arch/x86/include/asm/switch_to.h
> @@ -8,6 +8,9 @@ struct tss_struct;
>  void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
> 		      struct tss_struct *tss);
>
> +#include
> +#define post_arch_switch(current)	rdt_sched_in(current)
> +
>  #ifdef CONFIG_X86_32
>
>  #ifdef CONFIG_CC_STACKPROTECTOR
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f9123a8..cacb490 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2241,6 +2241,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> 	prev_state = prev->state;
> 	vtime_task_switch(prev);
> 	finish_arch_switch(prev);
> +	post_arch_switch(current);
> 	perf_event_task_sched_in(prev, current);
> 	finish_lock_switch(rq, prev);
> 	finish_arch_post_lock_switch();

Not a word in the Changelog on this hook; that's double fail.

Please _THINK_ before writing code.
[PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
Adds support for IA32_PQR_ASSOC MSR writes during task scheduling.

The high 32 bits in the per processor MSR IA32_PQR_ASSOC represents the
CLOSid. During context switch kernel implements this by writing the
CLOSid of the cgroup to which the task belongs to the CPU's
IA32_PQR_ASSOC MSR.

For Cache Allocation, this would let the task fill in the cache 'subset'
represented by the cgroup's Cache bit mask(CBM).

Signed-off-by: Vikas Shivappa
---
 arch/x86/include/asm/intel_rdt.h | 54
 arch/x86/include/asm/switch_to.h |  3 +++
 arch/x86/kernel/cpu/intel_rdt.c  |  3 +++
 kernel/sched/core.c              |  1 +
 kernel/sched/sched.h             |  3 +++
 5 files changed, 64 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 9e9dbbe..2fc496f 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,9 +4,13 @@
 #ifdef CONFIG_CGROUP_RDT

 #include
+
+#define MSR_IA32_PQR_ASSOC	0xc8f
 #define MAX_CBM_LENGTH		32
 #define IA32_L3_CBM_BASE	0xc90
 #define CBM_FROM_INDEX(x)	(IA32_L3_CBM_BASE + x)
+DECLARE_PER_CPU(unsigned int, x86_cpu_clos);
+extern struct static_key rdt_enable_key;

 struct rdt_subsys_info {
 	/* Clos Bitmap to keep track of available CLOSids.*/
@@ -24,6 +28,11 @@ struct clos_cbm_map {
 	unsigned int clos_refcnt;
 };

+static inline bool rdt_enabled(void)
+{
+	return static_key_false(&rdt_enable_key);
+}
+
 /*
  * Return rdt group corresponding to this container.
  */
@@ -37,5 +46,50 @@ static inline struct intel_rdt *parent_rdt(struct intel_rdt *ir)
 	return css_rdt(ir->css.parent);
 }

+/*
+ * Return rdt group to which this task belongs.
+ */
+static inline struct intel_rdt *task_rdt(struct task_struct *task)
+{
+	return css_rdt(task_css(task, rdt_cgrp_id));
+}
+
+/*
+ * rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
+ * if the current Closid is different than the new one.
+ */
+static inline void rdt_sched_in(struct task_struct *task)
+{
+	struct intel_rdt *ir;
+	unsigned int clos;
+
+	if (!rdt_enabled())
+		return;
+
+	/*
+	 * This needs to be fixed
+	 * to cache the whole PQR instead of just CLOSid.
+	 * PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
+	 * Should not write a 0 to the low 10 bits of PQR
+	 * and corrupt RMID.
+	 */
+	clos = this_cpu_read(x86_cpu_clos);
+
+	rcu_read_lock();
+	ir = task_rdt(task);
+	if (ir->clos == clos) {
+		rcu_read_unlock();
+		return;
+	}
+
+	wrmsr(MSR_IA32_PQR_ASSOC, 0, ir->clos);
+	this_cpu_write(x86_cpu_clos, ir->clos);
+	rcu_read_unlock();
+}
+
+#else
+
+static inline void rdt_sched_in(struct task_struct *task) {}
+
 #endif
 #endif
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 751bf4b..82ef4b3 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -8,6 +8,9 @@ struct tss_struct;
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		      struct tss_struct *tss);

+#include
+#define post_arch_switch(current)	rdt_sched_in(current)
+
 #ifdef CONFIG_X86_32

 #ifdef CONFIG_CC_STACKPROTECTOR
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 58b39d6..74b1e28 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -34,6 +34,8 @@ static struct clos_cbm_map *ccmap;
 static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;
+struct static_key __read_mostly rdt_enable_key = STATIC_KEY_INIT_FALSE;
+DEFINE_PER_CPU(unsigned int, x86_cpu_clos);

 /*
  * Mask of CPUs for writing CBM values. We only need one per-socket.
@@ -433,6 +435,7 @@ static int __init rdt_late_init(void)
 	__hotcpu_notifier(rdt_cpu_notifier, 0);
 	cpu_notifier_register_done();

+	static_key_slow_inc(&rdt_enable_key);
 	pr_info("Max bitmask length:%u,Max ClosIds: %u\n", cbm_len, maxid);

 	return 0;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9123a8..cacb490 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2241,6 +2241,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	prev_state = prev->state;
 	vtime_task_switch(prev);
 	finish_arch_switch(prev);
+	post_arch_switch(current);
 	perf_event_task_sched_in(prev, current);
 	finish_lock_switch(rq, prev);
 	finish_arch_post_lock_switch();
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e0e1299..9153747 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1045,6 +1045,9 @@ static inline int task_on_rq_migrating(struct task_struct *p)
 #ifndef finish_arch_switch
 # define finish_arch_switch(pr
[PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT

Adds support for IA32_PQR_ASSOC MSR writes during task scheduling. The
high 32 bits of the per-processor MSR IA32_PQR_ASSOC represent the
CLOSid. During a context switch, the kernel writes the CLOSid of the
cgroup to which the task belongs into the CPU's IA32_PQR_ASSOC MSR. For
Cache Allocation, this lets the task fill the cache 'subset'
represented by the cgroup's Cache Bit Mask (CBM).

Signed-off-by: Vikas Shivappa
---
 arch/x86/include/asm/intel_rdt.h | 55
 arch/x86/include/asm/switch_to.h |  3 +++
 arch/x86/kernel/cpu/intel_rdt.c  |  4 ++-
 kernel/sched/core.c              |  1 +
 kernel/sched/sched.h             |  3 +++
 5 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 0ed28d9..6383a24 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,9 +4,13 @@
 #ifdef CONFIG_CGROUP_RDT

 #include
+
+#define MSR_IA32_PQR_ASSOC	0xc8f
 #define MAX_CBM_LENGTH		32
 #define IA32_L3_CBM_BASE	0xc90
 #define CBM_FROM_INDEX(x)	(IA32_L3_CBM_BASE + x)
+DECLARE_PER_CPU(unsigned int, x86_cpu_clos);
+extern struct static_key rdt_enable_key;

 struct rdt_subsys_info {
 	/* Clos Bitmap to keep track of available CLOSids.*/
@@ -24,6 +28,11 @@ struct clos_cbm_map {
 	unsigned int cgrp_count;
 };

+static inline bool rdt_enabled(void)
+{
+	return static_key_false(&rdt_enable_key);
+}
+
 /*
  * Return rdt group corresponding to this container.
  */
@@ -37,5 +46,51 @@ static inline struct intel_rdt *parent_rdt(struct intel_rdt *ir)
 	return css_rdt(ir->css.parent);
 }

+/*
+ * Return rdt group to which this task belongs.
+ */
+static inline struct intel_rdt *task_rdt(struct task_struct *task)
+{
+	return css_rdt(task_css(task, rdt_cgrp_id));
+}
+
+/*
+ * rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
+ * if the current Closid is different than the new one.
+ */
+static inline void rdt_sched_in(struct task_struct *task)
+{
+	struct intel_rdt *ir;
+	unsigned int clos;
+
+	if (!rdt_enabled())
+		return;
+
+	/*
+	 * This needs to be fixed after CQM code stabilizes
+	 * to cache the whole PQR instead of just CLOSid.
+	 * PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
+	 * Should not write a 0 to the low 10 bits of PQR
+	 * and corrupt RMID.
+	 */
+	clos = this_cpu_read(x86_cpu_clos);
+
+	rcu_read_lock();
+	ir = task_rdt(task);
+	if (ir->clos == clos) {
+		rcu_read_unlock();
+		return;
+	}
+
+	wrmsr(MSR_IA32_PQR_ASSOC, 0, ir->clos);
+	this_cpu_write(x86_cpu_clos, ir->clos);
+	rcu_read_unlock();
+}
+
+#else
+
+static inline void rdt_sched_in(struct task_struct *task) {}
+
 #endif
 #endif
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 751bf4b..82ef4b3 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -8,6 +8,9 @@ struct tss_struct;
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		      struct tss_struct *tss);

+#include
+#define post_arch_switch(current)	rdt_sched_in(current)
+
 #ifdef CONFIG_X86_32

 #ifdef CONFIG_CC_STACKPROTECTOR
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 495497a..0330791 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -32,6 +32,8 @@ static struct clos_cbm_map *ccmap;
 static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;
+struct static_key __read_mostly rdt_enable_key = STATIC_KEY_INIT_FALSE;
+DEFINE_PER_CPU(unsigned int, x86_cpu_clos);

 #define rdt_for_each_child(pos_css, parent_ir)		\
 	css_for_each_child((pos_css), &(parent_ir)->css)
@@ -76,7 +78,7 @@ static int __init rdt_late_init(void)
 	ccm = &ccmap[0];
 	ccm->cbm = (u32)((u64)(1 << cbm_len) - 1);
 	ccm->cgrp_count++;
-
+	static_key_slow_inc(&rdt_enable_key);
 	pr_info("cbmlength:%u,Closs: %u\n", cbm_len, maxid);
 }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0f831e..93ff61b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2206,6 +2206,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	prev_state = prev->state;
 	vtime_task_switch(prev);
 	finish_arch_switch(prev);
+	post_arch_switch(current);
 	perf_event_task_sched_in(prev, current);
 	finish_lock_switch(rq, prev);
 	finish_arch_post_lock_switch();
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index dc0f435..0b3c191 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1026,6 +1026,9 @@ static