Re: [PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI
On Wed, Mar 22, 2023 at 06:22:28PM +0000, Valentin Schneider wrote:
> On 22/03/23 18:22, Peter Zijlstra wrote:
> >>   hackbench-157   [001]    10.894320: ipi_send_cpu:         cpu=3 callsite=check_preempt_curr+0x37 callback=0x0
> >
> > Arguably we should be setting callback to scheduler_ipi(), except
> > ofcourse, that's not an actual function...
> >
> > Maybe we can do "extern inline" for the actual users and provide a dummy
> > function for the symbol when tracing.
> >
>
> Huh, I wasn't aware that was an option, I'll look into that. I did scribble
> down a comment next to smp_send_reschedule(), but having a decodable
> function name would be better!

So clang-15 builds the below (and generates the expected code), but
gcc-12 vomits nonsense about a non-static inline calling a static
inline or somesuch bollocks :-/

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1991,7 +1991,7 @@ extern char *__get_task_comm(char *to, s
 })
 
 #ifdef CONFIG_SMP
-static __always_inline void scheduler_ipi(void)
+extern __always_inline void scheduler_ipi(void)
 {
 	/*
 	 * Fold TIF_NEED_RESCHED into the preempt_count; anybody setting
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -130,9 +130,9 @@ extern void arch_smp_send_reschedule(int
  * scheduler_ipi() is inline so can't be passed as callback reason, but the
  * callsite IP should be sufficient for root-causing IPIs sent from here.
  */
-#define smp_send_reschedule(cpu) ({				\
-	trace_ipi_send_cpu(cpu, _RET_IP_, NULL);		\
-	arch_smp_send_reschedule(cpu);				\
+#define smp_send_reschedule(cpu) ({				\
+	trace_ipi_send_cpu(cpu, _RET_IP_, &scheduler_ipi);	\
+	arch_smp_send_reschedule(cpu);				\
 })
 
 /*
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3790,6 +3790,15 @@ static int ttwu_runnable(struct task_str
 }
 
 #ifdef CONFIG_SMP
+void scheduler_ipi(void)
+{
+	/*
+	 * Actual users should end up using the extern inline, this is only
+	 * here for the symbol.
+	 */
+	BUG();
+}
+
 void sched_ttwu_pending(void *arg)
 {
 	struct llist_node *llist = arg;
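As a standalone illustration of the GNU C idiom being debated above (a hedged userspace sketch with made-up names, not the kernel change itself), the "extern inline" split looks roughly like the following; the open question in this mail is only whether gcc-12 accepts the kernel's particular arrangement of it:

```c
#include <stdio.h>

/*
 * "Header" half: every normal user sees an always-inline body. With
 * gnu_inline, an extern inline definition is used for inlining only and
 * never emits a symbol of its own.
 */
#ifndef PROVIDE_SYMBOL
extern inline __attribute__((gnu_inline, always_inline))
void fake_scheduler_ipi(void)
{
	puts("inline body, folded into the caller");
}
#endif

/*
 * "kernel/sched/core.c" half (compile with -DPROVIDE_SYMBOL): exactly one
 * translation unit provides an out-of-line copy, so anything that needs a
 * real address, such as a tracepoint's callback argument, still links.
 */
#ifdef PROVIDE_SYMBOL
void fake_scheduler_ipi(void)
{
	puts("out-of-line symbol, only here so &fake_scheduler_ipi resolves");
}
#endif

int main(void)
{
	fake_scheduler_ipi();
	return 0;
}
```

Built without the define, every call is inlined and no symbol is referenced; the translation unit built with -DPROVIDE_SYMBOL is the one that would donate the real address, which is the role played by the dummy scheduler_ipi() above.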
Re: [PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI
On 22/03/23 18:22, Peter Zijlstra wrote:
> On Wed, Mar 22, 2023 at 05:01:13PM +0000, Valentin Schneider wrote:
>
>> > So I was thinking something like this:
>>
>> Hm, this does get rid of the func being passed down the helpers, but this
>> means the trace events are now stateful, i.e. I need the first and last
>> events in a CSD stack to figure out which one actually caused the IPI.
>
> Isn't much of tracing stateful? I mean, why am I always writing awk
> programs to parse trace output?
>
> The one that is directly followed by
> generic_smp_call_function_single_interrupt() (horrible name that), is
> the one that tripped the IPI.
>

Right.

>> It also requires whoever is looking at the trace to be aware of which IPIs
>> are attached to a CSD, and which ones aren't. ATM that's only the resched
>> IPI, but per the cover letter there's more to come (e.g. tick_broadcast()
>> for arm64/riscv and a few others). For instance:
>>
>>   hackbench-157   [001]    10.894320: ipi_send_cpu:         cpu=3 callsite=check_preempt_curr+0x37 callback=0x0
>
> Arguably we should be setting callback to scheduler_ipi(), except
> ofcourse, that's not an actual function...
>
> Maybe we can do "extern inline" for the actual users and provide a dummy
> function for the symbol when tracing.
>

Huh, I wasn't aware that was an option, I'll look into that. I did scribble
down a comment next to smp_send_reschedule(), but having a decodable
function name would be better!

>>   hackbench-157   [001]    10.895068: ipi_send_cpu:         cpu=3 callsite=try_to_wake_up+0x29e callback=sched_ttwu_pending+0x0
>>   hackbench-157   [001]    10.895068: ipi_send_cpu:         cpu=3 callsite=try_to_wake_up+0x29e callback=generic_smp_call_function_single_interrupt+0x0
>>
>> That first one sent a RESCHEDULE IPI, the second one a CALL_FUNCTION one,
>> but you really have to know what you're looking at...
>
> But you have to know that anyway, you can't do tracing and not know wtf
> you're doing. Or rather, if you do, I don't give a crap and you can keep
> the pieces :-)
>
> Grepping the callback should be pretty quick resolution as to what trips
> it, no?
>
> (also, if you *really* can't manage, we can always add yet another
> argument that gives a type thingy)
>

Ah, I was a bit unclear here - I don't care too much about the IPI type
being used, but rather about being able to figure out on IRQ entry where
that IPI came from. Thinking some more about it now, I don't think logging
*all* CSDs causes an issue there, as you'd look at the
earliest-not-seen-yet event targeting this CPU anyway.

That'll be made easy once I get to having cpumask filters for ftrace, so I
can just issue something like:

  trace-cmd record -e 'ipi_send_cpu' -f "cpu == 3" -e 'ipi_send_cpumask' -f "cpus \in {3}" -T hackbench

(it's somewhere on the todolist...)

TL;DR: I *think* I've convinced myself logging all of them isn't an issue -
I'm going to play with this on something "smarter" than just hackbench
under QEMU just to drill it in.
Re: [PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI
On Wed, Mar 22, 2023 at 05:01:13PM +0000, Valentin Schneider wrote:

> > So I was thinking something like this:
>
> Hm, this does get rid of the func being passed down the helpers, but this
> means the trace events are now stateful, i.e. I need the first and last
> events in a CSD stack to figure out which one actually caused the IPI.

Isn't much of tracing stateful? I mean, why am I always writing awk
programs to parse trace output?

The one that is directly followed by
generic_smp_call_function_single_interrupt() (horrible name that), is
the one that tripped the IPI.

> It also requires whoever is looking at the trace to be aware of which IPIs
> are attached to a CSD, and which ones aren't. ATM that's only the resched
> IPI, but per the cover letter there's more to come (e.g. tick_broadcast()
> for arm64/riscv and a few others). For instance:
>
>   hackbench-157   [001]    10.894320: ipi_send_cpu:         cpu=3 callsite=check_preempt_curr+0x37 callback=0x0

Arguably we should be setting callback to scheduler_ipi(), except
ofcourse, that's not an actual function...

Maybe we can do "extern inline" for the actual users and provide a dummy
function for the symbol when tracing.

>   hackbench-157   [001]    10.895068: ipi_send_cpu:         cpu=3 callsite=try_to_wake_up+0x29e callback=sched_ttwu_pending+0x0
>   hackbench-157   [001]    10.895068: ipi_send_cpu:         cpu=3 callsite=try_to_wake_up+0x29e callback=generic_smp_call_function_single_interrupt+0x0
>
> That first one sent a RESCHEDULE IPI, the second one a CALL_FUNCTION one,
> but you really have to know what you're looking at...

But you have to know that anyway, you can't do tracing and not know wtf
you're doing. Or rather, if you do, I don't give a crap and you can keep
the pieces :-)

Grepping the callback should be pretty quick resolution as to what trips
it, no?

(also, if you *really* can't manage, we can always add yet another
argument that gives a type thingy)

> Are you worried about the @func being pushed down?

Not really, I was finding it odd that only the first csd was being logged.
Either you should log them all (after all, the target CPU will run them
all and you might still wonder where the heck they came from) or it should
log none and always report that hideous long function name I can't be
arsed to type again :-)

> Staring at x86 asm is not good for the soul,

Scarred for life :-) What's worse, due to being exposed to Intel syntax at
a young age, I'm now permanently confused as to the argument order of x86
asm.
Re: [PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI
On 22/03/23 15:04, Peter Zijlstra wrote:
> On Wed, Mar 22, 2023 at 12:20:28PM +0000, Valentin Schneider wrote:
>> On 22/03/23 10:53, Peter Zijlstra wrote:
>
>> > Hurmph... so we only really consume @func when we IPI. Would it not be
>> > more useful to trace this thing for *every* csd enqueued?
>>
>> It's true that any CSD enqueued on that CPU's call_single_queue in the
>> [first CSD llist_add()'ed, IPI IRQ hits] timeframe is a potential source of
>> interference.
>>
>> However, can we be sure that first CSD isn't an indirect cause for the
>> following ones? say the target CPU exits RCU EQS due to the IPI, there's a
>> bit of time before it gets to flush_smp_call_function_queue() where some
>> other CSD could be enqueued *because* of that change in state.
>>
>> I couldn't find an easy example of that, I might be biased as this is where
>> I'd like to go wrt IPI'ing isolated CPUs in usermode. But regardless, when
>> correlating an IPI IRQ with its source, we'd always have to look at the
>> first CSD in that CSD stack.
>
> So I was thinking something like this:
>
> ---
> Subject: trace,smp: Trace all smp_function_call*() invocations
> From: Peter Zijlstra
> Date: Wed Mar 22 14:58:36 CET 2023
>
> (Ab)use the trace_ipi_send_cpu*() family to trace all
> smp_function_call*() invocations, not only those that result in an
> actual IPI.
>
> The queued entries log their callback function while the actual IPIs
> are traced on generic_smp_call_function_single_interrupt().
>
> Signed-off-by: Peter Zijlstra (Intel)
> ---
>  kernel/smp.c |   58 ++
>  1 file changed, 30 insertions(+), 28 deletions(-)
>
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -106,18 +106,20 @@ void __init call_function_init(void)
>  }
>
>  static __always_inline void
> -send_call_function_single_ipi(int cpu, smp_call_func_t func)
> +send_call_function_single_ipi(int cpu)
>  {
>  	if (call_function_single_prep_ipi(cpu)) {
> -		trace_ipi_send_cpu(cpu, _RET_IP_, func);
> +		trace_ipi_send_cpu(cpu, _RET_IP_,
> +				   generic_smp_call_function_single_interrupt);

Hm, this does get rid of the func being passed down the helpers, but this
means the trace events are now stateful, i.e. I need the first and last
events in a CSD stack to figure out which one actually caused the IPI.

It also requires whoever is looking at the trace to be aware of which IPIs
are attached to a CSD, and which ones aren't. ATM that's only the resched
IPI, but per the cover letter there's more to come (e.g. tick_broadcast()
for arm64/riscv and a few others). For instance:

    hackbench-157   [001]    10.894320: ipi_send_cpu:         cpu=3 callsite=check_preempt_curr+0x37 callback=0x0
    hackbench-157   [001]    10.895068: ipi_send_cpu:         cpu=3 callsite=try_to_wake_up+0x29e callback=sched_ttwu_pending+0x0
    hackbench-157   [001]    10.895068: ipi_send_cpu:         cpu=3 callsite=try_to_wake_up+0x29e callback=generic_smp_call_function_single_interrupt+0x0

That first one sent a RESCHEDULE IPI, the second one a CALL_FUNCTION one,
but you really have to know what you're looking at...

Are you worried about the @func being pushed down? Staring at x86 asm is
not good for the soul, but AFAICT this does cause an extra register to be
popped in the prologue because all of the helpers are __always_inline, so
both paths of the static key(s) are in the same stackframe.

I can "improve" this with:

---
diff --git a/kernel/smp.c b/kernel/smp.c
index 5cd680a7e78ef..55f120dae1713 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -511,6 +511,26 @@ raw_smp_call_single_queue(int cpu, struct llist_node *node, smp_call_func_t func
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
 
+static noinline void __smp_call_single_queue_trace(int cpu, struct llist_node *node)
+{
+	call_single_data_t *csd;
+	smp_call_func_t func;
+
+
+	/*
+	 * We have to check the type of the CSD before queueing it, because
+	 * once queued it can have its flags cleared by
+	 * flush_smp_call_function_queue()
+	 * even if we haven't sent the smp_call IPI yet (e.g. the stopper
+	 * executes migration_cpu_stop() on the remote CPU).
+	 */
+	csd = container_of(node, call_single_data_t, node.llist);
+	func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
+		sched_ttwu_pending : csd->func;
+
+	raw_smp_call_single_queue(cpu, node, func);
+}
+
 void __smp_call_single_queue(int cpu, struct llist_node *node)
 {
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
@@ -525,25 +545,10 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
 		}
 	}
 #endif
-	/*
-	 * We have to check the type of the CSD before queueing it, because
-	 * once queued it can have its flags cleared by
-	 * flush_smp_call_function_queue()
-	 * even if we haven't sent the smp_call IPI yet (e.g. th
Re: [PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI
On Wed, Mar 22, 2023 at 12:20:28PM +0000, Valentin Schneider wrote:
> On 22/03/23 10:53, Peter Zijlstra wrote:
> > Hurmph... so we only really consume @func when we IPI. Would it not be
> > more useful to trace this thing for *every* csd enqueued?
>
> It's true that any CSD enqueued on that CPU's call_single_queue in the
> [first CSD llist_add()'ed, IPI IRQ hits] timeframe is a potential source of
> interference.
>
> However, can we be sure that first CSD isn't an indirect cause for the
> following ones? say the target CPU exits RCU EQS due to the IPI, there's a
> bit of time before it gets to flush_smp_call_function_queue() where some
> other CSD could be enqueued *because* of that change in state.
>
> I couldn't find an easy example of that, I might be biased as this is where
> I'd like to go wrt IPI'ing isolated CPUs in usermode. But regardless, when
> correlating an IPI IRQ with its source, we'd always have to look at the
> first CSD in that CSD stack.

So I was thinking something like this:

---
Subject: trace,smp: Trace all smp_function_call*() invocations
From: Peter Zijlstra
Date: Wed Mar 22 14:58:36 CET 2023

(Ab)use the trace_ipi_send_cpu*() family to trace all
smp_function_call*() invocations, not only those that result in an
actual IPI.

The queued entries log their callback function while the actual IPIs
are traced on generic_smp_call_function_single_interrupt().

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/smp.c |   58 ++
 1 file changed, 30 insertions(+), 28 deletions(-)

--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -106,18 +106,20 @@ void __init call_function_init(void)
 }
 
 static __always_inline void
-send_call_function_single_ipi(int cpu, smp_call_func_t func)
+send_call_function_single_ipi(int cpu)
 {
 	if (call_function_single_prep_ipi(cpu)) {
-		trace_ipi_send_cpu(cpu, _RET_IP_, func);
+		trace_ipi_send_cpu(cpu, _RET_IP_,
+				   generic_smp_call_function_single_interrupt);
 		arch_send_call_function_single_ipi(cpu);
 	}
 }
 
 static __always_inline void
-send_call_function_ipi_mask(const struct cpumask *mask, smp_call_func_t func)
+send_call_function_ipi_mask(const struct cpumask *mask)
 {
-	trace_ipi_send_cpumask(mask, _RET_IP_, func);
+	trace_ipi_send_cpumask(mask, _RET_IP_,
+			       generic_smp_call_function_single_interrupt);
 	arch_send_call_function_ipi_mask(mask);
 }
 
@@ -318,25 +320,6 @@ static __always_inline void csd_unlock(s
 	smp_store_release(&csd->node.u_flags, 0);
 }
 
-static __always_inline void
-raw_smp_call_single_queue(int cpu, struct llist_node *node, smp_call_func_t func)
-{
-	/*
-	 * The list addition should be visible to the target CPU when it pops
-	 * the head of the list to pull the entry off it in the IPI handler
-	 * because of normal cache coherency rules implied by the underlying
-	 * llist ops.
-	 *
-	 * If IPIs can go out of order to the cache coherency protocol
-	 * in an architecture, sufficient synchronisation should be added
-	 * to arch code to make it appear to obey cache coherency WRT
-	 * locking and barrier primitives. Generic code isn't really
-	 * equipped to do the right thing...
-	 */
-	if (llist_add(node, &per_cpu(call_single_queue, cpu)))
-		send_call_function_single_ipi(cpu, func);
-}
-
 static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
 
 void __smp_call_single_queue(int cpu, struct llist_node *node)
@@ -356,10 +339,23 @@ void __smp_call_single_queue(int cpu, st
 		func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
 			sched_ttwu_pending : csd->func;
 
-		raw_smp_call_single_queue(cpu, node, func);
-	} else {
-		raw_smp_call_single_queue(cpu, node, NULL);
+		trace_ipi_send_cpu(cpu, _RET_IP_, func);
 	}
+
+	/*
+	 * The list addition should be visible to the target CPU when it pops
+	 * the head of the list to pull the entry off it in the IPI handler
+	 * because of normal cache coherency rules implied by the underlying
+	 * llist ops.
+	 *
+	 * If IPIs can go out of order to the cache coherency protocol
+	 * in an architecture, sufficient synchronisation should be added
+	 * to arch code to make it appear to obey cache coherency WRT
+	 * locking and barrier primitives. Generic code isn't really
+	 * equipped to do the right thing...
+	 */
+	if (llist_add(node, &per_cpu(call_single_queue, cpu)))
+		send_call_function_single_ipi(cpu);
 }
 
 /*
@@ -798,14 +794,20 @@ static void smp_call_function_many_cond(
 	}
 
 	/*
+	 * Trace each smp_function_call_*() as an IPI, actual IPIs
+	 * will be traced with func==generic_smp_call_fu
Re: [PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI
On 22/03/23 10:53, Peter Zijlstra wrote:
> On Tue, Mar 07, 2023 at 02:35:58PM +0000, Valentin Schneider wrote:
>
>> @@ -477,6 +490,25 @@ static __always_inline void csd_unlock(struct __call_single_data *csd)
>>  	smp_store_release(&csd->node.u_flags, 0);
>>  }
>>
>> +static __always_inline void
>> +raw_smp_call_single_queue(int cpu, struct llist_node *node, smp_call_func_t func)
>> +{
>> +	/*
>> +	 * The list addition should be visible to the target CPU when it pops
>> +	 * the head of the list to pull the entry off it in the IPI handler
>> +	 * because of normal cache coherency rules implied by the underlying
>> +	 * llist ops.
>> +	 *
>> +	 * If IPIs can go out of order to the cache coherency protocol
>> +	 * in an architecture, sufficient synchronisation should be added
>> +	 * to arch code to make it appear to obey cache coherency WRT
>> +	 * locking and barrier primitives. Generic code isn't really
>> +	 * equipped to do the right thing...
>> +	 */
>> +	if (llist_add(node, &per_cpu(call_single_queue, cpu)))
>> +		send_call_function_single_ipi(cpu, func);
>> +}
>> +
>>  static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
>>
>>  void __smp_call_single_queue(int cpu, struct llist_node *node)
>> @@ -493,21 +525,25 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
>>  		}
>>  	}
>>  #endif
>>  	/*
>> +	 * We have to check the type of the CSD before queueing it, because
>> +	 * once queued it can have its flags cleared by
>> +	 * flush_smp_call_function_queue()
>> +	 * even if we haven't sent the smp_call IPI yet (e.g. the stopper
>> +	 * executes migration_cpu_stop() on the remote CPU).
>>  	 */
>> +	if (trace_ipi_send_cpumask_enabled()) {
>> +		call_single_data_t *csd;
>> +		smp_call_func_t func;
>> +
>> +		csd = container_of(node, call_single_data_t, node.llist);
>> +		func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
>> +			sched_ttwu_pending : csd->func;
>> +
>> +		raw_smp_call_single_queue(cpu, node, func);
>> +	} else {
>> +		raw_smp_call_single_queue(cpu, node, NULL);
>> +	}
>>  }
>
> Hurmph... so we only really consume @func when we IPI. Would it not be
> more useful to trace this thing for *every* csd enqueued?

It's true that any CSD enqueued on that CPU's call_single_queue in the
[first CSD llist_add()'ed, IPI IRQ hits] timeframe is a potential source of
interference.

However, can we be sure that first CSD isn't an indirect cause for the
following ones? say the target CPU exits RCU EQS due to the IPI, there's a
bit of time before it gets to flush_smp_call_function_queue() where some
other CSD could be enqueued *because* of that change in state.

I couldn't find an easy example of that, I might be biased as this is where
I'd like to go wrt IPI'ing isolated CPUs in usermode. But regardless, when
correlating an IPI IRQ with its source, we'd always have to look at the
first CSD in that CSD stack.
Re: [PATCH v5 1/7] trace: Add trace_ipi_send_cpumask()
On 22/03/23 11:30, Peter Zijlstra wrote:
> On Wed, Mar 22, 2023 at 10:39:55AM +0100, Peter Zijlstra wrote:
>> On Tue, Mar 07, 2023 at 02:35:52PM +0000, Valentin Schneider wrote:
>> > +TRACE_EVENT(ipi_send_cpumask,
>> > +
>> > +	TP_PROTO(const struct cpumask *cpumask, unsigned long callsite, void *callback),
>> > +
>> > +	TP_ARGS(cpumask, callsite, callback),
>> > +
>> > +	TP_STRUCT__entry(
>> > +		__cpumask(cpumask)
>> > +		__field(void *, callsite)
>> > +		__field(void *, callback)
>> > +	),
>> > +
>> > +	TP_fast_assign(
>> > +		__assign_cpumask(cpumask, cpumask_bits(cpumask));
>> > +		__entry->callsite = (void *)callsite;
>> > +		__entry->callback = callback;
>> > +	),
>> > +
>> > +	TP_printk("cpumask=%s callsite=%pS callback=%pS",
>> > +		  __get_cpumask(cpumask), __entry->callsite, __entry->callback)
>> > +);
>>
>> Would it make sense to add a variant like: ipi_send_cpu() that records a
>> single cpu instead of a cpumask. A lot of sites seems to do:
>> cpumask_of(cpu) for that first argument, and it seems to me it is quite
>> daft to have to memcpy a full multi-word cpumask in those cases.
>>
>> Remember, nr_possible_cpus > 64 is quite common these days.
>
> Something a little bit like so...
>

I was wondering whether we could stick with a single trace event, but let
ftrace be aware of weight=1 vs weight>1 cpumasks. For weight>1, it would
memcpy() as usual; for weight=1, it could write a pointer to a
cpu_bit_bitmap[] equivalent embedded in the trace itself.

Unfortunately, Ftrace bitmasks are represented as a u32 made of two 16 bit
values: [offset in event record, size], so there isn't a straightforward
way to point to a "reusable" cpumask.

AFAICT the only alternative would be to do that via a different trace
event, but then we should just go with a plain old uint - i.e. do what
you're doing here, so:

Tested-and-reviewed-by: Valentin Schneider

(with the tiny typo fix below)

> @@ -35,6 +35,28 @@ TRACE_EVENT(ipi_raise,
>  	TP_printk("target_mask=%s (%s)", __get_bitmask(target_cpus), __entry->reason)
>  );
>
> +TRACE_EVENT(ipi_send_cpu,
> +
> +	TP_PROTO(const unsigned int cpu, unsigned long callsite, void *callback),
> +
> +	TP_ARGS(cpu, callsite, callback),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned int, cpu)
> +		__field(void *, callsite)
> +		__field(void *, callback)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->cpu = cpu;
> +		__entry->callsite = (void *)callsite;
> +		__entry->callback = callback;
> +	),
> +
> +	TP_printk("cpu=%s callsite=%pS callback=%pS",
                        ^ s/s/u/
> +		  __entry->cpu, __entry->callsite, __entry->callback)
> +);
> +
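For readers unfamiliar with the [offset, size] packing mentioned above, here is a rough, hypothetical decoder sketch. It is not the ftrace or libtraceevent API, just an illustration of how such a u32 location word resolves to the cpumask bytes stored inside a raw event record:

```c
#include <stdint.h>
#include <stdio.h>

/* One dynamic field: low 16 bits = offset into the record, high 16 = size. */
struct dyn_field {
	const void *record;
	uint32_t    loc;
};

static const unsigned char *dyn_data(const struct dyn_field *f)
{
	return (const unsigned char *)f->record + (f->loc & 0xffff);
}

static unsigned int dyn_size(const struct dyn_field *f)
{
	return f->loc >> 16;
}

int main(void)
{
	/* Fake 32-byte record with an 8-byte cpumask stored at offset 16. */
	unsigned char record[32] = { 0 };

	record[16] = 1u << 3;	/* pretend CPU 3 is set */

	struct dyn_field mask = { record, (8u << 16) | 16u };

	printf("mask is %u bytes, first byte %#x\n",
	       dyn_size(&mask), dyn_data(&mask)[0]);
	return 0;
}
```

Because the offset half points back into the event's own record, the bytes have to be copied into every event, which is why pointing at a shared, "reusable" cpumask does not fit this encoding.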
Re: [PATCH v5 1/7] trace: Add trace_ipi_send_cpumask()
On Wed, Mar 22, 2023 at 10:39:55AM +0100, Peter Zijlstra wrote:
> On Tue, Mar 07, 2023 at 02:35:52PM +0000, Valentin Schneider wrote:
> > trace_ipi_raise() is unsuitable for generically tracing IPI sources due to
> > its "reason" argument being an uninformative string (on arm64 all you get
> > is "Function call interrupts" for SMP calls).
> >
> > Add a variant of it that exports a target cpumask, a callsite and a callback.
> >
> > Signed-off-by: Valentin Schneider
> > Reviewed-by: Steven Rostedt (Google)
> > ---
> >  include/trace/events/ipi.h | 22 ++
> >  1 file changed, 22 insertions(+)
> >
> > diff --git a/include/trace/events/ipi.h b/include/trace/events/ipi.h
> > index 0be71dad6ec03..b1125dc27682c 100644
> > --- a/include/trace/events/ipi.h
> > +++ b/include/trace/events/ipi.h
> > @@ -35,6 +35,28 @@ TRACE_EVENT(ipi_raise,
> >  	TP_printk("target_mask=%s (%s)", __get_bitmask(target_cpus), __entry->reason)
> >  );
> >
> > +TRACE_EVENT(ipi_send_cpumask,
> > +
> > +	TP_PROTO(const struct cpumask *cpumask, unsigned long callsite, void *callback),
> > +
> > +	TP_ARGS(cpumask, callsite, callback),
> > +
> > +	TP_STRUCT__entry(
> > +		__cpumask(cpumask)
> > +		__field(void *, callsite)
> > +		__field(void *, callback)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__assign_cpumask(cpumask, cpumask_bits(cpumask));
> > +		__entry->callsite = (void *)callsite;
> > +		__entry->callback = callback;
> > +	),
> > +
> > +	TP_printk("cpumask=%s callsite=%pS callback=%pS",
> > +		  __get_cpumask(cpumask), __entry->callsite, __entry->callback)
> > +);
>
> Would it make sense to add a variant like: ipi_send_cpu() that records a
> single cpu instead of a cpumask. A lot of sites seems to do:
> cpumask_of(cpu) for that first argument, and it seems to me it is quite
> daft to have to memcpy a full multi-word cpumask in those cases.
>
> Remember, nr_possible_cpus > 64 is quite common these days.

Something a little bit like so...

---
Subject: trace: Add trace_ipi_send_cpu()
From: Peter Zijlstra
Date: Wed Mar 22 11:28:36 CET 2023

Because copying cpumasks around when targeting a single CPU is a bit
daft...

Signed-off-by: Peter Zijlstra (Intel)
---
 include/linux/smp.h        |    6 +++---
 include/trace/events/ipi.h |   22 ++
 kernel/irq_work.c          |    6 ++
 kernel/smp.c               |    4 ++--
 4 files changed, 29 insertions(+), 9 deletions(-)

--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -130,9 +130,9 @@ extern void arch_smp_send_reschedule(int
  * scheduler_ipi() is inline so can't be passed as callback reason, but the
  * callsite IP should be sufficient for root-causing IPIs sent from here.
  */
-#define smp_send_reschedule(cpu) ({				  \
-	trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL);  \
-	arch_smp_send_reschedule(cpu);				  \
+#define smp_send_reschedule(cpu) ({				  \
+	trace_ipi_send_cpu(cpu, _RET_IP_, NULL);		  \
+	arch_smp_send_reschedule(cpu);				  \
 })
 
 /*
--- a/include/trace/events/ipi.h
+++ b/include/trace/events/ipi.h
@@ -35,6 +35,28 @@ TRACE_EVENT(ipi_raise,
 	TP_printk("target_mask=%s (%s)", __get_bitmask(target_cpus), __entry->reason)
 );
 
+TRACE_EVENT(ipi_send_cpu,
+
+	TP_PROTO(const unsigned int cpu, unsigned long callsite, void *callback),
+
+	TP_ARGS(cpu, callsite, callback),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, cpu)
+		__field(void *, callsite)
+		__field(void *, callback)
+	),
+
+	TP_fast_assign(
+		__entry->cpu = cpu;
+		__entry->callsite = (void *)callsite;
+		__entry->callback = callback;
+	),
+
+	TP_printk("cpu=%s callsite=%pS callback=%pS",
+		  __entry->cpu, __entry->callsite, __entry->callback)
+);
+
 TRACE_EVENT(ipi_send_cpumask,
 
 	TP_PROTO(const struct cpumask *cpumask, unsigned long callsite, void *callback),
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -78,10 +78,8 @@ void __weak arch_irq_work_raise(void)
 
 static __always_inline void irq_work_raise(struct irq_work *work)
 {
-	if (trace_ipi_send_cpumask_enabled() && arch_irq_work_has_interrupt())
-		trace_ipi_send_cpumask(cpumask_of(smp_processor_id()),
-				       _RET_IP_,
-				       work->func);
+	if (trace_ipi_send_cpu_enabled() && arch_irq_work_has_interrupt())
+		trace_ipi_send_cpu(smp_processor_id(), _RET_IP_, work->func);
 
 	arch_irq_work_raise();
 }
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -109,7 +109,7 @@ static __always_inline void
 send_call_function_single_ipi(int cpu, smp_call_func_t func)
 {
 	if (call_function_single_prep_ipi(cpu)) {
-		trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, f
Re: [PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI
On Tue, Mar 07, 2023 at 02:35:58PM +0000, Valentin Schneider wrote:
> @@ -477,6 +490,25 @@ static __always_inline void csd_unlock(struct __call_single_data *csd)
>  	smp_store_release(&csd->node.u_flags, 0);
>  }
>
> +static __always_inline void
> +raw_smp_call_single_queue(int cpu, struct llist_node *node, smp_call_func_t func)
> +{
> +	/*
> +	 * The list addition should be visible to the target CPU when it pops
> +	 * the head of the list to pull the entry off it in the IPI handler
> +	 * because of normal cache coherency rules implied by the underlying
> +	 * llist ops.
> +	 *
> +	 * If IPIs can go out of order to the cache coherency protocol
> +	 * in an architecture, sufficient synchronisation should be added
> +	 * to arch code to make it appear to obey cache coherency WRT
> +	 * locking and barrier primitives. Generic code isn't really
> +	 * equipped to do the right thing...
> +	 */
> +	if (llist_add(node, &per_cpu(call_single_queue, cpu)))
> +		send_call_function_single_ipi(cpu, func);
> +}
> +
>  static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
>
>  void __smp_call_single_queue(int cpu, struct llist_node *node)
> @@ -493,21 +525,25 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
>  		}
>  	}
>  #endif
>  	/*
> +	 * We have to check the type of the CSD before queueing it, because
> +	 * once queued it can have its flags cleared by
> +	 * flush_smp_call_function_queue()
> +	 * even if we haven't sent the smp_call IPI yet (e.g. the stopper
> +	 * executes migration_cpu_stop() on the remote CPU).
>  	 */
> +	if (trace_ipi_send_cpumask_enabled()) {
> +		call_single_data_t *csd;
> +		smp_call_func_t func;
> +
> +		csd = container_of(node, call_single_data_t, node.llist);
> +		func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
> +			sched_ttwu_pending : csd->func;
> +
> +		raw_smp_call_single_queue(cpu, node, func);
> +	} else {
> +		raw_smp_call_single_queue(cpu, node, NULL);
> +	}
>  }

Hurmph... so we only really consume @func when we IPI. Would it not be
more useful to trace this thing for *every* csd enqueued?
Re: [PATCH v5 1/7] trace: Add trace_ipi_send_cpumask()
On Tue, Mar 07, 2023 at 02:35:52PM +0000, Valentin Schneider wrote:
> trace_ipi_raise() is unsuitable for generically tracing IPI sources due to
> its "reason" argument being an uninformative string (on arm64 all you get
> is "Function call interrupts" for SMP calls).
>
> Add a variant of it that exports a target cpumask, a callsite and a callback.
>
> Signed-off-by: Valentin Schneider
> Reviewed-by: Steven Rostedt (Google)
> ---
>  include/trace/events/ipi.h | 22 ++
>  1 file changed, 22 insertions(+)
>
> diff --git a/include/trace/events/ipi.h b/include/trace/events/ipi.h
> index 0be71dad6ec03..b1125dc27682c 100644
> --- a/include/trace/events/ipi.h
> +++ b/include/trace/events/ipi.h
> @@ -35,6 +35,28 @@ TRACE_EVENT(ipi_raise,
>  	TP_printk("target_mask=%s (%s)", __get_bitmask(target_cpus), __entry->reason)
>  );
>
> +TRACE_EVENT(ipi_send_cpumask,
> +
> +	TP_PROTO(const struct cpumask *cpumask, unsigned long callsite, void *callback),
> +
> +	TP_ARGS(cpumask, callsite, callback),
> +
> +	TP_STRUCT__entry(
> +		__cpumask(cpumask)
> +		__field(void *, callsite)
> +		__field(void *, callback)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_cpumask(cpumask, cpumask_bits(cpumask));
> +		__entry->callsite = (void *)callsite;
> +		__entry->callback = callback;
> +	),
> +
> +	TP_printk("cpumask=%s callsite=%pS callback=%pS",
> +		  __get_cpumask(cpumask), __entry->callsite, __entry->callback)
> +);

Would it make sense to add a variant like: ipi_send_cpu() that records a
single cpu instead of a cpumask. A lot of sites seems to do:
cpumask_of(cpu) for that first argument, and it seems to me it is quite
daft to have to memcpy a full multi-word cpumask in those cases.

Remember, nr_possible_cpus > 64 is quite common these days.
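To put rough numbers on the memcpy objection, here is a back-of-the-envelope sketch. It assumes 64-bit longs and a mask sized for NR_CPUS bits; the exact number of bytes actually recorded depends on nr_cpu_ids and the ring-buffer layout:

```c
#include <stdio.h>

int main(void)
{
	static const unsigned int nr_cpus[] = { 64, 256, 1024, 8192 };

	for (unsigned int i = 0; i < sizeof(nr_cpus) / sizeof(nr_cpus[0]); i++) {
		/* bytes needed for a bitmap of nr_cpus bits, in 64-bit words */
		unsigned int mask_bytes = ((nr_cpus[i] + 63) / 64) * 8;

		printf("NR_CPUS=%-5u  cpumask field: %4u bytes   u32 cpu field: 4 bytes\n",
		       nr_cpus[i], mask_bytes);
	}
	return 0;
}
```

At 8192 possible CPUs that is a kilobyte copied per event versus four bytes, which is the motivation for the single-CPU variant.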