On 11/24/2020 12:42 AM, Madhavan Srinivasan wrote:
On 11/24/20 10:21 AM, Namhyung Kim wrote:
Hello,
On Mon, Nov 23, 2020 at 8:00 PM Michael Ellerman <m...@ellerman.id.au>
wrote:
Namhyung Kim <namhy...@kernel.org> writes:
Hi Peter and Kan,
(Adding PPC folks)
On Tue, Nov 17, 2020 at 2:01 PM Namhyung Kim <namhy...@kernel.org>
wrote:
Hello,
On Thu, Nov 12, 2020 at 4:54 AM Liang, Kan
<kan.li...@linux.intel.com> wrote:
On 11/11/2020 11:25 AM, Peter Zijlstra wrote:
On Mon, Nov 09, 2020 at 09:49:31AM -0500, Liang, Kan wrote:
- When the large PEBS was introduced (9c964efa4330), the
sched_task() should
be invoked to flush the PEBS buffer in each context switch.
However, The
perf_sched_events in account_event() is not updated accordingly.
The
perf_event_task_sched_* never be invoked for a pure per-CPU
context. Only
per-task event works.
At that time, the perf_pmu_sched_task() is outside of
perf_event_context_sched_in/out. It means that perf has to double
perf_pmu_disable() for per-task event.
- The patch 1 tries to fix broken per-CPU events. The CPU
context cannot be
retrieved from the task->perf_event_ctxp. So it has to be
tracked in the
sched_cb_list. Yes, the code is very similar to the original
codes, but it
is actually the new code for per-CPU events. The optimization
for per-task
events is still kept.
For the case, which has both a CPU context and a task
context, yes, the
__perf_pmu_sched_task() in this patch is not invoked. Because the
sched_task() only need to be invoked once in a context switch. The
sched_task() will be eventually invoked in the task context.
The thing is; your first two patches rely on PERF_ATTACH_SCHED_CB
and
only set that for large pebs. Are you sure the other users (Intel
LBR
and PowerPC BHRB) don't need it?
I didn't set it for LBR, because the perf_sched_events is always
enabled
for LBR. But, yes, we should explicitly set the PERF_ATTACH_SCHED_CB
for LBR.
if (has_branch_stack(event))
inc = true;
If they indeed do not require the pmu::sched_task() callback for CPU
events, then I still think the whole perf_sched_cb_{inc,dec}()
interface
No, LBR requires the pmu::sched_task() callback for CPU events.
Now, The LBR registers have to be reset in sched in even for CPU
events.
To fix the shorter LBR callstack issue for CPU events, we also
need to
save/restore LBRs in pmu::sched_task().
https://lore.kernel.org/lkml/1578495789-95006-4-git-send-email-kan.li...@linux.intel.com/
is confusing at best.
Can't we do something like this instead?
I think the below patch may have two issues.
- PERF_ATTACH_SCHED_CB is required for LBR (maybe PowerPC BHRB as
well) now.
- We may disable the large PEBS later if not all PEBS events support
large PEBS. The PMU need a way to notify the generic code to decrease
the nr_sched_task.
Any updates on this? I've reviewed and tested Kan's patches
and they all look good.
Maybe we can talk to PPC folks to confirm the BHRB case?
Can we move this forward? I saw patch 3/3 also adds
PERF_ATTACH_SCHED_CB
for PowerPC too. But it'd be nice if ppc folks can confirm the change.
Sorry I've read the whole thread, but I'm still not entirely sure I
understand the question.
Thanks for your time and sorry about not being clear enough.
We found per-cpu events are not calling pmu::sched_task()
on context switches. So PERF_ATTACH_SCHED_CB was
added to indicate the core logic that it needs to invoke the
callback.
The patch 3/3 added the flag to PPC (for BHRB) with other
changes (I think it should be split like in the patch 2/3) and
want to get ACKs from the PPC folks.
Sorry for delay.
I guess first it will be better to split the ppc change to a separate
patch,
Both PPC and X86 invokes the perf_sched_cb_inc() directly. The patch
changes the parameters of the perf_sched_cb_inc(). I think we have to
update the PPC and X86 codes together. Otherwise, there will be a
compile error, if someone may only applies the change for the
perf_sched_cb_inc() but forget to applies the changes in PPC or X86
specific codes.
secondly, we are missing the changes needed in the power_pmu_bhrb_disable()
where perf_sched_cb_dec() needs the "state" to be included.
Ah, right. The below patch should fix the issue.
diff --git a/arch/powerpc/perf/core-book3s.c
b/arch/powerpc/perf/core-book3s.c
index bced502f64a1..6756d1602a67 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -391,13 +391,18 @@ static void power_pmu_bhrb_enable(struct
perf_event *event)
static void power_pmu_bhrb_disable(struct perf_event *event)
{
struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
+ int state = PERF_SCHED_CB_SW_IN;
if (!ppmu->bhrb_nr)
return;
WARN_ON_ONCE(!cpuhw->bhrb_users);
cpuhw->bhrb_users--;
- perf_sched_cb_dec(event->ctx->pmu);
+
+ if (!(event->attach_state & PERF_ATTACH_TASK))
+ state |= PERF_SCHED_CB_CPU;
+
+ perf_sched_cb_dec(event->ctx->pmu, state);
if (!cpuhw->disabled && !cpuhw->bhrb_users) {
/* BHRB cannot be turned off when other
Thanks,
Kan