On Mon, Feb 01, 2021 at 07:40:45PM +0200, James Clark wrote: > > On 31/01/2021 14:01, Leo Yan wrote: > > Option 1: by merging patches 07/08 and 08/08, we can firstly support PID > > tracing for root namespace, and later we can extend to support PID > > tracing in container (and in VMs). > > > > Option 2: we can use the software method to establish PID for SPE > > trace, which can base on kernel's events PERF_RECORD_SWITCH / > > PERF_RECORD_SWITCH_CPU_WIDE and check context switch ip. > > > > To be honest, I am a bit concern for option 1 for later might > > introduce regression when later support PID for containers (and VMs). > > If you have a plan for option 1, I think it's good to record current > > limitation and the plan for next step in the commit log, so we can merge > > this patch at this time and later extend for containers. > > > > Otherwise, we need to consider how to implement the PID tracing with > > option 2. If it is the case, we should firstly only merge patches > > 01 ~ 06 for data source enabling. How about you think for this? > > In my opinion we should do option 1 and use what is there at the moment. That > gets users 90% of the functionality right now. > > I plan to look at option 2 at some point, and it can always be added on top of > option 1 or replace what is there. But I don't know when I would get to it or > how long it will take.
Firstly, sorry for long replying. Have offline discussion with James and I took time to look into Intel PT implementation for PID tracing. AFAICT, for tracing root namespace, the option 1 with using CONTEXTIDR is the most reliable way for Arm SPE; at beginning I thought option 2 is better choice for Arm SPE, but after went through Intel PT's code, I think Arm SPE cannot achieve the same result with Intel PT, this is caused by its "statistical" character. Let me explain why Arm SPE has problem with the option 2. If we want to enable option 2 by using perf context switch events and switch_ip approach, it uses below logic: Step1: when event PERF_RECORD_SWITCH or PERF_RECORD_SWITCH_CPU_WIDE is coming, invokes below functions. So it tells the "machine" context that the process is switched to new one; at this step, it simply caches the new PID/TID into the "machine" context. But the samples doesn't really set the new value. intel_pt_context_switch() `> machine__set_current_tid() Step2: when detect the branch instruction's target address equals to the address of symbol "__switch_to", this means the the CPU really switches context to the next process in the low level code, afterwards it will retrieve the cached TID/PID from the "machine" context and set the correct PID for "ptq->pid" (see intel_pt_sample_set_pid_tid_cpu()), then "ptq->tid" is used for synthesizing samples. Arm SPE has the problem for step2, due to the trace uses statistical approach, it doesn't trace the complete branch instructions, so it cannot promise to capture all branches for the symbol "__switch_to". If we only use the events PERF_RECORD_SWITCH / PERF_RECORD_SWITCH_CPU_WIDE, then it will lead to the coarse result for PID tracing. For this reason, seems to me it's pragmatic to use CONTEXTIDR for PID tracing at current stage, at least it can allow the root domain tracing works accurately. But this will leave the issue for tracing PID in non root namespace, we need to figure out solution later. Hi Mark.R, Al, do you have any comments for this? Thanks, Leo > >> Signed-off-by: Leo Yan <leo....@linaro.org> > >> Signed-off-by: James Clark <james.cl...@arm.com> > > > > Besides for techinical question, you could add your "Co-developed-by" > > tags for patches 06, 07, 08/08, which you have took time to refin them. > > > > Thanks you for kindly efforts. > > > > [1] https://lore.kernel.org/patchwork/patch/1353286/ > > > >> Cc: Peter Zijlstra <pet...@infradead.org> > >> Cc: Ingo Molnar <mi...@redhat.com> > >> Cc: Arnaldo Carvalho de Melo <a...@kernel.org> > >> Cc: Mark Rutland <mark.rutl...@arm.com> > >> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> > >> Cc: Jiri Olsa <jo...@redhat.com> > >> Cc: Namhyung Kim <namhy...@kernel.org> > >> Cc: John Garry <john.ga...@huawei.com> > >> Cc: Will Deacon <w...@kernel.org> > >> Cc: Mathieu Poirier <mathieu.poir...@linaro.org> > >> Cc: Al Grant <al.gr...@arm.com> > >> Cc: Andre Przywara <andre.przyw...@arm.com> > >> Cc: Wei Li <liwei...@huawei.com> > >> Cc: Tan Xiaojun <tanxiao...@huawei.com> > >> Cc: Adrian Hunter <adrian.hun...@intel.com> > >> --- > >> tools/perf/util/arm-spe.c | 75 ++++++++++++++++++++++++++------------- > >> 1 file changed, 50 insertions(+), 25 deletions(-) > >> > >> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c > >> index 27a0b9dfe22d..9828fad7e516 100644 > >> --- a/tools/perf/util/arm-spe.c > >> +++ b/tools/perf/util/arm-spe.c > >> @@ -223,6 +223,46 @@ static inline u8 arm_spe_cpumode(struct arm_spe *spe, > >> u64 ip) > >> PERF_RECORD_MISC_USER; > >> } > >> > >> +static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe, > >> + struct auxtrace_queue *queue) > >> +{ > >> + struct arm_spe_queue *speq = queue->priv; > >> + pid_t tid; > >> + > >> + tid = machine__get_current_tid(spe->machine, speq->cpu); > >> + if (tid != -1) { > >> + speq->tid = tid; > >> + thread__zput(speq->thread); > >> + } else > >> + speq->tid = queue->tid; > >> + > >> + if ((!speq->thread) && (speq->tid != -1)) { > >> + speq->thread = machine__find_thread(spe->machine, -1, > >> + speq->tid); > >> + } > >> + > >> + if (speq->thread) { > >> + speq->pid = speq->thread->pid_; > >> + if (queue->cpu == -1) > >> + speq->cpu = speq->thread->cpu; > >> + } > >> +} > >> + > >> +static int arm_spe_set_tid(struct arm_spe_queue *speq, pid_t tid) > >> +{ > >> + int err; > >> + struct arm_spe *spe = speq->spe; > >> + struct auxtrace_queue *queue; > >> + > >> + err = machine__set_current_tid(spe->machine, speq->cpu, tid, tid); > >> + if (err) > >> + return err; > >> + > >> + queue = &speq->spe->queues.queue_array[speq->queue_nr]; > >> + arm_spe_set_pid_tid_cpu(speq->spe, queue); > >> + return 0; > >> +} > >> + > >> static void arm_spe_prep_sample(struct arm_spe *spe, > >> struct arm_spe_queue *speq, > >> union perf_event *event, > >> @@ -431,6 +471,7 @@ static int arm_spe_sample(struct arm_spe_queue *speq) > >> static int arm_spe_run_decoder(struct arm_spe_queue *speq, u64 *timestamp) > >> { > >> struct arm_spe *spe = speq->spe; > >> + const struct arm_spe_record *record; > >> int ret; > >> > >> if (!spe->kernel_start) > >> @@ -450,6 +491,11 @@ static int arm_spe_run_decoder(struct arm_spe_queue > >> *speq, u64 *timestamp) > >> if (ret < 0) > >> continue; > >> > >> + record = &speq->decoder->record; > >> + ret = arm_spe_set_tid(speq, record->context_id); > >> + if (ret) > >> + return ret; > >> + > >> ret = arm_spe_sample(speq); > >> if (ret) > >> return ret; > >> @@ -500,6 +546,10 @@ static int arm_spe__setup_queue(struct arm_spe *spe, > >> > >> record = &speq->decoder->record; > >> > >> + ret = arm_spe_set_tid(speq, record->context_id); > >> + if (ret) > >> + return ret; > >> + > >> speq->timestamp = record->timestamp; > >> ret = auxtrace_heap__add(&spe->heap, queue_nr, speq->timestamp); > >> if (ret) > >> @@ -552,31 +602,6 @@ static bool arm_spe__is_timeless_decoding(struct > >> arm_spe *spe) > >> return timeless_decoding; > >> } > >> > >> -static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe, > >> - struct auxtrace_queue *queue) > >> -{ > >> - struct arm_spe_queue *speq = queue->priv; > >> - pid_t tid; > >> - > >> - tid = machine__get_current_tid(spe->machine, speq->cpu); > >> - if (tid != -1) { > >> - speq->tid = tid; > >> - thread__zput(speq->thread); > >> - } else > >> - speq->tid = queue->tid; > >> - > >> - if ((!speq->thread) && (speq->tid != -1)) { > >> - speq->thread = machine__find_thread(spe->machine, -1, > >> - speq->tid); > >> - } > >> - > >> - if (speq->thread) { > >> - speq->pid = speq->thread->pid_; > >> - if (queue->cpu == -1) > >> - speq->cpu = speq->thread->cpu; > >> - } > >> -} > >> - > >> static int arm_spe_process_queues(struct arm_spe *spe, u64 timestamp) > >> { > >> unsigned int queue_nr; > >> -- > >> 2.28.0 > >>