On Tue, 4 Dec 2018 at 23:26, <leo....@linaro.org> wrote: > > On Mon, Nov 19, 2018 at 03:26:17PM -0700, Mathieu Poirier wrote: > > On Sun, Nov 11, 2018 at 01:07:55PM +0800, Leo Yan wrote: > > > The perf sample data contains flags to indicate the hardware trace data > > > is belonging to which type branch instruction, thus this can be used to > > > print out the human readable string. Arm CoreSight ETM sample data is > > > missed to set flags and it is always set to zeros, this results in perf > > > tool skips to print string for instruction types. > > > > > > Arm CoreSight ETM supports different kinds instruction of A64, A32 and > > > T32; this patch is to set branch instruction flags in packet for these > > > ISAs. > > > > > > The brief idea for patch implementation is describe as below: > > > > > > - For element with OCSD_GEN_TRC_ELEM_TRACE_ON type, it is taken as trace > > > beginning packet; for element with OCSD_GEN_TRC_ELEM_NO_SYNC or > > > OCSD_GEN_TRC_ELEM_EO_TRACE, these two kinds elements are used to set > > > for trace end; > > > > > > As Mike suggested the packet stream might have more than one two > > > TRACE_ON packets, the first one TRACE_ON packet indicates trace end > > > and the second one is taken as trace restarting. We will handle this > > > special case in the upper layer with packet queue handling, which has > > > more context so it's more suitable fix up for it. This will be > > > accomplished in the sequential patch. > > > > > > - For instruction range packet, mainly base on three factors to decide > > > the branch instruction types: > > > > > > elem->last_i_type > > > elem->last_i_subtype > > > elem->last_instr_cond > > > > > > If the instruction is immediate branch but without link and return > > > flag, we consider it as function internal branch; in fact the > > > immediate branch also can be used to invoke the function entry, > > > usually this is only used in assembly code to directly call a symbol > > > and don't expect to return back; after reviewing kernel normal > > > functions and user space programs, both of them are very seldom to use > > > immediate branch for function call. On the other hand, if we want to > > > decide the immediate branch is for function branch jumping or for > > > function calling, we need to rely on the start address of next packet > > > and check the symbol offset for the start address, this will > > > introduce much complexity in the implementation. So for this version > > > we simply consider immediate branch as function internal branch. > > > Moreover, we rely on 'elem->last_instr_cond' to decide if the branch > > > instruction is a conditional branch or not. > > > > > > If the instruction is immediate branch with link, it's instruction > > > 'BL' and which is used for function call. > > > > > > If the instruction is indirect branch and with subtype > > > OCSD_S_INSTR_V7_IMPLIED_RET, the decoders gives the hint the function > > > return for below cases related with A32/T32 instruction; set this > > > branch flag as function return (Thanks for Al's suggestion). > > > > > > BX R14 > > > MOV PC, LR > > > POP {…, PC} > > > LDR PC, [SP], #offset > > > > > > If the instruction is indirect branch without link, this is > > > corresponding to instruction 'BR', this instruction usually is used > > > for dynamic link lib with below usage; so we think it's a return > > > instruction. > > > > > > 0000000000000680 <.plt>: > > > 680: a9bf7bf0 stp x16, x30, [sp, #-16]! > > > 684: 90000090 adrp x16, 10000 <__FRAME_END__+0xf630> > > > 688: f947fe11 ldr x17, [x16, #4088] > > > 68c: 913fe210 add x16, x16, #0xff8 > > > 690: d61f0220 br x17 > > > > > > If the instruction is indirect branch with link, e.g BLR, we think > > > it's a function call. > > > > > > For function return, ARMv8 introduces a dedicated instruction 'ret', > > > which has flag of OCSD_S_INSTR_V8_RET. > > > > > > - For exception packets, this patch divides into three types: > > > > > > The first type of exception is caused by external logics like bus, > > > interrupt controller, debug module or PE reset or halt; this is > > > corresponding to flags "bcyi" which defined in doc perf-script.txt; > > > > > > The second type is for system call, this is set as "bcs" by following > > > definition in the doc; > > > > > > The third type is for CPU trap, data and instruction prefetch abort, > > > alignment abort; usually these exceptions are synchronous for CPU, so > > > set them as "bci" type. > > > > This is too long and needs to be broken down into pieces. I would split > > this > > patch in 3 heat, one for NO_SYNC and TRACE_ON, one for INSTR_RANGE and one > > for > > ELEM_EXCEPTION/ELEM_EXCEPTION_RET. > > This makes sense, will split to 3 patches. > > > > > > > Cc: Mathieu Poirier <mathieu.poir...@linaro.org> > > > Cc: Mike Leach <mike.le...@linaro.org> > > > Cc: Robert Walker <robert.wal...@arm.com> > > > Cc: Al Grant <al.gr...@arm.com> > > > Cc: Andi Kleen <a...@firstfloor.org> > > > Cc: Adrian Hunter <adrian.hun...@intel.com> > > > Cc: Arnaldo Carvalho de Melo <a...@redhat.com> > > > Signed-off-by: Leo Yan <leo....@linaro.org> > > > --- > > > tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 168 > > > ++++++++++++++++++++++++ > > > tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 1 + > > > 2 files changed, 169 insertions(+) > > > > > > diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > index d1a6cbc..0e50c52 100644 > > > --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > @@ -303,6 +303,7 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder > > > *decoder, > > > decoder->packet_buffer[et].instr_count = 0; > > > decoder->packet_buffer[et].last_instr_taken_branch = false; > > > decoder->packet_buffer[et].last_instr_size = 0; > > > + decoder->packet_buffer[et].flags = 0; > > > > Since PERF_IP_FLAG_BRANCH is '0', I would set this to UNINT32_MAX. > > PERF_IP_FLAG_BRANCH is bit 0 is set (so it's 1) but not 0. If > initialize value to UNINT32_MAX (0xFFFF,FFFF) that means all flags has > been set and this will introduce confusion. So will keep to init > flags to 0.
I just looked at this again - you are correct. > > > > > > > if (decoder->packet_count == MAX_BUFFER - 1) > > > return OCSD_RESP_WAIT; > > > @@ -437,6 +438,171 @@ cs_etm_decoder__buffer_exception_ret(struct > > > cs_etm_decoder *decoder, > > > CS_ETM_EXCEPTION_RET); > > > } > > > > > > +static void cs_etm_decoder__set_sample_flags( > > > + const void *context, > > > + const ocsd_generic_trace_elem *elem) > > > +{ > > > + struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context; > > > + struct cs_etm_packet *packet; > > > + u32 exc_num; > > > + > > > + packet = &decoder->packet_buffer[decoder->tail]; > > > + > > > + switch (elem->elem_type) { > > > + case OCSD_GEN_TRC_ELEM_TRACE_ON: > > > + packet->flags = PERF_IP_FLAG_BRANCH | > > > + PERF_IP_FLAG_TRACE_BEGIN; > > > + break; > > > + > > > + case OCSD_GEN_TRC_ELEM_NO_SYNC: > > > + case OCSD_GEN_TRC_ELEM_EO_TRACE: > > > + packet->flags = PERF_IP_FLAG_BRANCH | > > > + PERF_IP_FLAG_TRACE_END; > > > + break; > > > + > > > + case OCSD_GEN_TRC_ELEM_INSTR_RANGE: > > > + /* > > > + * Immediate branch instruction without neither link nor > > > + * return flag, it's normal branch instruction within > > > + * the function. > > > + */ > > > + if (elem->last_i_type == OCSD_INSTR_BR && > > > + elem->last_i_subtype == OCSD_S_INSTR_NONE) { > > > + packet->flags = PERF_IP_FLAG_BRANCH; > > > + > > > + if (elem->last_instr_cond) > > > + packet->flags |= PERF_IP_FLAG_CONDITIONAL; > > > + } > > > + > > > + /* > > > + * Immediate branch instruction with link (e.g. BL), this is > > > + * branch instruction for function call. > > > + */ > > > + if (elem->last_i_type == OCSD_INSTR_BR && > > > + elem->last_i_subtype == OCSD_S_INSTR_BR_LINK) > > > + packet->flags = PERF_IP_FLAG_BRANCH | > > > + PERF_IP_FLAG_CALL; > > > + > > > + /* > > > + * Indirect branch instruction with subtype of > > > + * OCSD_S_INSTR_V7_IMPLIED_RET, this is explicit hint for > > > + * function return for A32/T32. > > > + */ > > > + if (elem->last_i_type == OCSD_INSTR_BR_INDIRECT && > > > + elem->last_i_subtype == OCSD_S_INSTR_V7_IMPLIED_RET) > > > + packet->flags = PERF_IP_FLAG_BRANCH | > > > + PERF_IP_FLAG_RETURN; > > > + > > > + /* > > > + * Indirect branch instruction without link (e.g. BR), usually > > > + * this is used for function return, especially for functions > > > + * within dynamic link lib. > > > + */ > > > + if (elem->last_i_type == OCSD_INSTR_BR_INDIRECT && > > > + elem->last_i_subtype == OCSD_S_INSTR_NONE) > > > + packet->flags = PERF_IP_FLAG_BRANCH | > > > + PERF_IP_FLAG_RETURN; > > > + > > > + /* > > > + * Indirect branch instruction with link (e.g. BLR), this is > > > + * branch instruction for function call. > > > + */ > > > + if (elem->last_i_type == OCSD_INSTR_BR_INDIRECT && > > > + elem->last_i_subtype == OCSD_S_INSTR_BR_LINK) > > > + packet->flags = PERF_IP_FLAG_BRANCH | > > > + PERF_IP_FLAG_CALL; > > > + > > > + /* Return instruction for function return. */ > > > + if (elem->last_i_type == OCSD_INSTR_BR_INDIRECT && > > > + elem->last_i_subtype == OCSD_S_INSTR_V8_RET) > > > + packet->flags = PERF_IP_FLAG_BRANCH | > > > + PERF_IP_FLAG_RETURN; > > > > I would swap the last to if() condition so that the (BRANCH | RETURN) flags > > are all at the same place. > > Will follow this in new patch. > > > > + > > > + break; > > > + > > > + case OCSD_GEN_TRC_ELEM_EXCEPTION: > > > + > > > +#define OCSD_EXC_RESET 0 > > > +#define OCSD_EXC_DEBUG_HALT 1 > > > +#define OCSD_EXC_CALL 2 > > > +#define OCSD_EXC_TRAP 3 > > > +#define OCSD_EXC_SYSTEM_ERROR 4 > > > +#define OCSD_EXC_INST_DEBUG 6 > > > +#define OCSD_EXC_DATA_DEBUG 7 > > > +#define OCSD_EXC_ALIGNMENT 10 > > > +#define OCSD_EXC_INST_FAULT 11 > > > +#define OCSD_EXC_DATA_FAULT 12 > > > +#define OCSD_EXC_IRQ 14 > > > +#define OCSD_EXC_FIQ 15 > > > > Where did you get the above? To me this is something that should come from > > the > > library. > > The concept is coming from OpenCSD lib but OpenCSD doesn't define > macros for them. > > Will polish this and add these macros into OpenCSD header file. > > > > + > > > + exc_num = decoder->exc_num[packet->cpu]; > > > + > > > + /* > > > + * The exceptions are triggered by external signals > > > + * from bus, interrupt controller, debug module, > > > + * PE reset or halt. > > > + */ > > > + if (exc_num == OCSD_EXC_RESET || > > > + exc_num == OCSD_EXC_DEBUG_HALT || > > > + exc_num == OCSD_EXC_SYSTEM_ERROR || > > > + exc_num == OCSD_EXC_INST_DEBUG || > > > + exc_num == OCSD_EXC_DATA_DEBUG || > > > + exc_num == OCSD_EXC_IRQ || > > > + exc_num == OCSD_EXC_FIQ) > > > + packet->flags = PERF_IP_FLAG_BRANCH | > > > + PERF_IP_FLAG_CALL | > > > + PERF_IP_FLAG_ASYNC | > > > + PERF_IP_FLAG_INTERRUPT; > > > + > > > + /* The exception is for system call. */ > > > + if (exc_num == OCSD_EXC_CALL) > > > + packet->flags = PERF_IP_FLAG_BRANCH | > > > + PERF_IP_FLAG_CALL | > > > + PERF_IP_FLAG_SYSCALLRET; > > > + > > > + /* > > > + * The exception is introduced by trap, instruction & > > > + * data fault or alignment errors. > > > + */ > > > + if (exc_num == OCSD_EXC_TRAP || > > > + exc_num == OCSD_EXC_ALIGNMENT || > > > + exc_num == OCSD_EXC_INST_FAULT || > > > + exc_num == OCSD_EXC_DATA_FAULT) > > > + packet->flags = PERF_IP_FLAG_BRANCH | > > > + PERF_IP_FLAG_CALL | > > > + PERF_IP_FLAG_INTERRUPT; > > > + > > > + break; > > > + > > > + case OCSD_GEN_TRC_ELEM_EXCEPTION_RET: > > > + > > > + exc_num = decoder->exc_num[packet->cpu]; > > > + > > > + if (exc_num == OCSD_EXC_CALL) > > > + packet->flags = PERF_IP_FLAG_BRANCH | > > > + PERF_IP_FLAG_RETURN | > > > + PERF_IP_FLAG_SYSCALLRET; > > > + else > > > + packet->flags = PERF_IP_FLAG_BRANCH | > > > + PERF_IP_FLAG_RETURN | > > > + PERF_IP_FLAG_INTERRUPT; > > > + > > > + break; > > > + > > > + case OCSD_GEN_TRC_ELEM_UNKNOWN: > > > + case OCSD_GEN_TRC_ELEM_PE_CONTEXT: > > > + case OCSD_GEN_TRC_ELEM_ADDR_NACC: > > > + case OCSD_GEN_TRC_ELEM_TIMESTAMP: > > > + case OCSD_GEN_TRC_ELEM_CYCLE_COUNT: > > > + case OCSD_GEN_TRC_ELEM_ADDR_UNKNOWN: > > > + case OCSD_GEN_TRC_ELEM_EVENT: > > > + case OCSD_GEN_TRC_ELEM_SWTRACE: > > > + case OCSD_GEN_TRC_ELEM_CUSTOM: > > > + default: > > > + break; > > > + } > > > +} > > > + > > > static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( > > > const void *context, > > > const ocsd_trc_index_t indx __maybe_unused, > > > @@ -484,6 +650,8 @@ static ocsd_datapath_resp_t > > > cs_etm_decoder__gen_trace_elem_printer( > > > break; > > > } > > > > > > + cs_etm_decoder__set_sample_flags(context, elem); > > > + > > > > I was toying with the idea of setting the flags in each of the case > > statement > > found in cs_etm_decoder__gen_trace_elem_printer(). But that would move more > > code around and the end result would be the same so let's keep it that way > > until > > we have a good reason to split it. > > Do you sugguest to keep current implementation rather than to > split flags setting in each of the case statement in > cs_etm_decoder__gen_trace_elem_printer()? > > I am not 100% sure if I understand correctly for "split it" (split flags > setting vs split functions). So please correct me if I misunderstand > this. I find function cs_etm_decoder__set_sample_flags() overly long. Since the case statements in it are the same as the ones in cs_etm_decoder__gen_trace_elem_printer() a different way to proceed would be to do flag setting there rather than all in cs_etm_decoder__set_sample_flags(). But that would introduce more code modification and tighter coupling. Since I don't have another alternative I am suggesting to keep the current implementation. > > Thanks, > Leo Yan > > > > return resp; > > > } > > > > > > diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h > > > b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h > > > index 0d1c18d..71df908 100644 > > > --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h > > > +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h > > > @@ -47,6 +47,7 @@ struct cs_etm_packet { > > > u8 last_instr_taken_branch; > > > u8 last_instr_size; > > > int cpu; > > > + u32 flags; > > > }; > > > > > > struct cs_etm_queue; > > > -- > > > 2.7.4 > > >