On 6/12/18 7:01 PM, Arnaldo Carvalho de Melo wrote: > Em Mon, Nov 19, 2018 at 09:06:17PM -0800, Andi Kleen escreveu: >> From: Andi Kleen <a...@linux.intel.com> >> >> This is a fix for another instance of the skid problem Milian >> recently found [1] > > Milian, have you tested this? > > Adrian, can I have your Reviewed-by or Acked-by?
Acked-by: Adrian Hunter <adrian.hun...@intel.com> > > Thanks, > > - Arnaldo > >> The LBRs don't freeze at the exact same time as the PMI is triggered. >> The perf script brstackinsn code that dumps LBR assembler >> assumes that the last branch in the LBR leads to the sample point. >> But with skid it's possible that the CPU executes one or more branches >> before the sample, but which do not appear in the LBR. >> >> What happens then is either that the sample point is before >> the last LBR branch. In this case the dumper sees a negative >> length and ignores it. Or it the sample point is long after >> the last branch. Then the dumper sees a very long block and dumps >> it upto its block limit (16k bytes), which is noise in the output. >> >> On typical sample session this can happen regularly. >> >> This patch tries to detect and handle the situation. On the last >> block that is dumped by the LBR dumper we always stop on the first >> branch. If the block length is negative just scan forward to the >> first branch. Otherwise scan until a branch is found. >> >> The PT decoder already has a function that uses the instruction >> decoder to detect branches, so we can just reuse it here. >> >> Then when a terminating branch is found print an indication >> and stop dumping. This might miss a few instructions, but at least >> shows no runaway blocks. >> >> Cc: milian.wo...@kdab.com >> Cc: adrian.hun...@intel.com >> Signed-off-by: Andi Kleen <a...@linux.intel.com> >> --- >> tools/perf/builtin-script.c | 18 +++++++++++++++++- >> tools/perf/util/dump-insn.c | 8 ++++++++ >> tools/perf/util/dump-insn.h | 2 ++ >> .../intel-pt-decoder/intel-pt-insn-decoder.c | 8 ++++++++ >> 4 files changed, 35 insertions(+), 1 deletion(-) >> >> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c >> index 4da5e32b9e03..11868bf39e66 100644 >> --- a/tools/perf/builtin-script.c >> +++ b/tools/perf/builtin-script.c >> @@ -1049,9 +1049,18 @@ static int perf_sample__fprintf_brstackinsn(struct >> perf_sample *sample, >> >> /* >> * Print final block upto sample >> + * >> + * Due to pipeline delays the LBRs might be missing a branch >> + * or two, which can result in very large or negative blocks >> + * between final branch and sample. When this happens just >> + * continue walking after the last TO until we hit a branch. >> */ >> start = br->entries[0].to; >> end = sample->ip; >> + if (end < start) { >> + /* Missing jump. Scan 128 bytes for the next branch */ >> + end = start + 128; >> + } >> len = grab_bb(buffer, start, end, machine, thread, &x.is64bit, >> &x.cpumode, true); >> printed += ip__fprintf_sym(start, thread, x.cpumode, x.cpu, &lastsym, >> attr, fp); >> if (len <= 0) { >> @@ -1060,7 +1069,6 @@ static int perf_sample__fprintf_brstackinsn(struct >> perf_sample *sample, >> machine, thread, &x.is64bit, &x.cpumode, false); >> if (len <= 0) >> goto out; >> - >> printed += fprintf(fp, "\t%016" PRIx64 "\t%s\n", sample->ip, >> dump_insn(&x, sample->ip, buffer, len, NULL)); >> goto out; >> @@ -1070,6 +1078,14 @@ static int perf_sample__fprintf_brstackinsn(struct >> perf_sample *sample, >> dump_insn(&x, start + off, buffer + off, len >> - off, &ilen)); >> if (ilen == 0) >> break; >> + if (arch_is_branch(buffer + off, len - off, x.is64bit) && >> + start + off != sample->ip) { >> + /* >> + * Hit a missing branch. Just stop. >> + */ >> + printed += fprintf(fp, "\t... not reaching sample >> ...\n"); >> + break; >> + } >> } >> out: >> return printed; >> diff --git a/tools/perf/util/dump-insn.c b/tools/perf/util/dump-insn.c >> index 10988d3de7ce..2bd8585db93c 100644 >> --- a/tools/perf/util/dump-insn.c >> +++ b/tools/perf/util/dump-insn.c >> @@ -13,3 +13,11 @@ const char *dump_insn(struct perf_insn *x __maybe_unused, >> *lenp = 0; >> return "?"; >> } >> + >> +__weak >> +int arch_is_branch(const unsigned char *buf __maybe_unused, >> + size_t len __maybe_unused, >> + int x86_64 __maybe_unused) >> +{ >> + return 0; >> +} >> diff --git a/tools/perf/util/dump-insn.h b/tools/perf/util/dump-insn.h >> index 0e06280a8860..650125061530 100644 >> --- a/tools/perf/util/dump-insn.h >> +++ b/tools/perf/util/dump-insn.h >> @@ -20,4 +20,6 @@ struct perf_insn { >> >> const char *dump_insn(struct perf_insn *x, u64 ip, >> u8 *inbuf, int inlen, int *lenp); >> +int arch_is_branch(const unsigned char *buf, size_t len, int x86_64); >> + >> #endif >> diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c >> b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c >> index 54818828023b..1c0e289f01e6 100644 >> --- a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c >> +++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c >> @@ -180,6 +180,14 @@ int intel_pt_get_insn(const unsigned char *buf, size_t >> len, int x86_64, >> return 0; >> } >> >> +int arch_is_branch(const unsigned char *buf, size_t len, int x86_64) >> +{ >> + struct intel_pt_insn in; >> + if (intel_pt_get_insn(buf, len, x86_64, &in) < 0) >> + return -1; >> + return in.branch != INTEL_PT_BR_NO_BRANCH; >> +} >> + >> const char *dump_insn(struct perf_insn *x, uint64_t ip __maybe_unused, >> u8 *inbuf, int inlen, int *lenp) >> { >> -- >> 2.17.2 >