Re: [PATCH 1/1] tcg: add perfmap and jitdump

2022-11-07 Thread Alex Bennée


Ilya Leoshkevich  writes:

> Add ability to dump /tmp/perf-.map and jit-.dump.
> The first one allows the perf tool to map samples to each individual
> translation block. The second one adds the ability to resolve symbol
> names, line numbers and inspect JITed code.
>
> Example of use:
>
> perf record qemu-x86_64 -perfmap ./a.out
> perf report
>
> or
>
> perf record -k 1 qemu-x86_64 -jitdump ./a.out
> perf inject -j -i perf.data -o perf.data.jitted
> perf report -i perf.data.jitted
>
> Co-developed-by: Vanderson M. do Rosario 
> Co-developed-by: Alex Bennée 
> Signed-off-by: Ilya Leoshkevich 
> ---
>  accel/tcg/debuginfo.c | 108 +
>  accel/tcg/debuginfo.h |  54 +++
>  accel/tcg/meson.build |   2 +
>  accel/tcg/perf.c  | 333 ++
>  accel/tcg/perf.h  |  28 
>  accel/tcg/translate-all.c |   3 +
>  docs/devel/tcg.rst|  20 +++
>  linux-user/elfload.c  |   3 +
>  linux-user/exit.c |   2 +
>  linux-user/main.c |  15 ++
>  linux-user/meson.build|   1 +
>  meson.build   |   8 +
>  qemu-options.hx   |  20 +++
>  softmmu/vl.c  |  11 ++
>  tcg/tcg.c |   2 +
>  15 files changed, 610 insertions(+)
>  create mode 100644 accel/tcg/debuginfo.c
>  create mode 100644 accel/tcg/debuginfo.h
>  create mode 100644 accel/tcg/perf.c
>  create mode 100644 accel/tcg/perf.h
>
> diff --git a/accel/tcg/debuginfo.c b/accel/tcg/debuginfo.c
> new file mode 100644
> index 00..904eb23103
> --- /dev/null
> +++ b/accel/tcg/debuginfo.c
> @@ -0,0 +1,108 @@
> +/*
> + * Debug information support.
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include 
> +
> +#include "debuginfo.h"
> +
> +static QemuMutex lock;
> +static Dwfl *dwfl;
> +static const Dwfl_Callbacks dwfl_callbacks = {
> +.find_elf = NULL,
> +.find_debuginfo = dwfl_standard_find_debuginfo,
> +.section_address = NULL,
> +.debuginfo_path = NULL,
> +};
> +
> +__attribute__((constructor))
> +static void debuginfo_init(void)
> +{
> +qemu_mutex_init();
> +}
> +
> +bool debuginfo_report_elf(const char *image_name, int image_fd,
> +  target_ulong load_bias)
> +{
> +qemu_mutex_lock();

You can wrap this up with a QEMU_LOCK_GUARD() { and avoid having to
catch all your exit cases.

> +
> +if (dwfl == NULL) {
> +dwfl = dwfl_begin(_callbacks);
> +} else {
> +dwfl_report_begin_add(dwfl);
> +}
> +
> +if (dwfl == NULL) {
> +qemu_mutex_unlock();
> +return false;
> +}
> +
> +dwfl_report_elf(dwfl, image_name, image_name, image_fd, load_bias, true);
> +dwfl_report_end(dwfl, NULL, NULL);
> +qemu_mutex_unlock();
> +return true;
> +}
> +
> +bool debuginfo_get_symbol(target_ulong address,
> +  const char **symbol, target_ulong *offset)
> +{
> +Dwfl_Module *dwfl_module;
> +GElf_Off dwfl_offset;
> +GElf_Sym dwfl_sym;
> +
> +qemu_mutex_lock();
> +
> +if (dwfl == NULL) {
> +qemu_mutex_unlock();
> +return false;
> +}
> +
> +dwfl_module = dwfl_addrmodule(dwfl, address);
> +if (dwfl_module == NULL) {
> +qemu_mutex_unlock();
> +return false;
> +}
> +
> +*symbol = dwfl_module_addrinfo(dwfl_module, address, _offset,
> +   _sym, NULL, NULL, NULL);
> +if (*symbol == NULL) {
> +qemu_mutex_unlock();
> +return false;
> +}
> +*offset = dwfl_offset;
> +qemu_mutex_unlock();
> +return true;
> +}
> +
> +bool debuginfo_get_line(target_ulong address,
> +const char **file, int *line)
> +{
> +Dwfl_Module *dwfl_module;
> +Dwfl_Line *dwfl_line;
> +
> +qemu_mutex_lock();

ditto.

> +
> +if (dwfl == NULL) {
> +qemu_mutex_unlock();
> +return false;
> +}
> +
> +dwfl_module = dwfl_addrmodule(dwfl, address);
> +if (dwfl_module == NULL) {
> +qemu_mutex_unlock();
> +return false;
> +}
> +
> +dwfl_line = dwfl_module_getsrc(dwfl_module, address);
> +if (dwfl_line == NULL) {
> +qemu_mutex_unlock();
> +return false;
> +}
> +*file = dwfl_lineinfo(dwfl_line, NULL, line, 0, NULL, NULL);
> +qemu_mutex_unlock();
> +return true;
> +}
> diff --git a/accel/tcg/debuginfo.h b/accel/tcg/debuginfo.h
> new file mode 100644
> index 00..f4f22aa786
> --- /dev/null
> +++ b/accel/tcg/debuginfo.h
> @@ -0,0 +1,54 @@
> +/*
> + * Debug information support.
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef ACCEL_TCG_DEBUGINFO_H
> +#define ACCEL_TCG_DEBUGINFO_H
> +
> +#include "exec/cpu-defs.h"
> +
> +#ifdef CONFIG_LIBDW
> +/*
> + * Load debuginfo for the specified guest ELF image.
> + * Return true on success, false on failure.
> + */
> +bool debuginfo_report_elf(const char *image_name, int 

Re: [PATCH 1/1] tcg: add perfmap and jitdump

2022-10-28 Thread Christian Borntraeger

Am 12.10.22 um 07:18 schrieb Ilya Leoshkevich:

Add ability to dump /tmp/perf-.map and jit-.dump.
The first one allows the perf tool to map samples to each individual
translation block. The second one adds the ability to resolve symbol
names, line numbers and inspect JITed code.

Example of use:

 perf record qemu-x86_64 -perfmap ./a.out
 perf report

or

 perf record -k 1 qemu-x86_64 -jitdump ./a.out
 perf inject -j -i perf.data -o perf.data.jitted
 perf report -i perf.data.jitted

Co-developed-by: Vanderson M. do Rosario 
Co-developed-by: Alex Bennée 
Signed-off-by: Ilya Leoshkevich 


I think this would be awesome to have. I know our performance people do use 
this for Java a lot.



Re: [PATCH 1/1] tcg: add perfmap and jitdump

2022-10-17 Thread Ilya Leoshkevich
On Fri, 2022-10-14 at 07:35 +1100, Richard Henderson wrote:
> On 10/12/22 22:18, Ilya Leoshkevich wrote:
> > Add ability to dump /tmp/perf-.map and jit-.dump.
> > The first one allows the perf tool to map samples to each
> > individual
> > translation block. The second one adds the ability to resolve
> > symbol
> > names, line numbers and inspect JITed code.
> > 
> > Example of use:
> > 
> >  perf record qemu-x86_64 -perfmap ./a.out
> >  perf report
> > 
> > or
> > 
> >  perf record -k 1 qemu-x86_64 -jitdump ./a.out
> >  perf inject -j -i perf.data -o perf.data.jitted
> >  perf report -i perf.data.jitted
> > 
> > Co-developed-by: Vanderson M. do Rosario 
> > Co-developed-by: Alex Bennée 
> > Signed-off-by: Ilya Leoshkevich 
> 
> I think I remember this, and the question that was never answered
> was:
> 
> > @@ -1492,6 +1493,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
> >   }
> >   tb->tc.size = gen_code_size;
> >   
> > +    perf_report_code(gen_code_buf, gen_code_size, tb->icount, tb-
> > >pc);
> 
> When do_tb_flush is called, everything that is recorded in perfmap is
> invalidated.
> How do you tell perf about that?
> 
> 
> r~
> 

I don't think we should handle this altogether. Perf already knows how
to handle overlapping addresses:


$ cat main.c 
#include 
#include 
int main() 
{
void *d = dlopen("./lib1.so", RTLD_NOW);
void (*f)(void) = dlsym(d, "f");
printf("%p\n", f);
f();
dlclose(d);
d = dlopen("./lib2.so", RTLD_NOW);
f = dlsym(d, "f");
printf("%p\n", f);
f();
dlclose(d);
}

$ cat 1.c
static volatile int i = 0;

void f(void) {
while (i < 10) {
i++;
}
}

$ gcc -o main -O3 -g main.c -ldl
$ gcc -o lib1.so -shared -fPIC -O3 -g 1.c
$ cp 1.c 2.c
$ gcc -o lib2.so -shared -fPIC -O3 -g 2.c

$ perf record ./main
0x7f47925eb100
0x7f47925eb100

$ perf report
49.96%  main lib1.so   [.] f
49.92%  main lib2.so   [.] f


Note that there is "load" event - PERF_RECORD_MMAP2, but no "unload"
event. The old mappings are simply discarded by
maps__fixup_overlappings() when something new is mapped.

In our case this means that as soon as we JIT new code, perf will see
that it replaced the one that was flushed. Until then it will live with
stale mappings, but there is no harm in that, since they are not
executed.



Re: [PATCH 1/1] tcg: add perfmap and jitdump

2022-10-13 Thread Richard Henderson

On 10/12/22 22:18, Ilya Leoshkevich wrote:

Add ability to dump /tmp/perf-.map and jit-.dump.
The first one allows the perf tool to map samples to each individual
translation block. The second one adds the ability to resolve symbol
names, line numbers and inspect JITed code.

Example of use:

 perf record qemu-x86_64 -perfmap ./a.out
 perf report

or

 perf record -k 1 qemu-x86_64 -jitdump ./a.out
 perf inject -j -i perf.data -o perf.data.jitted
 perf report -i perf.data.jitted

Co-developed-by: Vanderson M. do Rosario 
Co-developed-by: Alex Bennée 
Signed-off-by: Ilya Leoshkevich 


I think I remember this, and the question that was never answered was:


@@ -1492,6 +1493,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  }
  tb->tc.size = gen_code_size;
  
+perf_report_code(gen_code_buf, gen_code_size, tb->icount, tb->pc);


When do_tb_flush is called, everything that is recorded in perfmap is 
invalidated.
How do you tell perf about that?


r~