Re: [PATCH 1/1] tcg: add perfmap and jitdump
Ilya Leoshkevich writes: > Add ability to dump /tmp/perf-.map and jit-.dump. > The first one allows the perf tool to map samples to each individual > translation block. The second one adds the ability to resolve symbol > names, line numbers and inspect JITed code. > > Example of use: > > perf record qemu-x86_64 -perfmap ./a.out > perf report > > or > > perf record -k 1 qemu-x86_64 -jitdump ./a.out > perf inject -j -i perf.data -o perf.data.jitted > perf report -i perf.data.jitted > > Co-developed-by: Vanderson M. do Rosario > Co-developed-by: Alex Bennée > Signed-off-by: Ilya Leoshkevich > --- > accel/tcg/debuginfo.c | 108 + > accel/tcg/debuginfo.h | 54 +++ > accel/tcg/meson.build | 2 + > accel/tcg/perf.c | 333 ++ > accel/tcg/perf.h | 28 > accel/tcg/translate-all.c | 3 + > docs/devel/tcg.rst| 20 +++ > linux-user/elfload.c | 3 + > linux-user/exit.c | 2 + > linux-user/main.c | 15 ++ > linux-user/meson.build| 1 + > meson.build | 8 + > qemu-options.hx | 20 +++ > softmmu/vl.c | 11 ++ > tcg/tcg.c | 2 + > 15 files changed, 610 insertions(+) > create mode 100644 accel/tcg/debuginfo.c > create mode 100644 accel/tcg/debuginfo.h > create mode 100644 accel/tcg/perf.c > create mode 100644 accel/tcg/perf.h > > diff --git a/accel/tcg/debuginfo.c b/accel/tcg/debuginfo.c > new file mode 100644 > index 00..904eb23103 > --- /dev/null > +++ b/accel/tcg/debuginfo.c > @@ -0,0 +1,108 @@ > +/* > + * Debug information support. > + * > + * SPDX-License-Identifier: GPL-2.0-or-later > + */ > + > +#include "qemu/osdep.h" > + > +#include > + > +#include "debuginfo.h" > + > +static QemuMutex lock; > +static Dwfl *dwfl; > +static const Dwfl_Callbacks dwfl_callbacks = { > +.find_elf = NULL, > +.find_debuginfo = dwfl_standard_find_debuginfo, > +.section_address = NULL, > +.debuginfo_path = NULL, > +}; > + > +__attribute__((constructor)) > +static void debuginfo_init(void) > +{ > +qemu_mutex_init(); > +} > + > +bool debuginfo_report_elf(const char *image_name, int image_fd, > + target_ulong load_bias) > +{ > +qemu_mutex_lock(); You can wrap this up with a QEMU_LOCK_GUARD() { and avoid having to catch all your exit cases. > + > +if (dwfl == NULL) { > +dwfl = dwfl_begin(_callbacks); > +} else { > +dwfl_report_begin_add(dwfl); > +} > + > +if (dwfl == NULL) { > +qemu_mutex_unlock(); > +return false; > +} > + > +dwfl_report_elf(dwfl, image_name, image_name, image_fd, load_bias, true); > +dwfl_report_end(dwfl, NULL, NULL); > +qemu_mutex_unlock(); > +return true; > +} > + > +bool debuginfo_get_symbol(target_ulong address, > + const char **symbol, target_ulong *offset) > +{ > +Dwfl_Module *dwfl_module; > +GElf_Off dwfl_offset; > +GElf_Sym dwfl_sym; > + > +qemu_mutex_lock(); > + > +if (dwfl == NULL) { > +qemu_mutex_unlock(); > +return false; > +} > + > +dwfl_module = dwfl_addrmodule(dwfl, address); > +if (dwfl_module == NULL) { > +qemu_mutex_unlock(); > +return false; > +} > + > +*symbol = dwfl_module_addrinfo(dwfl_module, address, _offset, > + _sym, NULL, NULL, NULL); > +if (*symbol == NULL) { > +qemu_mutex_unlock(); > +return false; > +} > +*offset = dwfl_offset; > +qemu_mutex_unlock(); > +return true; > +} > + > +bool debuginfo_get_line(target_ulong address, > +const char **file, int *line) > +{ > +Dwfl_Module *dwfl_module; > +Dwfl_Line *dwfl_line; > + > +qemu_mutex_lock(); ditto. > + > +if (dwfl == NULL) { > +qemu_mutex_unlock(); > +return false; > +} > + > +dwfl_module = dwfl_addrmodule(dwfl, address); > +if (dwfl_module == NULL) { > +qemu_mutex_unlock(); > +return false; > +} > + > +dwfl_line = dwfl_module_getsrc(dwfl_module, address); > +if (dwfl_line == NULL) { > +qemu_mutex_unlock(); > +return false; > +} > +*file = dwfl_lineinfo(dwfl_line, NULL, line, 0, NULL, NULL); > +qemu_mutex_unlock(); > +return true; > +} > diff --git a/accel/tcg/debuginfo.h b/accel/tcg/debuginfo.h > new file mode 100644 > index 00..f4f22aa786 > --- /dev/null > +++ b/accel/tcg/debuginfo.h > @@ -0,0 +1,54 @@ > +/* > + * Debug information support. > + * > + * SPDX-License-Identifier: GPL-2.0-or-later > + */ > + > +#ifndef ACCEL_TCG_DEBUGINFO_H > +#define ACCEL_TCG_DEBUGINFO_H > + > +#include "exec/cpu-defs.h" > + > +#ifdef CONFIG_LIBDW > +/* > + * Load debuginfo for the specified guest ELF image. > + * Return true on success, false on failure. > + */ > +bool debuginfo_report_elf(const char *image_name, int
Re: [PATCH 1/1] tcg: add perfmap and jitdump
Am 12.10.22 um 07:18 schrieb Ilya Leoshkevich: Add ability to dump /tmp/perf-.map and jit-.dump. The first one allows the perf tool to map samples to each individual translation block. The second one adds the ability to resolve symbol names, line numbers and inspect JITed code. Example of use: perf record qemu-x86_64 -perfmap ./a.out perf report or perf record -k 1 qemu-x86_64 -jitdump ./a.out perf inject -j -i perf.data -o perf.data.jitted perf report -i perf.data.jitted Co-developed-by: Vanderson M. do Rosario Co-developed-by: Alex Bennée Signed-off-by: Ilya Leoshkevich I think this would be awesome to have. I know our performance people do use this for Java a lot.
Re: [PATCH 1/1] tcg: add perfmap and jitdump
On Fri, 2022-10-14 at 07:35 +1100, Richard Henderson wrote: > On 10/12/22 22:18, Ilya Leoshkevich wrote: > > Add ability to dump /tmp/perf-.map and jit-.dump. > > The first one allows the perf tool to map samples to each > > individual > > translation block. The second one adds the ability to resolve > > symbol > > names, line numbers and inspect JITed code. > > > > Example of use: > > > > perf record qemu-x86_64 -perfmap ./a.out > > perf report > > > > or > > > > perf record -k 1 qemu-x86_64 -jitdump ./a.out > > perf inject -j -i perf.data -o perf.data.jitted > > perf report -i perf.data.jitted > > > > Co-developed-by: Vanderson M. do Rosario > > Co-developed-by: Alex Bennée > > Signed-off-by: Ilya Leoshkevich > > I think I remember this, and the question that was never answered > was: > > > @@ -1492,6 +1493,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu, > > } > > tb->tc.size = gen_code_size; > > > > + perf_report_code(gen_code_buf, gen_code_size, tb->icount, tb- > > >pc); > > When do_tb_flush is called, everything that is recorded in perfmap is > invalidated. > How do you tell perf about that? > > > r~ > I don't think we should handle this altogether. Perf already knows how to handle overlapping addresses: $ cat main.c #include #include int main() { void *d = dlopen("./lib1.so", RTLD_NOW); void (*f)(void) = dlsym(d, "f"); printf("%p\n", f); f(); dlclose(d); d = dlopen("./lib2.so", RTLD_NOW); f = dlsym(d, "f"); printf("%p\n", f); f(); dlclose(d); } $ cat 1.c static volatile int i = 0; void f(void) { while (i < 10) { i++; } } $ gcc -o main -O3 -g main.c -ldl $ gcc -o lib1.so -shared -fPIC -O3 -g 1.c $ cp 1.c 2.c $ gcc -o lib2.so -shared -fPIC -O3 -g 2.c $ perf record ./main 0x7f47925eb100 0x7f47925eb100 $ perf report 49.96% main lib1.so [.] f 49.92% main lib2.so [.] f Note that there is "load" event - PERF_RECORD_MMAP2, but no "unload" event. The old mappings are simply discarded by maps__fixup_overlappings() when something new is mapped. In our case this means that as soon as we JIT new code, perf will see that it replaced the one that was flushed. Until then it will live with stale mappings, but there is no harm in that, since they are not executed.
Re: [PATCH 1/1] tcg: add perfmap and jitdump
On 10/12/22 22:18, Ilya Leoshkevich wrote: Add ability to dump /tmp/perf-.map and jit-.dump. The first one allows the perf tool to map samples to each individual translation block. The second one adds the ability to resolve symbol names, line numbers and inspect JITed code. Example of use: perf record qemu-x86_64 -perfmap ./a.out perf report or perf record -k 1 qemu-x86_64 -jitdump ./a.out perf inject -j -i perf.data -o perf.data.jitted perf report -i perf.data.jitted Co-developed-by: Vanderson M. do Rosario Co-developed-by: Alex Bennée Signed-off-by: Ilya Leoshkevich I think I remember this, and the question that was never answered was: @@ -1492,6 +1493,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu, } tb->tc.size = gen_code_size; +perf_report_code(gen_code_buf, gen_code_size, tb->icount, tb->pc); When do_tb_flush is called, everything that is recorded in perfmap is invalidated. How do you tell perf about that? r~
[PATCH 1/1] tcg: add perfmap and jitdump
Add ability to dump /tmp/perf-.map and jit-.dump. The first one allows the perf tool to map samples to each individual translation block. The second one adds the ability to resolve symbol names, line numbers and inspect JITed code. Example of use: perf record qemu-x86_64 -perfmap ./a.out perf report or perf record -k 1 qemu-x86_64 -jitdump ./a.out perf inject -j -i perf.data -o perf.data.jitted perf report -i perf.data.jitted Co-developed-by: Vanderson M. do Rosario Co-developed-by: Alex Bennée Signed-off-by: Ilya Leoshkevich --- accel/tcg/debuginfo.c | 108 + accel/tcg/debuginfo.h | 54 +++ accel/tcg/meson.build | 2 + accel/tcg/perf.c | 333 ++ accel/tcg/perf.h | 28 accel/tcg/translate-all.c | 3 + docs/devel/tcg.rst| 20 +++ linux-user/elfload.c | 3 + linux-user/exit.c | 2 + linux-user/main.c | 15 ++ linux-user/meson.build| 1 + meson.build | 8 + qemu-options.hx | 20 +++ softmmu/vl.c | 11 ++ tcg/tcg.c | 2 + 15 files changed, 610 insertions(+) create mode 100644 accel/tcg/debuginfo.c create mode 100644 accel/tcg/debuginfo.h create mode 100644 accel/tcg/perf.c create mode 100644 accel/tcg/perf.h diff --git a/accel/tcg/debuginfo.c b/accel/tcg/debuginfo.c new file mode 100644 index 00..904eb23103 --- /dev/null +++ b/accel/tcg/debuginfo.c @@ -0,0 +1,108 @@ +/* + * Debug information support. + * + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#include "qemu/osdep.h" + +#include + +#include "debuginfo.h" + +static QemuMutex lock; +static Dwfl *dwfl; +static const Dwfl_Callbacks dwfl_callbacks = { +.find_elf = NULL, +.find_debuginfo = dwfl_standard_find_debuginfo, +.section_address = NULL, +.debuginfo_path = NULL, +}; + +__attribute__((constructor)) +static void debuginfo_init(void) +{ +qemu_mutex_init(); +} + +bool debuginfo_report_elf(const char *image_name, int image_fd, + target_ulong load_bias) +{ +qemu_mutex_lock(); + +if (dwfl == NULL) { +dwfl = dwfl_begin(_callbacks); +} else { +dwfl_report_begin_add(dwfl); +} + +if (dwfl == NULL) { +qemu_mutex_unlock(); +return false; +} + +dwfl_report_elf(dwfl, image_name, image_name, image_fd, load_bias, true); +dwfl_report_end(dwfl, NULL, NULL); +qemu_mutex_unlock(); +return true; +} + +bool debuginfo_get_symbol(target_ulong address, + const char **symbol, target_ulong *offset) +{ +Dwfl_Module *dwfl_module; +GElf_Off dwfl_offset; +GElf_Sym dwfl_sym; + +qemu_mutex_lock(); + +if (dwfl == NULL) { +qemu_mutex_unlock(); +return false; +} + +dwfl_module = dwfl_addrmodule(dwfl, address); +if (dwfl_module == NULL) { +qemu_mutex_unlock(); +return false; +} + +*symbol = dwfl_module_addrinfo(dwfl_module, address, _offset, + _sym, NULL, NULL, NULL); +if (*symbol == NULL) { +qemu_mutex_unlock(); +return false; +} +*offset = dwfl_offset; +qemu_mutex_unlock(); +return true; +} + +bool debuginfo_get_line(target_ulong address, +const char **file, int *line) +{ +Dwfl_Module *dwfl_module; +Dwfl_Line *dwfl_line; + +qemu_mutex_lock(); + +if (dwfl == NULL) { +qemu_mutex_unlock(); +return false; +} + +dwfl_module = dwfl_addrmodule(dwfl, address); +if (dwfl_module == NULL) { +qemu_mutex_unlock(); +return false; +} + +dwfl_line = dwfl_module_getsrc(dwfl_module, address); +if (dwfl_line == NULL) { +qemu_mutex_unlock(); +return false; +} +*file = dwfl_lineinfo(dwfl_line, NULL, line, 0, NULL, NULL); +qemu_mutex_unlock(); +return true; +} diff --git a/accel/tcg/debuginfo.h b/accel/tcg/debuginfo.h new file mode 100644 index 00..f4f22aa786 --- /dev/null +++ b/accel/tcg/debuginfo.h @@ -0,0 +1,54 @@ +/* + * Debug information support. + * + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#ifndef ACCEL_TCG_DEBUGINFO_H +#define ACCEL_TCG_DEBUGINFO_H + +#include "exec/cpu-defs.h" + +#ifdef CONFIG_LIBDW +/* + * Load debuginfo for the specified guest ELF image. + * Return true on success, false on failure. + */ +bool debuginfo_report_elf(const char *image_name, int image_fd, + target_ulong load_bias); + +/* + * Find a symbol name associated with the specified guest PC. + * Return true on success, false if there is no associated symbol. + */ +bool debuginfo_get_symbol(target_ulong address, + const char **symbol, target_ulong *offset); + +/* + * Find a line number associated with the specified guest PC. + * Return true on success, false if there is no associated line number. + */ +bool