[tip:perf/core] perf event-parse: Use fixed size string for comms
Commit-ID: c9f23d2bc21cb263ae931f3e264d003d746107bb Gitweb: https://git.kernel.org/tip/c9f23d2bc21cb263ae931f3e264d003d746107bb Author: Chris Phlipot AuthorDate: Wed, 29 Aug 2018 19:19:50 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Thu, 30 Aug 2018 14:51:45 -0300 perf event-parse: Use fixed size string for comms Some implementations of libc do not support the 'm' width modifier as part of the scanf string format specifier. This can cause the parsing to fail. Since the parser never checks if the scanf parsing was successesful, this can result in a crash. Change the comm string to be allocated as a fixed size instead of dynamically using 'm' scanf width modifier. This can be safely done since comm size is limited to 16 bytes by TASK_COMM_LEN within the kernel. This change prevents perf from crashing when linked against bionic as well as reduces the total number of heap allocations and frees invoked while accomplishing the same task. Signed-off-by: Chris Phlipot Cc: Namhyung Kim Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20180830021950.15563-1-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/trace-event-parse.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c index 920b1d58a068..e76214f8d596 100644 --- a/tools/perf/util/trace-event-parse.c +++ b/tools/perf/util/trace-event-parse.c @@ -164,16 +164,15 @@ void parse_ftrace_printk(struct tep_handle *pevent, void parse_saved_cmdline(struct tep_handle *pevent, char *file, unsigned int size __maybe_unused) { - char *comm; + char comm[17]; /* Max comm length in the kernel is 16. */ char *line; char *next = NULL; int pid; line = strtok_r(file, "\n", &next); while (line) { - sscanf(line, "%d %ms", &pid, &comm); - tep_register_comm(pevent, comm, pid); - free(comm); + if (sscanf(line, "%d %16s", &pid, comm) == 2) + tep_register_comm(pevent, comm, pid); line = strtok_r(NULL, "\n", &next); } }
[tip:perf/core] perf util: Fix bad memory access in trace info.
Commit-ID: a72f64261359b7451f8478f2a2bf357b4e6c757f Gitweb: https://git.kernel.org/tip/a72f64261359b7451f8478f2a2bf357b4e6c757f Author: Chris Phlipot AuthorDate: Tue, 28 Aug 2018 23:19:54 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Thu, 30 Aug 2018 14:50:50 -0300 perf util: Fix bad memory access in trace info. In the write to the output_fd in the error condition of record_saved_cmdline(), we are writing 8 bytes from a memory location on the stack that contains a primitive that is only 4 bytes in size. Change the primitive to 8 bytes in size to match the size of the write in order to avoid reading unknown memory from the stack. Signed-off-by: Chris Phlipot Cc: Namhyung Kim Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20180829061954.18871-1-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/trace-event-info.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/trace-event-info.c b/tools/perf/util/trace-event-info.c index c85d0d1a65ed..7b0ca7cbb7de 100644 --- a/tools/perf/util/trace-event-info.c +++ b/tools/perf/util/trace-event-info.c @@ -377,7 +377,7 @@ out: static int record_saved_cmdline(void) { - unsigned int size; + unsigned long long size; char *path; struct stat st; int ret, err = 0;
[tip:perf/core] perf tools: Update android build documentation
Commit-ID: 3d0376113ed9cf92b86885bf5102944b61523f5b Gitweb: http://git.kernel.org/tip/3d0376113ed9cf92b86885bf5102944b61523f5b Author: Chris Phlipot AuthorDate: Thu, 30 Jun 2016 22:12:35 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 4 Jul 2016 20:27:27 -0300 perf tools: Update android build documentation Update the android build documentation according to recent android build fixes. The instructions for step 1a and step 2 were updated to work with NDK version 11(oldest supported version) and NDK version 12(current version). Signed-off-by: Chris Phlipot Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1467349955-1135-5-git-send-email-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/Documentation/android.txt | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/tools/perf/Documentation/android.txt b/tools/perf/Documentation/android.txt index 8484c3a..24a5999 100644 --- a/tools/perf/Documentation/android.txt +++ b/tools/perf/Documentation/android.txt @@ -12,14 +12,14 @@ Set the NDK variable to point to the path where you installed the NDK: 2. Set cross-compiling environment variables for NDK toolchain and sysroot. For arm: - export NDK_TOOLCHAIN=${NDK}/toolchains/arm-linux-androideabi-4.6/prebuilt/linux-x86/bin/arm-linux-androideabi- - export NDK_SYSROOT=${NDK}/platforms/android-9/arch-arm + export NDK_TOOLCHAIN=${NDK}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi- + export NDK_SYSROOT=${NDK}/platforms/android-24/arch-arm For x86: - export NDK_TOOLCHAIN=${NDK}/toolchains/x86-4.6/prebuilt/linux-x86/bin/i686-linux-android- - export NDK_SYSROOT=${NDK}/platforms/android-9/arch-x86 + export NDK_TOOLCHAIN=${NDK}/toolchains/x86-4.9/prebuilt/linux-x86_64/bin/i686-linux-android- + export NDK_SYSROOT=${NDK}/platforms/android-24/arch-x86 -This method is not working for Android NDK versions up to Revision 8b. -perf uses some bionic enhancements that are not included in these NDK versions. +This method is only tested for Android NDK versions Revision 11b and later. +perf uses some bionic enhancements that are not included in prior NDK versions. You can use method (b) described below instead. (b). Use the Android source tree @@ -49,9 +49,9 @@ II. Compile perf for Android You need to run make with the NDK toolchain and sysroot defined above: For arm: - make ARCH=arm CROSS_COMPILE=${NDK_TOOLCHAIN} CFLAGS="--sysroot=${NDK_SYSROOT}" + make WERROR=0 ARCH=arm CROSS_COMPILE=${NDK_TOOLCHAIN} EXTRA_CFLAGS="-pie --sysroot=${NDK_SYSROOT}" For x86: - make ARCH=x86 CROSS_COMPILE=${NDK_TOOLCHAIN} CFLAGS="--sysroot=${NDK_SYSROOT}" + make WERROR=0 ARCH=x86 CROSS_COMPILE=${NDK_TOOLCHAIN} EXTRA_CFLAGS="-pie --sysroot=${NDK_SYSROOT}" III. Install perf ---
[tip:perf/core] tools lib api: Respect WERROR=0 for build
Commit-ID: b983d54473344a9ef524a231943478047a779796 Gitweb: http://git.kernel.org/tip/b983d54473344a9ef524a231943478047a779796 Author: Chris Phlipot AuthorDate: Thu, 30 Jun 2016 22:12:32 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 4 Jul 2016 20:27:26 -0300 tools lib api: Respect WERROR=0 for build This enables the workaround for compilers that generate warnings when compiling libapi. Signed-off-by: Chris Phlipot Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1467349955-1135-2-git-send-email-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/lib/api/Makefile | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/tools/lib/api/Makefile b/tools/lib/api/Makefile index 67ff93e..c7ceea6 100644 --- a/tools/lib/api/Makefile +++ b/tools/lib/api/Makefile @@ -17,7 +17,13 @@ MAKEFLAGS += --no-print-directory LIBFILE = $(OUTPUT)libapi.a CFLAGS := $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) -CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -Werror -O6 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fPIC +CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -O6 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fPIC + +# Treat warnings as errors unless directed not to +ifneq ($(WERROR),0) + CFLAGS += -Werror +endif + CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 CFLAGS += -I$(srctree)/tools/lib/api
[tip:perf/core] tools lib subcmd: Respect WERROR=0 for build
Commit-ID: fd01d06ae33be63cff7d133e650cd1eb32f1d548 Gitweb: http://git.kernel.org/tip/fd01d06ae33be63cff7d133e650cd1eb32f1d548 Author: Chris Phlipot AuthorDate: Thu, 30 Jun 2016 22:12:33 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 4 Jul 2016 20:27:26 -0300 tools lib subcmd: Respect WERROR=0 for build this enables the workaround for compilers that generate warnings when compiling libsubcmd. Signed-off-by: Chris Phlipot Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1467349955-1135-3-git-send-email-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/lib/subcmd/Makefile | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/tools/lib/subcmd/Makefile b/tools/lib/subcmd/Makefile index a810370..ce4b7e5 100644 --- a/tools/lib/subcmd/Makefile +++ b/tools/lib/subcmd/Makefile @@ -19,7 +19,13 @@ MAKEFLAGS += --no-print-directory LIBFILE = $(OUTPUT)libsubcmd.a CFLAGS := $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) -CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -Werror -O6 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fPIC +CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -O6 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fPIC + +# Treat warnings as errors unless directed not to +ifneq ($(WERROR),0) + CFLAGS += -Werror +endif + CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE CFLAGS += -I$(srctree)/tools/include/
Re: [PATCH 3/4] perf tool: Fix build when sysconf doesn't support cache line size
On 07/04/2016 05:26 PM, Arnaldo Carvalho de Melo wrote: Em Mon, Jul 04, 2016 at 05:19:20PM -0700, Chris Phlipot escreveu: On 07/04/2016 03:48 PM, Arnaldo Carvalho de Melo wrote: Em Thu, Jun 30, 2016 at 10:12:34PM -0700, Chris Phlipot escreveu: Enable perf to build on libc implementations where sysconf() doesn't support _SC_LEVEL1_DCACHE_LINESIZE as a parameter. For example, the Bionic implementation does not support this as a paremter. Older versions of Bionic will throw an error when this is passed in as a parameter, and more recent versions will just return 0 as the cache line size. Signed-off-by: Chris Phlipot --- tools/perf/perf.c | 4 1 file changed, 4 insertions(+) diff --git a/tools/perf/perf.c b/tools/perf/perf.c index 8f21922..113ca5b 100644 --- a/tools/perf/perf.c +++ b/tools/perf/perf.c @@ -509,7 +509,11 @@ int main(int argc, const char **argv) /* The page_size is placed in util object. */ page_size = sysconf(_SC_PAGE_SIZE); +#ifdef _SC_LEVEL1_DCACHE_LINESIZE cacheline_size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE); +#else + cacheline_size = 0; +#endif Couldn't we instead fallback to: sysfs__read_int("devices/system/cpu/cpu0/cache/index0/coherency_line_size", &cacheline_size) ? I agree that in general this would be a better fallback, but in all Android images I have tested so far, "devices/system/cpu/cpu0/cache" does not exist. I know not know of a good way to retrieve cache line size in this case. I would be ok with attempting to get cacheline size using using the following methods, unless you have other ideas: 1. attempt to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE) 2. attempt to use sysfs__read_int("devices/system/cpu/cpu0/cache/index0/coherency_line_size", &cacheline_size) 3. set to zero if both of the above fail. Ok, but perhaps we should have some sort of warning in places using this? - Arnaldo Such as printing a warning when cacheline_size is set to zero, or simply adding comments to the code in areas where cacheline_size is used? -Chris if (sysctl__read_int("kernel/perf_event_max_stack", &value) == 0) sysctl_perf_event_max_stack = value; -- 2.7.4
Re: [PATCH 3/4] perf tool: Fix build when sysconf doesn't support cache line size
On 07/04/2016 03:48 PM, Arnaldo Carvalho de Melo wrote: Em Thu, Jun 30, 2016 at 10:12:34PM -0700, Chris Phlipot escreveu: Enable perf to build on libc implementations where sysconf() doesn't support _SC_LEVEL1_DCACHE_LINESIZE as a parameter. For example, the Bionic implementation does not support this as a paremter. Older versions of Bionic will throw an error when this is passed in as a parameter, and more recent versions will just return 0 as the cache line size. Signed-off-by: Chris Phlipot --- tools/perf/perf.c | 4 1 file changed, 4 insertions(+) diff --git a/tools/perf/perf.c b/tools/perf/perf.c index 8f21922..113ca5b 100644 --- a/tools/perf/perf.c +++ b/tools/perf/perf.c @@ -509,7 +509,11 @@ int main(int argc, const char **argv) /* The page_size is placed in util object. */ page_size = sysconf(_SC_PAGE_SIZE); +#ifdef _SC_LEVEL1_DCACHE_LINESIZE cacheline_size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE); +#else + cacheline_size = 0; +#endif Couldn't we instead fallback to: sysfs__read_int("devices/system/cpu/cpu0/cache/index0/coherency_line_size", &cacheline_size) ? I agree that in general this would be a better fallback, but in all Android images I have tested so far, "devices/system/cpu/cpu0/cache" does not exist. I know not know of a good way to retrieve cache line size in this case. I would be ok with attempting to get cacheline size using using the following methods, unless you have other ideas: 1. attempt to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE) 2. attempt to use sysfs__read_int("devices/system/cpu/cpu0/cache/index0/coherency_line_size", &cacheline_size) 3. set to zero if both of the above fail. if (sysctl__read_int("kernel/perf_event_max_stack", &value) == 0) sysctl_perf_event_max_stack = value; -- 2.7.4
[PATCH 3/4] perf tool: Fix build when sysconf doesn't support cache line size
Enable perf to build on libc implementations where sysconf() doesn't support _SC_LEVEL1_DCACHE_LINESIZE as a parameter. For example, the Bionic implementation does not support this as a paremter. Older versions of Bionic will throw an error when this is passed in as a parameter, and more recent versions will just return 0 as the cache line size. Signed-off-by: Chris Phlipot --- tools/perf/perf.c | 4 1 file changed, 4 insertions(+) diff --git a/tools/perf/perf.c b/tools/perf/perf.c index 8f21922..113ca5b 100644 --- a/tools/perf/perf.c +++ b/tools/perf/perf.c @@ -509,7 +509,11 @@ int main(int argc, const char **argv) /* The page_size is placed in util object. */ page_size = sysconf(_SC_PAGE_SIZE); +#ifdef _SC_LEVEL1_DCACHE_LINESIZE cacheline_size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE); +#else + cacheline_size = 0; +#endif if (sysctl__read_int("kernel/perf_event_max_stack", &value) == 0) sysctl_perf_event_max_stack = value; -- 2.7.4
[PATCH 1/4] tools lib api: Respect WERROR=0 for build
This enables the workaround for compilers that generate warnings when compiling libapi. Signed-off-by: Chris Phlipot --- tools/lib/api/Makefile | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/tools/lib/api/Makefile b/tools/lib/api/Makefile index 67ff93e..c7ceea6 100644 --- a/tools/lib/api/Makefile +++ b/tools/lib/api/Makefile @@ -17,7 +17,13 @@ MAKEFLAGS += --no-print-directory LIBFILE = $(OUTPUT)libapi.a CFLAGS := $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) -CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -Werror -O6 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fPIC +CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -O6 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fPIC + +# Treat warnings as errors unless directed not to +ifneq ($(WERROR),0) + CFLAGS += -Werror +endif + CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 CFLAGS += -I$(srctree)/tools/lib/api -- 2.7.4
[PATCH 2/4] tools lib subcmd: Respect WERROR=0 for build
this enables the workaround for compilers that generate warnings when compiling libsubcmd. Signed-off-by: Chris Phlipot --- tools/lib/subcmd/Makefile | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/tools/lib/subcmd/Makefile b/tools/lib/subcmd/Makefile index a810370..ce4b7e5 100644 --- a/tools/lib/subcmd/Makefile +++ b/tools/lib/subcmd/Makefile @@ -19,7 +19,13 @@ MAKEFLAGS += --no-print-directory LIBFILE = $(OUTPUT)libsubcmd.a CFLAGS := $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) -CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -Werror -O6 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fPIC +CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -O6 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fPIC + +# Treat warnings as errors unless directed not to +ifneq ($(WERROR),0) + CFLAGS += -Werror +endif + CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE CFLAGS += -I$(srctree)/tools/include/ -- 2.7.4
[PATCH 0/4] perf tool: Fix Android build
It looks like the tools/perf/Documentation/android.txt hasn't been updated in a while. Following the instructions in this document to cross-compile perf for Android results in several build errors. This patch-set aims to fix/workaround the incompatibilities introduced since the android perf build was last tested. The changes were tested to build for ubuntu 16.04 as well as cross compile for android using NDK Versions 11 and 12. to test android arm cross compile: $ wget http://dl.google.com/android/repository/android-ndk-r12-linux-x86_64.zip $ unzip android-ndk-r12-linux-x86_64.zip $ export NDK_TOOLCHAIN=`pwd`/android-ndk-r12/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi- $ export NDK_SYSROOT=`pwd`/android-ndk-r12/platforms/android-24/arch-arm $ make WERROR=0 ARCH=arm CROSS_COMPILE=${NDK_TOOLCHAIN} EXTRA_CFLAGS="-pie --sysroot=${NDK_SYSROOT}" Chris Phlipot (4): tools lib api: Respect WERROR=0 for build tools lib subcmd: Respect WERROR=0 for build perf tool: Fix build when sysconf doesn't support cache line size perf tool: Update android build documentation tools/lib/api/Makefile | 8 +++- tools/lib/subcmd/Makefile| 8 +++- tools/perf/Documentation/android.txt | 16 tools/perf/perf.c| 4 4 files changed, 26 insertions(+), 10 deletions(-) -- 2.7.4
[PATCH 4/4] perf tool: Update android build documentation
Update the android build documentation according to recent android build fixes. The instructions for step 1a and step 2 were updated to work with NDK version 11(oldest supported version) and NDK version 12(current version). Signed-off-by: Chris Phlipot --- tools/perf/Documentation/android.txt | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/tools/perf/Documentation/android.txt b/tools/perf/Documentation/android.txt index 8484c3a..24a5999 100644 --- a/tools/perf/Documentation/android.txt +++ b/tools/perf/Documentation/android.txt @@ -12,14 +12,14 @@ Set the NDK variable to point to the path where you installed the NDK: 2. Set cross-compiling environment variables for NDK toolchain and sysroot. For arm: - export NDK_TOOLCHAIN=${NDK}/toolchains/arm-linux-androideabi-4.6/prebuilt/linux-x86/bin/arm-linux-androideabi- - export NDK_SYSROOT=${NDK}/platforms/android-9/arch-arm + export NDK_TOOLCHAIN=${NDK}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi- + export NDK_SYSROOT=${NDK}/platforms/android-24/arch-arm For x86: - export NDK_TOOLCHAIN=${NDK}/toolchains/x86-4.6/prebuilt/linux-x86/bin/i686-linux-android- - export NDK_SYSROOT=${NDK}/platforms/android-9/arch-x86 + export NDK_TOOLCHAIN=${NDK}/toolchains/x86-4.9/prebuilt/linux-x86_64/bin/i686-linux-android- + export NDK_SYSROOT=${NDK}/platforms/android-24/arch-x86 -This method is not working for Android NDK versions up to Revision 8b. -perf uses some bionic enhancements that are not included in these NDK versions. +This method is only tested for Android NDK versions Revision 11b and later. +perf uses some bionic enhancements that are not included in prior NDK versions. You can use method (b) described below instead. (b). Use the Android source tree @@ -49,9 +49,9 @@ II. Compile perf for Android You need to run make with the NDK toolchain and sysroot defined above: For arm: - make ARCH=arm CROSS_COMPILE=${NDK_TOOLCHAIN} CFLAGS="--sysroot=${NDK_SYSROOT}" + make WERROR=0 ARCH=arm CROSS_COMPILE=${NDK_TOOLCHAIN} EXTRA_CFLAGS="-pie --sysroot=${NDK_SYSROOT}" For x86: - make ARCH=x86 CROSS_COMPILE=${NDK_TOOLCHAIN} CFLAGS="--sysroot=${NDK_SYSROOT}" + make WERROR=0 ARCH=x86 CROSS_COMPILE=${NDK_TOOLCHAIN} EXTRA_CFLAGS="-pie --sysroot=${NDK_SYSROOT}" III. Install perf --- -- 2.7.4
[tip:perf/core] perf script: Fix export of callchains with recursion in db-export
Commit-ID: 83302e79b18f75266e4a44281e8432f61d57d441 Gitweb: http://git.kernel.org/tip/83302e79b18f75266e4a44281e8432f61d57d441 Author: Chris Phlipot AuthorDate: Tue, 10 May 2016 20:26:49 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 11 May 2016 12:24:58 -0300 perf script: Fix export of callchains with recursion in db-export When an IP with an unresolved symbol occurs in the callchain more than once (ie. recursion), then duplicate symbols can be created because the callchain nodes are never updated after they are first created. To fix this issue we call dso__find_symbol whenever we encounter a NULL symbol, in case we already added a symbol at that IP since we started traversing the callchain. This change prevents duplicate symbols from being exported when duplicate IPs are present in the callchain. Signed-off-by: Chris Phlipot Cc: Adrian Hunter Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1462937209-6032-5-git-send-email-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/db-export.c | 4 1 file changed, 4 insertions(+) diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c index 8ca4186..8d96c80 100644 --- a/tools/perf/util/db-export.c +++ b/tools/perf/util/db-export.c @@ -326,6 +326,10 @@ static struct call_path *call_path_from_sample(struct db_export *dbe, al.machine = machine; al.addr = node->ip; + if (al.map && !al.sym) + al.sym = dso__find_symbol(al.map->dso, MAP__FUNCTION, + al.addr); + db_ids_from_al(dbe, &al, &dso_db_id, &sym_db_id, &offset); /* add node to the call path tree if it doesn't exist */
[tip:perf/core] perf script: Fix callchain addresses in db-export
Commit-ID: 7a2544c004a6c576b1e307f30925b165affe6a22 Gitweb: http://git.kernel.org/tip/7a2544c004a6c576b1e307f30925b165affe6a22 Author: Chris Phlipot AuthorDate: Tue, 10 May 2016 20:26:48 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 11 May 2016 12:24:58 -0300 perf script: Fix callchain addresses in db-export Remove the call to map_ip() to adjust al.addr, because it has already been called when assembling the callchain, in: thread__resolve_callchain_sample(perf_sample) add_callchain_ip(ip = perf_sample->callchain->ips[j]) thread__find_addr_location(addr = ip) thread__find_addr_map(addr) { al->addr = addr if (al->map) al->addr = al->map->map_ip(al->map, al->addr); } Calling it a second time can result in incorrect addresses being used. This can have effects such as duplicate symbols being created and exported. Signed-off-by: Chris Phlipot Cc: Adrian Hunter Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1462937209-6032-4-git-send-email-cphlip...@gmail.com [ Show the callchain where it is done, to help reviewing this change down the line ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/db-export.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c index 2ef1f69..8ca4186 100644 --- a/tools/perf/util/db-export.c +++ b/tools/perf/util/db-export.c @@ -324,10 +324,7 @@ static struct call_path *call_path_from_sample(struct db_export *dbe, al.sym = node->sym; al.map = node->map; al.machine = machine; - if (al.map) - al.addr = al.map->map_ip(al.map, node->ip); - else - al.addr = node->ip; + al.addr = node->ip; db_ids_from_al(dbe, &al, &dso_db_id, &sym_db_id, &offset);
[tip:perf/core] perf script: Fix symbol insertion behavior in db-export
Commit-ID: bd0a51dd2794f1d17d4e7a34ad66db845cef3e5a Gitweb: http://git.kernel.org/tip/bd0a51dd2794f1d17d4e7a34ad66db845cef3e5a Author: Chris Phlipot AuthorDate: Tue, 10 May 2016 20:26:47 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 11 May 2016 12:24:57 -0300 perf script: Fix symbol insertion behavior in db-export Use the dso__insert_symbol function instead of symbols__insert() in order to properly update the dso symbol cache. If the cache is not updated, then duplicate symbols can be unintentionally created, inserted, and exported. This change prevents duplicate symbols from being exported due to dso__find_symbol() using a stale symbol cache. Signed-off-by: Chris Phlipot Cc: Adrian Hunter Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1462937209-6032-3-git-send-email-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/db-export.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c index f8e3057..2ef1f69 100644 --- a/tools/perf/util/db-export.c +++ b/tools/perf/util/db-export.c @@ -260,8 +260,7 @@ static int db_ids_from_al(struct db_export *dbe, struct addr_location *al, if (!al->sym) { al->sym = symbol__new(al->addr, 0, 0, "unknown"); if (al->sym) - symbols__insert(&dso->symbols[al->map->type], - al->sym); + dso__insert_symbol(dso, al->map->type, al->sym); } if (al->sym) {
[tip:perf/core] perf symbols: Add dso__insert_symbol function
Commit-ID: ae93a6c70838b87151ac12589dc507dbf4f2f067 Gitweb: http://git.kernel.org/tip/ae93a6c70838b87151ac12589dc507dbf4f2f067 Author: Chris Phlipot AuthorDate: Tue, 10 May 2016 20:26:46 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 11 May 2016 12:24:57 -0300 perf symbols: Add dso__insert_symbol function The current method for inserting symbols is to use the symbols__insert() function. However symbols__insert() does not update the dso symbol cache. This causes problems in the following scenario: 1. symbol not found at addr using dso__find_symbol 2. symbol inserted at addr using the existing symbols__insert function 3. symbol still not found at addr using dso__find_symbol() because cache isn't updated. This is undesired behavior. The undesired behavior in (3) is addressed by creating a new function, dso__insert_symbol() to both insert the symbol and update the symbol cache if necessary. If dso__insert_symbol() is used in (2) instead of symbols__insert(), then the undesired behavior in (3) is avoided. Signed-off-by: Chris Phlipot Cc: Adrian Hunter Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1462937209-6032-2-git-send-email-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/symbol.c | 12 tools/perf/util/symbol.h | 3 +++ 2 files changed, 15 insertions(+) diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index 2946295..21af8e8 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -413,6 +413,18 @@ void dso__reset_find_symbol_cache(struct dso *dso) } } +void dso__insert_symbol(struct dso *dso, enum map_type type, struct symbol *sym) +{ + symbols__insert(&dso->symbols[type], sym); + + /* update the symbol cache if necessary */ + if (dso->last_find_result[type].addr >= sym->start && + (dso->last_find_result[type].addr < sym->end || + sym->start == sym->end)) { + dso->last_find_result[type].symbol = sym; + } +} + struct symbol *dso__find_symbol(struct dso *dso, enum map_type type, u64 addr) { diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h index 07211c2..2b5e4ed 100644 --- a/tools/perf/util/symbol.h +++ b/tools/perf/util/symbol.h @@ -246,6 +246,9 @@ int __dso__load_kallsyms(struct dso *dso, const char *filename, struct map *map, int dso__load_kallsyms(struct dso *dso, const char *filename, struct map *map, symbol_filter_t filter); +void dso__insert_symbol(struct dso *dso, enum map_type type, + struct symbol *sym); + struct symbol *dso__find_symbol(struct dso *dso, enum map_type type, u64 addr); struct symbol *dso__find_symbol_by_name(struct dso *dso, enum map_type type,
[PATCH v2 0/4] perf script: fix duplicate symbols in db-export
Changes since v1: - fixed scripts/checkpatch.pl errors This patch set contains 3 fixes for duplicate symbol creation in the db-export implementation and one new symbol API required for the fixes. commit 9c7b37cd63d0 ("perf symbols: Fix handling of zero-length symbols.") already removed the majority of duplicates, but these fixes take care of the remaining corner cases. each patch (except for the 1st, which is a dependency for patch 2) reduces the number of duplicate symbols exported. When all patches are applied, my test workload has no more duplicate symbols being exported. Tests ran: $perf record --call-graph=dwarf stress -c 2 -t 20 $perf script -s scripts/python/export-to-postgresql.py test all callchains $psql test To show the effect of the changes we run the following query before/after the changes on a database created using the export-to-postgresql.py script with callchains enabled. If this query returns any value greater than 1, then it means that there are duplicates present. In the test workload, at least one symbol occurs 299 times before applying the fixes: test=# select count(*) as cnt from symbols group by sym_start,sym_end,dso_id order by cnt desc limit 1; cnt - 299 (1 row) After applying the fixes no symbol occurs more than once: test=# select count(*) as cnt from symbols group by sym_start,sym_end,dso_id order by cnt desc limit 1; cnt - 1 (1 row) Chris Phlipot (4): perf symbols: add dso__insert_symbol function perf script: fix symbol insertion behavior in db-export perf script: fix callchain addresses in db-export perf script: fix export of callchains with recursion in db-export tools/perf/util/db-export.c | 12 ++-- tools/perf/util/symbol.c| 12 tools/perf/util/symbol.h| 3 +++ 3 files changed, 21 insertions(+), 6 deletions(-) -- 2.7.4
[PATCH v2 3/4] perf script: fix callchain addresses in db-export
Remove the call to map_ip, because it has already been called when assembling the callchain. Calling it a second time can result in incorrect addresses being used. This can have effects such as duplicate symbols being created and exported. Signed-off-by: Chris Phlipot --- tools/perf/util/db-export.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c index 2ef1f69..8ca4186 100644 --- a/tools/perf/util/db-export.c +++ b/tools/perf/util/db-export.c @@ -324,10 +324,7 @@ static struct call_path *call_path_from_sample(struct db_export *dbe, al.sym = node->sym; al.map = node->map; al.machine = machine; - if (al.map) - al.addr = al.map->map_ip(al.map, node->ip); - else - al.addr = node->ip; + al.addr = node->ip; db_ids_from_al(dbe, &al, &dso_db_id, &sym_db_id, &offset); -- 2.7.4
[PATCH v2 2/4] perf script: fix symbol insertion behavior in db-export
Use the dso__insert_symbol function instead of symbols__insert in order to properly update the dso symbol cache. If the cache is not updated, then duplicate symbols can be unintentionally created, inserted, and exported. This change prevents duplicate symbols from being exported due to dso__find_symbol using a stale symbol cache. Signed-off-by: Chris Phlipot --- tools/perf/util/db-export.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c index f8e3057..2ef1f69 100644 --- a/tools/perf/util/db-export.c +++ b/tools/perf/util/db-export.c @@ -260,8 +260,7 @@ static int db_ids_from_al(struct db_export *dbe, struct addr_location *al, if (!al->sym) { al->sym = symbol__new(al->addr, 0, 0, "unknown"); if (al->sym) - symbols__insert(&dso->symbols[al->map->type], - al->sym); + dso__insert_symbol(dso, al->map->type, al->sym); } if (al->sym) { -- 2.7.4
[PATCH v2 4/4] perf script: fix export of callchains with recursion in db-export
When an IP with an unresolved symbol occurs in the callchain more than once (ie. recursion), then duplicate symbols can be created because the callchain nodes are never updated after they are first created. To fix this issue we call dso__find_symbol whenever we encounter a NULL symbol, in case we already added a symbol at that IP since we started traversing the callchain. This change prevents duplicate symbols from being exported when duplicate IPs are present in the callchain. Signed-off-by: Chris Phlipot --- tools/perf/util/db-export.c | 4 1 file changed, 4 insertions(+) diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c index 8ca4186..8d96c80 100644 --- a/tools/perf/util/db-export.c +++ b/tools/perf/util/db-export.c @@ -326,6 +326,10 @@ static struct call_path *call_path_from_sample(struct db_export *dbe, al.machine = machine; al.addr = node->ip; + if (al.map && !al.sym) + al.sym = dso__find_symbol(al.map->dso, MAP__FUNCTION, + al.addr); + db_ids_from_al(dbe, &al, &dso_db_id, &sym_db_id, &offset); /* add node to the call path tree if it doesn't exist */ -- 2.7.4
[PATCH v2 1/4] perf symbols: add dso__insert_symbol function
The current method for inserting symbols is to use the symbols__insert function. However symbols__insert does not update the dso symbol cache. This causes problems in the following scenario: 1. symbol not found at ip using dso__find_symbol 2. symbol inserted at ip using the existing symbols__insert function 3. symbol still not found at ip using dso__find_symbol because cache isn't updated. This is undesired behavior. The undesired behavior in (3) is addressed by creating a new function, dso__insert_symbol to both insert the symbol and update the symbol cache if necessary. If dso__insert_symbol is used in (2) instead of symbols__insert, then the undesired behavior in (3) is avoided. Signed-off-by: Chris Phlipot --- tools/perf/util/symbol.c | 12 tools/perf/util/symbol.h | 3 +++ 2 files changed, 15 insertions(+) diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index 2946295..21af8e8 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -413,6 +413,18 @@ void dso__reset_find_symbol_cache(struct dso *dso) } } +void dso__insert_symbol(struct dso *dso, enum map_type type, struct symbol *sym) +{ + symbols__insert(&dso->symbols[type], sym); + + /* update the symbol cache if necessary */ + if (dso->last_find_result[type].addr >= sym->start && + (dso->last_find_result[type].addr < sym->end || + sym->start == sym->end)) { + dso->last_find_result[type].symbol = sym; + } +} + struct symbol *dso__find_symbol(struct dso *dso, enum map_type type, u64 addr) { diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h index 07211c2..2b5e4ed 100644 --- a/tools/perf/util/symbol.h +++ b/tools/perf/util/symbol.h @@ -246,6 +246,9 @@ int __dso__load_kallsyms(struct dso *dso, const char *filename, struct map *map, int dso__load_kallsyms(struct dso *dso, const char *filename, struct map *map, symbol_filter_t filter); +void dso__insert_symbol(struct dso *dso, enum map_type type, + struct symbol *sym); + struct symbol *dso__find_symbol(struct dso *dso, enum map_type type, u64 addr); struct symbol *dso__find_symbol_by_name(struct dso *dso, enum map_type type, -- 2.7.4
[PATCH 2/4] perf script: fix symbol insertion behavior in db-export
Use the dso__insert_symbol function instead of symbols__insert in order to properly update the dso symbol cache. If the cache is not updated, then duplicate symbols can be unintentionally created, inserted, and exported. This change prevents duplicate symbols from being exported due to dso__find_symbol using a stale symbol cache. Signed-off-by: Chris Phlipot --- tools/perf/util/db-export.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c index f8e3057..2ef1f69 100644 --- a/tools/perf/util/db-export.c +++ b/tools/perf/util/db-export.c @@ -260,8 +260,7 @@ static int db_ids_from_al(struct db_export *dbe, struct addr_location *al, if (!al->sym) { al->sym = symbol__new(al->addr, 0, 0, "unknown"); if (al->sym) - symbols__insert(&dso->symbols[al->map->type], - al->sym); + dso__insert_symbol(dso, al->map->type, al->sym); } if (al->sym) { -- 2.7.4
[PATCH 3/4] perf script: fix callchain addresses in db-export
Remove the call to map_ip, because it has already been called when assembling the callchain. Calling it a second time can result in incorrect addresses being used. This can have effects such as duplicate symbols being created and exported. Signed-off-by: Chris Phlipot --- tools/perf/util/db-export.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c index 2ef1f69..8ca4186 100644 --- a/tools/perf/util/db-export.c +++ b/tools/perf/util/db-export.c @@ -324,10 +324,7 @@ static struct call_path *call_path_from_sample(struct db_export *dbe, al.sym = node->sym; al.map = node->map; al.machine = machine; - if (al.map) - al.addr = al.map->map_ip(al.map, node->ip); - else - al.addr = node->ip; + al.addr = node->ip; db_ids_from_al(dbe, &al, &dso_db_id, &sym_db_id, &offset); -- 2.7.4
[PATCH 4/4] perf script: fix export of callchains with recursion in db-export
When an IP with an unresolved symbol occurs in the callchain more than once (ie. recursion), then duplicate symbols can be created because the callchain nodes are never updated after they are first created. To fix this issue we call dso__find_symbol whenever we encounter a NULL symbol, in case we already added a symbol at that IP since we started traversing the callchain. This change prevents duplicate symbols from being exported when duplicate IPs are present in the callchain. Signed-off-by: Chris Phlipot --- tools/perf/util/db-export.c | 4 1 file changed, 4 insertions(+) diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c index 8ca4186..424c77c 100644 --- a/tools/perf/util/db-export.c +++ b/tools/perf/util/db-export.c @@ -326,6 +326,10 @@ static struct call_path *call_path_from_sample(struct db_export *dbe, al.machine = machine; al.addr = node->ip; + if(al.map && !al.sym) + al.sym = dso__find_symbol(al.map->dso, MAP__FUNCTION, + al.addr); + db_ids_from_al(dbe, &al, &dso_db_id, &sym_db_id, &offset); /* add node to the call path tree if it doesn't exist */ -- 2.7.4
[PATCH 0/4] perf script: fix duplicate symbols in db-export
This patch set contains 3 fixes for duplicate symbol creation in the db-export implementation and one new symbol API required for the fixes. commit 9c7b37cd63d0 already removed the majority of duplicates, but these fixes take care of the remaining corner cases. each patch (except for the 1st, which is a dependency for patch 2) reduces the number of duplicate symbols exported. When all patches are applied, my test workload has no more duplicate symbols being exported. Tests ran: $perf record --call-graph=dwarf stress -c 2 -t 20 $perf script -s scripts/python/export-to-postgresql.py test all callchains $psql test To show the effect of the changes we run the following query before/after the changes on a database created using the export-to-postgresql.py script with callchains enabled. If this query returns any value greater than 1, then it means that there are duplicates present. In the test workload, at least one symbol occurs 299 times before applying the fixes: test=# select count(*) as cnt from symbols group by sym_start,sym_end,dso_id order by cnt desc limit 1; cnt - 299 (1 row) After applying the fixes no symbol occurs more than once: test=# select count(*) as cnt from symbols group by sym_start,sym_end,dso_id order by cnt desc limit 1; cnt - 1 (1 row) Chris Phlipot (4): perf symbols: add dso__insert_symbol function perf script: fix symbol insertion behavior in db-export perf script: fix callchain addresses in db-export perf script: fix export of callchains with recursion in db-export tools/perf/util/db-export.c | 12 ++-- tools/perf/util/symbol.c| 12 tools/perf/util/symbol.h| 3 +++ 3 files changed, 21 insertions(+), 6 deletions(-) -- 2.7.4
[PATCH 1/4] perf symbols: add dso__insert_symbol function
The current method for inserting symbols is to use the symbols__insert function. However symbols__insert does not update the dso symbol cache. This causes problems in the following scenario: 1. symbol not found at ip using dso__find_symbol 2. symbol inserted at ip using the existing symbols__insert function 3. symbol still not found at ip using dso__find_symbol because cache isn't updated. This is undesired behavior. The undesired behavior in (3) is addressed by creating a new function, dso__insert_symbol to both insert the symbol and update the symbol cache if necessary. If dso__insert_symbol is used in (2) instead of symbols__insert, then the undesired behavior in (3) is avoided. Signed-off-by: Chris Phlipot --- tools/perf/util/symbol.c | 12 tools/perf/util/symbol.h | 3 +++ 2 files changed, 15 insertions(+) diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index 2946295..21af8e8 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -413,6 +413,18 @@ void dso__reset_find_symbol_cache(struct dso *dso) } } +void dso__insert_symbol(struct dso *dso, enum map_type type, struct symbol *sym) +{ + symbols__insert(&dso->symbols[type], sym); + + /* update the symbol cache if necessary */ + if (dso->last_find_result[type].addr >= sym->start && + (dso->last_find_result[type].addr < sym->end || + sym->start == sym->end)) { + dso->last_find_result[type].symbol = sym; + } +} + struct symbol *dso__find_symbol(struct dso *dso, enum map_type type, u64 addr) { diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h index 07211c2..2b5e4ed 100644 --- a/tools/perf/util/symbol.h +++ b/tools/perf/util/symbol.h @@ -246,6 +246,9 @@ int __dso__load_kallsyms(struct dso *dso, const char *filename, struct map *map, int dso__load_kallsyms(struct dso *dso, const char *filename, struct map *map, symbol_filter_t filter); +void dso__insert_symbol(struct dso *dso, enum map_type type, + struct symbol *sym); + struct symbol *dso__find_symbol(struct dso *dso, enum map_type type, u64 addr); struct symbol *dso__find_symbol_by_name(struct dso *dso, enum map_type type, -- 2.7.4
[tip:perf/core] perf symbols: Fix handling of zero-length symbols.
Commit-ID: 9c7b37cd63d0d910c531233209286f169993cbd9 Gitweb: http://git.kernel.org/tip/9c7b37cd63d0d910c531233209286f169993cbd9 Author: Chris Phlipot AuthorDate: Sat, 7 May 2016 02:16:59 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 9 May 2016 18:40:03 -0300 perf symbols: Fix handling of zero-length symbols. This change introduces a fix to symbols__find, so that it is able to find symbols of length zero (where start == end). The current code has the following problem: - The current implementation of symbols__find is unable to find any symbols of length zero. - The db-export framework explicitly creates zero length symbols at locations where no symbol currently exists. The combination of the two above behaviors results in behavior similar to the example below. 1. addr_location is created for a sample, but symbol is unable to be resolved. 2. db export creates an "unknown" symbol of length zero at that address and inserts it into the dso. 3. A new sample comes in at the same address, but symbol__find is unable to find the zero length symbol, so it is still unresolved. 4. db export sees the symbol is unresolved, and allocated a duplicate symbol, even though it already did this in step 2. This behavior continues every time an address without symbol information is seen, which causes a very large number of these symbols to be allocated. The effect of this fix can be observed by looking at the contents of an exported database before/after the fix (generated with scripts/python/export-to-postgresql.py) Ex. BEFORE THE CHANGE: example_db=# select count(*) from symbols; count 900213 (1 row) example_db=# select count(*) from symbols where symbols.name='unknown'; count 897355 (1 row) example_db=# select count(*) from symbols where symbols.name!='unknown'; count --- 2858 (1 row) AFTER THE CHANGE: example_db=# select count(*) from symbols; count --- 25217 (1 row) example_db=# select count(*) from symbols where name='unknown'; count --- 22359 (1 row) example_db=# select count(*) from symbols where name!='unknown'; count --- 2858 (1 row) Signed-off-by: Chris Phlipot Cc: Adrian Hunter Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1462612620-25008-1-git-send-email-cphlip...@gmail.com [ Moved the test to later in the rb_tree tests, as this not the likely case ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/symbol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index 415c4f6..2946295 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -301,7 +301,7 @@ static struct symbol *symbols__find(struct rb_root *symbols, u64 ip) if (ip < s->start) n = n->rb_left; - else if (ip >= s->end) + else if (ip > s->end || (ip == s->end && ip != s->start)) n = n->rb_right; else return s;
[tip:perf/core] perf script: Fix incorrect python db-export error message
Commit-ID: aff633406ca2772554bad7b37f2dfbc409b6ea74 Gitweb: http://git.kernel.org/tip/aff633406ca2772554bad7b37f2dfbc409b6ea74 Author: Chris Phlipot AuthorDate: Sat, 7 May 2016 02:17:00 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 9 May 2016 14:08:39 -0300 perf script: Fix incorrect python db-export error message Fix the error message printed when attempting and failing to create the call path root incorrectly references the call return process. This change fixes the message to properly reference the failure to create the call path root. Signed-off-by: Chris Phlipot Cc: Adrian Hunter Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1462612620-25008-2-git-send-email-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/scripting-engines/trace-event-python.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c index 091bce6..1546b74 100644 --- a/tools/perf/util/scripting-engines/trace-event-python.c +++ b/tools/perf/util/scripting-engines/trace-event-python.c @@ -1062,7 +1062,7 @@ static void set_table_handlers(struct tables *tables) tables->dbe.cpr = call_path_root__new(); if (!tables->dbe.cpr) - Py_FatalError("failed to create calls processor"); + Py_FatalError("failed to create call path root"); } tables->db_export_mode = true;
Re: [PATCH 1/2] perf tools: fix handling of zero-length symbols.
> On May 9, 2016, at 10:06 AM, Arnaldo Carvalho de Melo > wrote: > > Em Sat, May 07, 2016 at 02:16:59AM -0700, Chris Phlipot escreveu: >> This change introduces a fix to symbols__find, so that it is able to find >> symbols of length zero (where start==end) >> >> The current code has the following problem: >> -The current implementation of symbols__find is unable to find any symbols >> of length zero. >> -The db-export framework explicitly creates zero length symbols at >> locations where no symbol currently exists. >> >> The combination of the two above behaviors results in behavior similar to >> the example below. > > Ok, but you made the unlikely case be the first test, how about this one > liner instead? > > diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c > index 415c4f6d98fd..7a0917569fb3 100644 > --- a/tools/perf/util/symbol.c > +++ b/tools/perf/util/symbol.c > @@ -301,7 +301,7 @@ static struct symbol *symbols__find(struct rb_root > *symbols, u64 ip) > >if (ip < s->start) >n = n->rb_left; > -else if (ip >= s->end) > +else if (ip > s->end || (ip == s->end && ip != s->start) >n = n->rb_right; >else >return s; > > I agree. This looks like a better fix. >> +++ b/tools/perf/util/symbol.c >> @@ -299,7 +299,9 @@ static struct symbol *symbols__find(struct rb_root >> *symbols, u64 ip) >>while (n) { >>struct symbol *s = rb_entry(n, struct symbol, rb_node); >> >> -if (ip < s->start) >> +if (ip == s->start && s->start == s->end) >> +return s; >> +else if (ip < s->start) >>n = n->rb_left; >>else if (ip >= s->end) >>n = n->rb_right; >> -- >> 2.7.4
[PATCH 2/2] perf script: fix incorrect python db-export error message
fix the error message printed when attempting and failing to create the call path root incorrectly references the call return process. this change fixes the message to properly reference the failure to create the call path root. Signed-off-by: Chris Phlipot --- tools/perf/util/scripting-engines/trace-event-python.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c index 091bce6..1546b74 100644 --- a/tools/perf/util/scripting-engines/trace-event-python.c +++ b/tools/perf/util/scripting-engines/trace-event-python.c @@ -1062,7 +1062,7 @@ static void set_table_handlers(struct tables *tables) tables->dbe.cpr = call_path_root__new(); if (!tables->dbe.cpr) - Py_FatalError("failed to create calls processor"); + Py_FatalError("failed to create call path root"); } tables->db_export_mode = true; -- 2.7.4
[PATCH 1/2] perf tools: fix handling of zero-length symbols.
This change introduces a fix to symbols__find, so that it is able to find symbols of length zero (where start==end) The current code has the following problem: -The current implementation of symbols__find is unable to find any symbols of length zero. -The db-export framework explicitly creates zero length symbols at locations where no symbol currently exists. The combination of the two above behaviors results in behavior similar to the example below. 1. addr_location is created for a sample, but symbol is unable to be resolved. 2. db export creates an "unknown" symbol of length zero at that address and inserts it into the dso. 3. A new sample comes in at the same address, but symbol__find is unable to find the zero length symbol, so it is still unresolved. 4. db export sees the symbol is unresolved, and allocated a duplicate symbol, even though it already did this in step 2. This behavior continues every time an address without symbol information is seen, which causes a very large number of these symbols to be allocated. The effect of this fix can be observed by looking at the contents of an exported database before/after the fix (generated with scripts/python/export-to-postgresql.py) Ex. BEFORE THE CHANGE: example_db=# select count(*) from symbols; count 900213 (1 row) example_db=# select count(*) from symbols where symbols.name='unknown'; count 897355 (1 row) example_db=# select count(*) from symbols where symbols.name!='unknown'; count --- 2858 (1 row) AFTER THE CHANGE: example_db=# select count(*) from symbols; count --- 25217 (1 row) example_db=# select count(*) from symbols where name='unknown'; count --- 22359 (1 row) example_db=# select count(*) from symbols where name!='unknown'; count --- 2858 (1 row) Signed-off-by: Chris Phlipot --- tools/perf/util/symbol.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index 415c4f6..e42bf9a 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -299,7 +299,9 @@ static struct symbol *symbols__find(struct rb_root *symbols, u64 ip) while (n) { struct symbol *s = rb_entry(n, struct symbol, rb_node); - if (ip < s->start) + if (ip == s->start && s->start == s->end) + return s; + else if (ip < s->start) n = n->rb_left; else if (ip >= s->end) n = n->rb_right; -- 2.7.4
[tip:perf/core] perf script: Update export-to-postgresql to support callchain export
Commit-ID: 3521f3bc9dae4a79cfb9cc9ffcf6d961bbb7cbac Gitweb: http://git.kernel.org/tip/3521f3bc9dae4a79cfb9cc9ffcf6d961bbb7cbac Author: Chris Phlipot AuthorDate: Thu, 28 Apr 2016 01:19:11 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Fri, 6 May 2016 13:00:55 -0300 perf script: Update export-to-postgresql to support callchain export Update the export-to-postgresql.py to support the newly introduced callchain export. callchains are added into the existing call_paths table and can now be associated with samples when the "callpaths" commandline option is used with the script. Ex.: $ perf script -s export-to-postgresql.py example_db all callchains Includes the following changes to enable callchain export via the python export APIs: - Add the "callchains" commandline option, which is used to enable callchain export by setting the perf_db_export_callchains global - Add perf_db_export_callchains checks for call_path table creation and population. - Add call_path_id to samples_table to conform with the new API example usage and output using a small test app: test_app.c: volatile int x = 0; void inc_x_loop() { int i; for(i=0; i<1; i++) x++; } void a() { inc_x_loop(); } void b() { inc_x_loop(); } int main() { a(); b(); return 0; } example usage: $ gcc -g -O0 test_app.c $ perf record --call-graph=dwarf ./a.out [ perf record: Woken up 77 times to write data ] [ perf record: Captured and wrote 19.373 MB perf.data (2404 samples) ] $ perf script -s scripts/python/export-to-postgresql.py example_db all callchains $ psql example_db example_db=# SELECT (SELECT name FROM symbols WHERE id = cps.symbol_id) as symbol, (SELECT name FROM symbols WHERE id = (SELECT symbol_id from call_paths where id = cps.parent_id)) as parent_symbol, sum(period) as event_count FROM samples join call_paths as cps on call_path_id = cps.id GROUP BY cps.id,evsel_id ORDER BY event_count DESC LIMIT 5; symbol | parent_symbol | event_count --+--+- inc_x_loop | a| 734250982 inc_x_loop | b| 731028057 unknown | unknown | 1335858 task_tick_fair | scheduler_tick | 1238842 update_wall_time | tick_do_update_jiffies64 | 650373 (5 rows) The above data shows total "self time" in cycles for each call path that was sampled. It is intended to demonstrate how it accounts separately for the two ways to reach the "inc_x_loop" function(via "a" and "b"). Recursive common table expressions can be used as well to get cumulative time spent in a function as well, but that is beyond the scope of this basic example. Signed-off-by: Chris Phlipot Acked-by: Adrian Hunter Cc: Jiri Olsa Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1461831551-12213-7-git-send-email-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/scripts/python/export-to-postgresql.py | 47 +++ 1 file changed, 30 insertions(+), 17 deletions(-) diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py index 6f0ca68..7656ff8 100644 --- a/tools/perf/scripts/python/export-to-postgresql.py +++ b/tools/perf/scripts/python/export-to-postgresql.py @@ -223,11 +223,14 @@ sys.path.append(os.environ['PERF_EXEC_PATH'] + \ perf_db_export_mode = True perf_db_export_calls = False +perf_db_export_callchains = False + def usage(): - print >> sys.stderr, "Usage is: export-to-postgresql.py [] []" + print >> sys.stderr, "Usage is: export-to-postgresql.py [] [] []" print >> sys.stderr, "where:columns 'all' or 'branches'" - print >> sys.stderr, " calls 'calls' => create calls table" + print >> sys.stderr, " calls 'calls' => create calls and call_paths table" + print >> sys.stderr, " callchains 'callchains' => create call_paths table" raise Exception("Too few arguments") if (len(sys.argv) < 2): @@ -245,9 +248,11 @@ if columns not in ("all", "branches"): branches = (columns == "branches") -if (len(sys.argv) >= 4): - if (sys.argv[3] == "calls"): +for i in range(3,len(sys.argv)): + if (sys.argv[i] == "calls"): perf_db_export_c
[tip:perf/core] perf script: Expose usage of the callchain db export via the python api
Commit-ID: 2c15f5eb04e9e7e19a2c8be6b50c63a4c6062a44 Gitweb: http://git.kernel.org/tip/2c15f5eb04e9e7e19a2c8be6b50c63a4c6062a44 Author: Chris Phlipot AuthorDate: Thu, 28 Apr 2016 01:19:10 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Fri, 6 May 2016 13:00:54 -0300 perf script: Expose usage of the callchain db export via the python api This change allows python scripts to be able to utilize the recent changes to the db export api allowing the export of call_paths derived from sampled callchains. These call paths are also now associated with the samples from which they were derived. - This feature is enabled by setting "perf_db_export_callchains" to true - When enabled, samples that have callchain information will have the callchains exported via call_path_table - The call_path_id field is added to sample_table to enable association of samples with the corresponding callchain stored in the call paths table. A call_path_id of 0 will be exported if there is no corresponding callchain. - When "perf_db_export_callchains" and "perf_db_export_calls" are both set to True, the call path root data structure will be shared. This prevents duplicating of data and call path ids that would result from building two separate call path trees in memory. - The call_return_processor structure definition was relocated to the header file to make its contents visible to db-export.c. This enables the sharing of call path trees between the two features, as mentioned above. This change is visible to python scripts using the python db export api. The change is backwards compatible with scripts written against the previous API, assuming that the scripts model the sample_table function after the one in export-to-postgresql.py script by allowing for additional arguments to be added in the future. ie. using *x as the final argument of the sample_table function. Signed-off-by: Chris Phlipot Acked-by: Adrian Hunter Cc: Jiri Olsa Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1461831551-12213-6-git-send-email-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- .../util/scripting-engines/trace-event-python.c| 35 -- tools/perf/util/thread-stack.c | 13 tools/perf/util/thread-stack.h | 14 - 3 files changed, 46 insertions(+), 16 deletions(-) diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c index 7bb8592..091bce6 100644 --- a/tools/perf/util/scripting-engines/trace-event-python.c +++ b/tools/perf/util/scripting-engines/trace-event-python.c @@ -682,7 +682,7 @@ static int python_export_sample(struct db_export *dbe, struct tables *tables = container_of(dbe, struct tables, dbe); PyObject *t; - t = tuple_new(21); + t = tuple_new(22); tuple_set_u64(t, 0, es->db_id); tuple_set_u64(t, 1, es->evsel->db_id); @@ -705,6 +705,7 @@ static int python_export_sample(struct db_export *dbe, tuple_set_u64(t, 18, es->sample->data_src); tuple_set_s32(t, 19, es->sample->flags & PERF_BRANCH_MASK); tuple_set_s32(t, 20, !!(es->sample->flags & PERF_IP_FLAG_IN_TX)); + tuple_set_u64(t, 21, es->call_path_id); call_object(tables->sample_handler, t, "sample_table"); @@ -999,8 +1000,10 @@ static void set_table_handlers(struct tables *tables) { const char *perf_db_export_mode = "perf_db_export_mode"; const char *perf_db_export_calls = "perf_db_export_calls"; - PyObject *db_export_mode, *db_export_calls; + const char *perf_db_export_callchains = "perf_db_export_callchains"; + PyObject *db_export_mode, *db_export_calls, *db_export_callchains; bool export_calls = false; + bool export_callchains = false; int ret; memset(tables, 0, sizeof(struct tables)); @@ -1017,6 +1020,7 @@ static void set_table_handlers(struct tables *tables) if (!ret) return; + /* handle export calls */ tables->dbe.crp = NULL; db_export_calls = PyDict_GetItemString(main_dict, perf_db_export_calls); if (db_export_calls) { @@ -1034,6 +1038,33 @@ static void set_table_handlers(struct tables *tables) Py_FatalError("failed to create calls processor"); } + /* handle export callchains */ + tables->dbe.cpr = NULL; + db_export_callchains = PyDict_GetItemString(main_dict, + perf_db_export_callchains); + if (db_export_callchains) { + ret = PyObject_IsTrue(db_export_callchains); + if (ret == -1) + handler_call_die(perf_db_export_callchains); + export_callchains = !!ret; + }
[tip:perf/core] perf script: Add call path id to exported sample in db export
Commit-ID: 568850eaad8cdd3783c3347623dfcad4f043cf1c Gitweb: http://git.kernel.org/tip/568850eaad8cdd3783c3347623dfcad4f043cf1c Author: Chris Phlipot AuthorDate: Thu, 28 Apr 2016 01:19:09 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Fri, 6 May 2016 13:00:53 -0300 perf script: Add call path id to exported sample in db export The exported sample now contains a reference to the call_path_id that represents its callchain. While callchains themselves are nice to have, being able to associate them with samples makes them much more useful, and can allow for such things as determining how much cumulative time is spent in a particular function. This information is normally possible to get from the call return processor. However, when doing normal sampling, call/return information is not available, thus necessitating the need for associating samples directly with call paths. This commit include changes to db-export layer to make this information available for subsequent patches in this change set, but by itself, does not make any changes visible to the user. Signed-off-by: Chris Phlipot Acked-by: Adrian Hunter Cc: Jiri Olsa Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1461831551-12213-5-git-send-email-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/db-export.c | 4 +++- tools/perf/util/db-export.h | 1 + 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c index a0ca90c..f8e3057 100644 --- a/tools/perf/util/db-export.c +++ b/tools/perf/util/db-export.c @@ -408,8 +408,10 @@ int db_export__sample(struct db_export *dbe, union perf_event *event, struct call_path *cp = call_path_from_sample(dbe, al->machine, thread, sample, evsel); - if (cp) + if (cp) { db_export__call_path(dbe, cp); + es.call_path_id = cp->db_id; + } } if ((evsel->attr.sample_type & PERF_SAMPLE_ADDR) && diff --git a/tools/perf/util/db-export.h b/tools/perf/util/db-export.h index f5daf55..67bc6b8 100644 --- a/tools/perf/util/db-export.h +++ b/tools/perf/util/db-export.h @@ -44,6 +44,7 @@ struct export_sample { u64 addr_dso_db_id; u64 addr_sym_db_id; u64 addr_offset; /* addr offset from symbol start */ + u64 call_path_id; }; struct db_export {
[tip:perf/core] perf script: Enable db export to output sampled callchains
Commit-ID: 0a3eba3ad613fa9d5af754f7ae8c4b46047cb2a7 Gitweb: http://git.kernel.org/tip/0a3eba3ad613fa9d5af754f7ae8c4b46047cb2a7 Author: Chris Phlipot AuthorDate: Thu, 28 Apr 2016 01:19:08 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Fri, 6 May 2016 13:00:52 -0300 perf script: Enable db export to output sampled callchains This change enables the db export api to export callchains. This is accomplished by adding callchains obtained from samples to the call_path_root structure and exporting them via the current call path export API. While the current API does support exporting call paths, this is not supported when sampling. This commit addresses that missing feature by allowing the export of call paths when callchains are present in samples. Summary: - This feature is activated by initializing the call_path_root member inside the db_export structure to a non-null value. - Callchains are resolved with thread__resolve_callchain() and then stored and exported by adding a call path under call path root. - Symbol and DSO for each callchain node are exported via db_ids_from_al() This commit puts in place infrastructure to be used by subsequent commits, and by itself, does not introduce any user-visible changes. Signed-off-by: Chris Phlipot Acked-by: Adrian Hunter Cc: Jiri Olsa Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1461831551-12213-4-git-send-email-cphlip...@gmail.com [ Made adjustments suggested by Adrian Hunter, see thread via this cset's Link: tag ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/db-export.c | 82 + tools/perf/util/db-export.h | 2 ++ 2 files changed, 84 insertions(+) diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c index 4fc607c..a0ca90c 100644 --- a/tools/perf/util/db-export.c +++ b/tools/perf/util/db-export.c @@ -23,6 +23,7 @@ #include "event.h" #include "util.h" #include "thread-stack.h" +#include "callchain.h" #include "call-path.h" #include "db-export.h" @@ -277,6 +278,79 @@ static int db_ids_from_al(struct db_export *dbe, struct addr_location *al, return 0; } +static struct call_path *call_path_from_sample(struct db_export *dbe, + struct machine *machine, + struct thread *thread, + struct perf_sample *sample, + struct perf_evsel *evsel) +{ + u64 kernel_start = machine__kernel_start(machine); + struct call_path *current = &dbe->cpr->call_path; + enum chain_order saved_order = callchain_param.order; + int err; + + if (!symbol_conf.use_callchain || !sample->callchain) + return NULL; + + /* +* Since the call path tree must be built starting with the root, we +* must use ORDER_CALL for call chain resolution, in order to process +* the callchain starting with the root node and ending with the leaf. +*/ + callchain_param.order = ORDER_CALLER; + err = thread__resolve_callchain(thread, &callchain_cursor, evsel, + sample, NULL, NULL, + sysctl_perf_event_max_stack); + if (err) { + callchain_param.order = saved_order; + return NULL; + } + callchain_cursor_commit(&callchain_cursor); + + while (1) { + struct callchain_cursor_node *node; + struct addr_location al; + u64 dso_db_id = 0, sym_db_id = 0, offset = 0; + + memset(&al, 0, sizeof(al)); + + node = callchain_cursor_current(&callchain_cursor); + if (!node) + break; + /* +* Handle export of symbol and dso for this node by +* constructing an addr_location struct and then passing it to +* db_ids_from_al() to perform the export. +*/ + al.sym = node->sym; + al.map = node->map; + al.machine = machine; + if (al.map) + al.addr = al.map->map_ip(al.map, node->ip); + else + al.addr = node->ip; + + db_ids_from_al(dbe, &al, &dso_db_id, &sym_db_id, &offset); + + /* add node to the call path tree if it doesn't exist */ + current = call_path__findnew(dbe->cpr, current, +al.sym, node->ip, +kernel_start); + + callchain_cursor_advance(&callchain_cursor); + } + + /* Reset the callchain order to its prior value. */ + c
[tip:perf/core] perf tools: Refactor code to move call path handling out of thread-stack
Commit-ID: 451db12617bc6ff1bb8ed456ed4f257594134255 Gitweb: http://git.kernel.org/tip/451db12617bc6ff1bb8ed456ed4f257594134255 Author: Chris Phlipot AuthorDate: Thu, 28 Apr 2016 01:19:07 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Fri, 6 May 2016 13:00:43 -0300 perf tools: Refactor code to move call path handling out of thread-stack Move the call path handling code out of thread-stack.c and thread-stack.h to allow other components that are not part of thread-stack to create call paths. Summary: - Create call-path.c and call-path.h and add them to the build. - Move all call path related code out of thread-stack.c and thread-stack.h and into call-path.c and call-path.h. - A small subset of structures and functions are now visible through call-path.h, which is required for thread-stack.c to continue to compile. This change is a prerequisite for subsequent patches in this change set and by itself contains no user-visible changes. Signed-off-by: Chris Phlipot Acked-by: Adrian Hunter Cc: Jiri Olsa Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1461831551-12213-3-git-send-email-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/Build | 1 + tools/perf/util/call-path.c| 122 tools/perf/util/call-path.h| 77 + tools/perf/util/db-export.c| 1 + .../util/scripting-engines/trace-event-python.c| 1 + tools/perf/util/thread-stack.c | 126 + tools/perf/util/thread-stack.h | 25 +--- 7 files changed, 204 insertions(+), 149 deletions(-) diff --git a/tools/perf/util/Build b/tools/perf/util/Build index 90229a8..027bb2b 100644 --- a/tools/perf/util/Build +++ b/tools/perf/util/Build @@ -74,6 +74,7 @@ libperf-y += srcline.o libperf-y += data.o libperf-y += tsc.o libperf-y += cloexec.o +libperf-y += call-path.o libperf-y += thread-stack.o libperf-$(CONFIG_AUXTRACE) += auxtrace.o libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/ diff --git a/tools/perf/util/call-path.c b/tools/perf/util/call-path.c new file mode 100644 index 000..904a170 --- /dev/null +++ b/tools/perf/util/call-path.c @@ -0,0 +1,122 @@ +/* + * call-path.h: Manipulate a tree data structure containing function call paths + * Copyright (c) 2014, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + */ + +#include +#include + +#include "util.h" +#include "call-path.h" + +static void call_path__init(struct call_path *cp, struct call_path *parent, + struct symbol *sym, u64 ip, bool in_kernel) +{ + cp->parent = parent; + cp->sym = sym; + cp->ip = sym ? 0 : ip; + cp->db_id = 0; + cp->in_kernel = in_kernel; + RB_CLEAR_NODE(&cp->rb_node); + cp->children = RB_ROOT; +} + +struct call_path_root *call_path_root__new(void) +{ + struct call_path_root *cpr; + + cpr = zalloc(sizeof(struct call_path_root)); + if (!cpr) + return NULL; + call_path__init(&cpr->call_path, NULL, NULL, 0, false); + INIT_LIST_HEAD(&cpr->blocks); + return cpr; +} + +void call_path_root__free(struct call_path_root *cpr) +{ + struct call_path_block *pos, *n; + + list_for_each_entry_safe(pos, n, &cpr->blocks, node) { + list_del(&pos->node); + free(pos); + } + free(cpr); +} + +static struct call_path *call_path__new(struct call_path_root *cpr, + struct call_path *parent, + struct symbol *sym, u64 ip, + bool in_kernel) +{ + struct call_path_block *cpb; + struct call_path *cp; + size_t n; + + if (cpr->next < cpr->sz) { + cpb = list_last_entry(&cpr->blocks, struct call_path_block, + node); + } else { + cpb = zalloc(sizeof(struct call_path_block)); + if (!cpb) + return NULL; + list_add_tail(&cpb->node, &cpr->blocks); + cpr->sz += CALL_PATH_BLOCK_SIZE; + } + + n = cpr->next++ & CALL_PATH_BLOCK_MASK; + cp = &cpb->cp[n]; + + call_path__init(cp, parent, sym, ip, in_kernel); + + return cp; +} + +struct
[tip:perf/core] perf callchain: Fix incorrect ordering of entries
Commit-ID: 9919a65ec532799544dfdfd6df6f994b74c12b42 Gitweb: http://git.kernel.org/tip/9919a65ec532799544dfdfd6df6f994b74c12b42 Author: Chris Phlipot AuthorDate: Thu, 28 Apr 2016 01:19:06 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Fri, 6 May 2016 08:59:47 -0300 perf callchain: Fix incorrect ordering of entries The existing implementation of thread__resolve_callchain, under certain circumstances, can assemble callchain entries in the incorrect order. The callchain entries are resolved incorrectly for a sample when all of the following conditions are met: 1. callchain_param.order is set to ORDER_CALLER 2. thread__resolve_callchain_sample is able to resolve callchain entries for the sample. 3. unwind__get_entries is also able to resolve callchain entries for the sample. The fix is accomplished by reversing the order in which thread__resolve_callchain_sample and unwind__get_entries are called when callchain_param.order is set to ORDER_CALLER. Unwind specific code from thread__resolve_callchain is also moved into a new static function to improve readability of the fix. How to Reproduce the Existing Bug: Modifying perf script to print call trees in the opposite order or applying the remaining patches from this series and comparing the results output from export-to-postgtresql.py are the easiest ways to see the bug, however it can still be seen in current builds using perf report. Here is how i can reproduce the bug using perf report: # perf record --call-graph=dwarf stress -c 1 -t 5 when i run this command: # perf report --call-graph=flat,0,0,callee This callchain, containing kernel (handle_irq_event, etc) and userspace samples (__libc_start_main, etc) is contained in the output, which looks correct (callee order): gen8_irq_handler handle_irq_event_percpu handle_irq_event handle_edge_irq handle_irq do_IRQ ret_from_intr __random rand 0x558f2a04dded 0x558f2a04c774 __libc_start_main 0x558f2a04dcd9 Now run this command using caller order: # perf report --call-graph=flat,0,0,caller It is expected to see the exact reverse of the above when using caller order (with "0x558f2a04dcd9" at the top and "gen8_irq_handler" at the bottom) in the output, but it is nowhere to be found. instead you see this: ret_from_intr do_IRQ handle_irq handle_edge_irq handle_irq_event handle_irq_event_percpu gen8_irq_handler 0x558f2a04dcd9 __libc_start_main 0x558f2a04c774 0x558f2a04dded rand __random Notice how internally the kernel symbols are reversed and the user space symbols are reversed, but the kernel symbols still appear above the user space symbols. if this patch is applied and perf script is re-run, you will see the expected output (with "0x558f2a04dcd9" at the top and "gen8_irq_handler" at the bottom): 0x558f2a04dcd9 __libc_start_main 0x558f2a04c774 0x558f2a04dded rand __random ret_from_intr do_IRQ handle_irq handle_edge_irq handle_irq_event handle_irq_event_percpu gen8_irq_handler Signed-off-by: Chris Phlipot Tested-by: Arnaldo Carvalho de Melo Acked-by: Jiri Olsa Cc: Adrian Hunter Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1461831551-12213-2-git-send-email-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/machine.c | 56 ++- 1 file changed, 41 insertions(+), 15 deletions(-) diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 8c7bf4d..639a290 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -1817,8 +1817,6 @@ static int thread__resolve_callchain_sample(struct thread *thread, int skip_idx = -1; int first_call = 0; - callchain_cursor_reset(cursor); - if (perf_evsel__has_branch_callstack(evsel)) { err = resolve_lbr_callchain_sample(thread, cursor, sample, parent, root_al, max_stack); @@ -1929,20 +1927,12 @@ static int unwind_entry(struct unwind_entry *entry, void *arg) entry->map, entry->sym); } -int thread__resolve_callchain(struct thread *thread, - struct callchain_cursor *cursor, - struct perf_evsel *evsel, - struct perf_sample *sample, -
Re: [PATCH 2/5] perf script: extend db-export api to include callchains for samples
Hi Adrian, I have just resubmitted these changes as a new patch set, which I believe should address most of your concerns. Please review the new patch set instead of continuing with this one. https://lkml.org/lkml/2016/4/28/75 Thanks, Chris On 04/22/2016 09:41 PM, Chris Phlipot wrote: On 04/22/2016 12:56 AM, Adrian Hunter wrote: The call_paths table already has symbol_id which belongs uniquely to a DSO, so why do we need dso_id as well? If the symbol_id is 0 because the IP could not be resolved to a symbol, this is not necessarily a valid assumption. Without a dso_id in the call_paths table, it is not possible to resolve the dso when symbol information is missing. the db_export api currently does not have enough information to match a DSO with an IP. It is often useful to still have the call path associated with a DSO even if there is no symbol, which i why i recommend keeping the dso_id in the call_paths table. Why do you need a callback? Seems like the only thing you need from thread-stack.c is the call path tree. You could move that to its own .c/.h files and then process the call chain in db-export.c My original intent was to reuse existing code with minimal changes and conform the existing design patterns they used. Thread-stack.c, for example, currently uses a callback to populate the call_return table, so I used a callback as well to populate the call_path table. I am open to making this change if it is believed it will result in a cleaner implementation. Also a list of changes like the one above heavily implies you are not obeying the one patch == one change rule. Please try to make patches that only do one thing and also run checkpatch. While i can split this into a few smaller patches there is only really justification for applying all of them all together. If this is still preferred i can resubmit this in smaller parts. If you don't mind, I'll let you respond to my comments before I comment on any other patches. Let me know if you have any additional comments. Thanks, Chris
[PATCH 3/6] perf script: enable db export to output sampled callchains
This change enables the db export api to export callchains. This is accomplished by adding callchains obtained from samples to the call_path_root structure and exporting them via the current call path export API. While the current API does support exporting call paths, this is not supported when sampling. This commit addresses that missing feature by allowing the export of call paths when callchains are present in samples. Summary: -This feature is activated by initializing the call_path_root member inside the db_export structure to a non-null value. -Callchains are resolved with thread__resolve_callchain() and then stored and exported by adding a call path under call path root. -Symbol and DSO for each callchain node are exported via db_ids_from_al() This commit puts in place infrastructure to be used by subsequent commits, and by itself, does not introduce any user-visible changes. Signed-off-by: Chris Phlipot --- tools/perf/util/db-export.c | 86 + tools/perf/util/db-export.h | 2 ++ 2 files changed, 88 insertions(+) diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c index 4fc607c..cb96591 100644 --- a/tools/perf/util/db-export.c +++ b/tools/perf/util/db-export.c @@ -15,6 +15,8 @@ #include +#include + #include "evsel.h" #include "machine.h" #include "thread.h" @@ -23,6 +25,7 @@ #include "event.h" #include "util.h" #include "thread-stack.h" +#include "callchain.h" #include "call-path.h" #include "db-export.h" @@ -277,6 +280,81 @@ static int db_ids_from_al(struct db_export *dbe, struct addr_location *al, return 0; } +static struct call_path *call_path_from_sample(struct db_export *dbe, + struct machine *machine, + struct thread *thread, + struct perf_sample *sample, + struct perf_evsel *evsel + ) +{ + u64 kernel_start = machine__kernel_start(machine); + + struct call_path *current = &dbe->cpr->call_path; + enum chain_order saved_order = callchain_param.order; + + int err = 0; + + if (!symbol_conf.use_callchain || !sample->callchain) + return NULL; + + /* Since the call path tree must be built starting with the root, we +* must use ORDER_CALL for call chain resolution, in order to process +* the callchain starting with the root node and ending with the leaf. +*/ + callchain_param.order = ORDER_CALLER; + err = thread__resolve_callchain(thread, &callchain_cursor, evsel, + sample, NULL, NULL, + PERF_MAX_STACK_DEPTH); + if (err) { + callchain_param.order = saved_order; + return NULL; + } + callchain_cursor_commit(&callchain_cursor); + + while (1) { + struct callchain_cursor_node *node; + struct addr_location al; + u64 dso_db_id = 0, sym_db_id = 0, offset = 0; + + memset(&al, 0, sizeof(al)); + + node = callchain_cursor_current(&callchain_cursor); + if (!node) + break; + + /* handle export of symbol and dso for this node by +* constructing an addr_location struct and then passing it to +* db_ids_from_al() to perform the export. +*/ + al.sym = node->sym; + al.map = node->map; + al.machine = machine; + if (al.map) + al.addr = al.map->map_ip(al.map, node->ip); + else + al.addr = node->ip; + + db_ids_from_al(dbe, &al, &dso_db_id, &sym_db_id, &offset); + + /* add node to the call path tree if it doesn't exist */ + current = call_path__findnew(dbe->cpr, current, +al.sym, node->ip, +kernel_start); + + callchain_cursor_advance(&callchain_cursor); + } + + /* Reset the callchain order to its prior value. */ + callchain_param.order = saved_order; + + if (current == &dbe->cpr->call_path) { + /* Bail because the callchain was empty. */ + return NULL; + } + + return current; +} + int db_export__branch_type(struct db_export *dbe, u32 branch_type, const char *name) { @@ -330,6 +408,14 @@ int db_export__sample(struct db_export *dbe, union perf_event *event, if (err) goto out_pu
[PATCH 1/6] perf tools: fix incorrect ordering of callchain entries
The existing implementation of thread__resolve_callchain, under certain circumstances, can assemble callchain entries in the incorrect order. The callchain entries are resolved incorrectly for a sample when all of the following conditions are met: 1. callchain_param.order is set to ORDER_CALLER 2. thread__resolve_callchain_sample is able to resolve callchain entries for the sample. 3. unwind__get_entries is also able to resolve callchain entries for the sample. The fix is accomplished by reversing the order in which thread__resolve_callchain_sample and unwind__get_entries are called when callchain_param.order is set to ORDER_CALLER. Unwind specific code from thread__resolve_callchain is also moved into a new static function to improve readability of the fix. How to Reproduce the Existing Bug: Modifying perf script to print call trees in the opposite order or applying the remaining patches from this series and comparing the results output from export-to-postgtresql.py are the easiest ways to see the bug, however it can still be seen in current builds using perf report. Here is how i can reproduce the bug using perf report: $ ./perf record --call-graph=dwarf stress -c 1 -t 5 when i run this command: $./perf report --call-graph=flat,0,0,callee This callchain is contained in the output, which looks correct (callee order): gen8_irq_handler handle_irq_event_percpu handle_irq_event handle_edge_irq handle_irq do_IRQ ret_from_intr __random rand 0x558f2a04dded 0x558f2a04c774 __libc_start_main 0x558f2a04dcd9 Now run this command using caller order: $./perf report --call-graph=flat,0,0,caller It is expected to see the exact reverse of the above when using caller order (with "0x558f2a04dcd9" at the top and "gen8_irq_handler" at the bottom) in the output, but it is nowhere to be found. instead you see this: ret_from_intr do_IRQ handle_irq handle_edge_irq handle_irq_event handle_irq_event_percpu gen8_irq_handler 0x558f2a04dcd9 __libc_start_main 0x558f2a04c774 0x558f2a04dded rand __random Notice how internally the kernel symbols are reversed and the user space symbols are reversed, but the kernel symbols still appear above the user space symbols. if this patch is applied and perf script is re-run, you will see the expected output (with "0x558f2a04dcd9" at the top and "gen8_irq_handler" at the bottom): 0x558f2a04dcd9 __libc_start_main 0x558f2a04c774 0x558f2a04dded rand __random ret_from_intr do_IRQ handle_irq handle_edge_irq handle_irq_event handle_irq_event_percpu gen8_irq_handler Signed-off-by: Chris Phlipot --- tools/perf/util/machine.c | 56 ++- 1 file changed, 41 insertions(+), 15 deletions(-) diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 2cb95bb..baec208 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -1812,8 +1812,6 @@ static int thread__resolve_callchain_sample(struct thread *thread, int skip_idx = -1; int first_call = 0; - callchain_cursor_reset(cursor); - if (perf_evsel__has_branch_callstack(evsel)) { err = resolve_lbr_callchain_sample(thread, cursor, sample, parent, root_al, max_stack); @@ -1924,20 +1922,12 @@ static int unwind_entry(struct unwind_entry *entry, void *arg) entry->map, entry->sym); } -int thread__resolve_callchain(struct thread *thread, - struct callchain_cursor *cursor, - struct perf_evsel *evsel, - struct perf_sample *sample, - struct symbol **parent, - struct addr_location *root_al, - int max_stack) +static int thread__resolve_callchain_unwind(struct thread *thread, + struct callchain_cursor *cursor, + struct perf_evsel *evsel, + struct perf_sample *sample, + int max_stack) { - int ret = thread__resolve_callchain_sample(thread, cursor, evsel, - sample, parent, -
[PATCH 5/6] perf script: expose usage of the callchain db export via the python api
This change allows python scripts to be able to utilize the recent changes to the db export api allowing the export of call_paths derived from sampled callchains. These call paths are also now associated with the samples from which they were derived. -This feature is enabled by setting "perf_db_export_callchains" to true -When enabled, samples that have callchain information will have the callchains exported via call_path_table -The call_path_id field is added to sample_table to enable association of samples with the corresponding callchain stored in the call paths table. A call_path_id of 0 will be exported if there is no corresponding callchain. -When "perf_db_export_callchains" and "perf_db_export_calls" are both set to True, the call path root data structure will be shared. This prevents duplicating of data and call path ids that would result from building two separate call path trees in memory. -The call_return_processor structure definition was relocated to the header file to make its contents visible to db-export.c. This enables the sharing of call path trees between the two features, as mentioned above. This change is visible to python scripts using the python db export api. The change is backwards compatible with scripts written against the previous API, assuming that the scripts model the sample_table function after the one in export-to-postgresql.py script by allowing for additional arguments to be added in the future. ie. using *x as the final argument of the sample_table function. Signed-off-by: Chris Phlipot --- .../util/scripting-engines/trace-event-python.c| 35 -- tools/perf/util/thread-stack.c | 13 tools/perf/util/thread-stack.h | 14 - 3 files changed, 46 insertions(+), 16 deletions(-) diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c index 7bb8592..091bce6 100644 --- a/tools/perf/util/scripting-engines/trace-event-python.c +++ b/tools/perf/util/scripting-engines/trace-event-python.c @@ -682,7 +682,7 @@ static int python_export_sample(struct db_export *dbe, struct tables *tables = container_of(dbe, struct tables, dbe); PyObject *t; - t = tuple_new(21); + t = tuple_new(22); tuple_set_u64(t, 0, es->db_id); tuple_set_u64(t, 1, es->evsel->db_id); @@ -705,6 +705,7 @@ static int python_export_sample(struct db_export *dbe, tuple_set_u64(t, 18, es->sample->data_src); tuple_set_s32(t, 19, es->sample->flags & PERF_BRANCH_MASK); tuple_set_s32(t, 20, !!(es->sample->flags & PERF_IP_FLAG_IN_TX)); + tuple_set_u64(t, 21, es->call_path_id); call_object(tables->sample_handler, t, "sample_table"); @@ -999,8 +1000,10 @@ static void set_table_handlers(struct tables *tables) { const char *perf_db_export_mode = "perf_db_export_mode"; const char *perf_db_export_calls = "perf_db_export_calls"; - PyObject *db_export_mode, *db_export_calls; + const char *perf_db_export_callchains = "perf_db_export_callchains"; + PyObject *db_export_mode, *db_export_calls, *db_export_callchains; bool export_calls = false; + bool export_callchains = false; int ret; memset(tables, 0, sizeof(struct tables)); @@ -1017,6 +1020,7 @@ static void set_table_handlers(struct tables *tables) if (!ret) return; + /* handle export calls */ tables->dbe.crp = NULL; db_export_calls = PyDict_GetItemString(main_dict, perf_db_export_calls); if (db_export_calls) { @@ -1034,6 +1038,33 @@ static void set_table_handlers(struct tables *tables) Py_FatalError("failed to create calls processor"); } + /* handle export callchains */ + tables->dbe.cpr = NULL; + db_export_callchains = PyDict_GetItemString(main_dict, + perf_db_export_callchains); + if (db_export_callchains) { + ret = PyObject_IsTrue(db_export_callchains); + if (ret == -1) + handler_call_die(perf_db_export_callchains); + export_callchains = !!ret; + } + + if (export_callchains) { + /* +* Attempt to use the call path root from the call return +* processor, if the call return processor is in use. Otherwise, +* we allocate a new call path root. This prevents exporting +* duplicate call path ids when both are in use simultaniously. +*/ + if (tables->dbe.crp) + tables->dbe.cpr = tables->dbe.crp->cpr; + else +
[PATCH 6/6] perf script: update export-to-postgresql to support callchain export
Update the export-to-postgresql.py to support the newly introduced callchain export. callchains are added into the existing call_paths table and can now be associated with samples when the "callpaths" commandline option is used with the script. ex. $perf script -s export-to-postgresql.py example_db all callchains Includes the following changes to enable callchain export via the python export APIs: -Add the "callchains" commandline option, which is used to enable callchain export by setting the perf_db_export_callchains global -Add perf_db_export_callchains checks for call_path table creation and population. -Add call_path_id to samples_table to conform with the new API example usage and output using a small test app: test_app.c: volatile int x = 0; void inc_x_loop() { int i; for(i=0; i<1; i++) x++; } void a() { inc_x_loop(); } void b() { inc_x_loop(); } int main() { a(); b(); return 0; } example usage: $ gcc -g -O0 test_app.c $ ./perf record --call-graph=dwarf ./a.out [ perf record: Woken up 77 times to write data ] [ perf record: Captured and wrote 19.373 MB perf.data (2404 samples) ] $ ./perf script -s scripts/python/export-to-postgresql.py example_db all callchains $ psql example_db example_db=# SELECT (SELECT name FROM symbols WHERE id = cps.symbol_id) as symbol, (SELECT name FROM symbols WHERE id = (SELECT symbol_id from call_paths where id = cps.parent_id)) as parent_symbol, sum(period) as event_count FROM samples join call_paths as cps on call_path_id = cps.id GROUP BY cps.id,evsel_id ORDER BY event_count DESC LIMIT 5; symbol | parent_symbol | event_count --+--+- inc_x_loop | a| 734250982 inc_x_loop | b| 731028057 unknown | unknown | 1335858 task_tick_fair | scheduler_tick | 1238842 update_wall_time | tick_do_update_jiffies64 | 650373 (5 rows) The above data shows total "self time" in cycles for each call path that was sampled. It is intended to demonstrate how it accounts separately for the two ways to reach the "inc_x_loop" function(via "a" and "b"). Recursive common table expressions can be used as well to get cumulative time spent in a function as well, but that is beyond the scope of this basic example. Signed-off-by: Chris Phlipot --- tools/perf/scripts/python/export-to-postgresql.py | 47 +++ 1 file changed, 30 insertions(+), 17 deletions(-) diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py index 6f0ca68..7656ff8 100644 --- a/tools/perf/scripts/python/export-to-postgresql.py +++ b/tools/perf/scripts/python/export-to-postgresql.py @@ -223,11 +223,14 @@ sys.path.append(os.environ['PERF_EXEC_PATH'] + \ perf_db_export_mode = True perf_db_export_calls = False +perf_db_export_callchains = False + def usage(): - print >> sys.stderr, "Usage is: export-to-postgresql.py [] []" + print >> sys.stderr, "Usage is: export-to-postgresql.py [] [] []" print >> sys.stderr, "where:columns 'all' or 'branches'" - print >> sys.stderr, " calls 'calls' => create calls table" + print >> sys.stderr, " calls 'calls' => create calls and call_paths table" + print >> sys.stderr, " callchains 'callchains' => create call_paths table" raise Exception("Too few arguments") if (len(sys.argv) < 2): @@ -245,9 +248,11 @@ if columns not in ("all", "branches"): branches = (columns == "branches") -if (len(sys.argv) >= 4): - if (sys.argv[3] == "calls"): +for i in range(3,len(sys.argv)): + if (sys.argv[i] == "calls"): perf_db_export_calls = True + elif (sys.argv[i] == "callchains"): + perf_db_export_callchains = True else: usage() @@ -358,14 +363,16 @@ else: 'transactionbigint,' 'data_src bigint,' 'branch_typeinteger,' - 'in_tx boolean)') + 'in_tx boolean,' + 'call_path_id bigint)') -if perf_db_export_calls: +if perf_db_export_calls or perf_db_export_callchains:
[PATCH 2/6] perf tools: refractor code to move call path handling out of thread-stack
Move the call path handling code out of thread-stack.c and thread-stack.h to allow other components that are not part of thread-stack to create call paths. Summary: -Create call-path.c and call-path.h and add them to the build. -Move all call path related code out of thread-stack.c and thread-stack.h and into call-path.c and call-path.h. -A small subset of structures and functions are now visible through call-path.h, which is required for thread-stack.c to continue to compile. This change is a prerequisite for subsequent patches in this change set and by itself contains no user-visible changes. Signed-off-by: Chris Phlipot --- tools/perf/util/Build | 1 + tools/perf/util/call-path.c| 122 tools/perf/util/call-path.h| 77 + tools/perf/util/db-export.c| 1 + .../util/scripting-engines/trace-event-python.c| 1 + tools/perf/util/thread-stack.c | 126 + tools/perf/util/thread-stack.h | 25 +--- 7 files changed, 204 insertions(+), 149 deletions(-) create mode 100644 tools/perf/util/call-path.c create mode 100644 tools/perf/util/call-path.h diff --git a/tools/perf/util/Build b/tools/perf/util/Build index 90229a8..027bb2b 100644 --- a/tools/perf/util/Build +++ b/tools/perf/util/Build @@ -74,6 +74,7 @@ libperf-y += srcline.o libperf-y += data.o libperf-y += tsc.o libperf-y += cloexec.o +libperf-y += call-path.o libperf-y += thread-stack.o libperf-$(CONFIG_AUXTRACE) += auxtrace.o libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/ diff --git a/tools/perf/util/call-path.c b/tools/perf/util/call-path.c new file mode 100644 index 000..904a170 --- /dev/null +++ b/tools/perf/util/call-path.c @@ -0,0 +1,122 @@ +/* + * call-path.h: Manipulate a tree data structure containing function call paths + * Copyright (c) 2014, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + */ + +#include +#include + +#include "util.h" +#include "call-path.h" + +static void call_path__init(struct call_path *cp, struct call_path *parent, + struct symbol *sym, u64 ip, bool in_kernel) +{ + cp->parent = parent; + cp->sym = sym; + cp->ip = sym ? 0 : ip; + cp->db_id = 0; + cp->in_kernel = in_kernel; + RB_CLEAR_NODE(&cp->rb_node); + cp->children = RB_ROOT; +} + +struct call_path_root *call_path_root__new(void) +{ + struct call_path_root *cpr; + + cpr = zalloc(sizeof(struct call_path_root)); + if (!cpr) + return NULL; + call_path__init(&cpr->call_path, NULL, NULL, 0, false); + INIT_LIST_HEAD(&cpr->blocks); + return cpr; +} + +void call_path_root__free(struct call_path_root *cpr) +{ + struct call_path_block *pos, *n; + + list_for_each_entry_safe(pos, n, &cpr->blocks, node) { + list_del(&pos->node); + free(pos); + } + free(cpr); +} + +static struct call_path *call_path__new(struct call_path_root *cpr, + struct call_path *parent, + struct symbol *sym, u64 ip, + bool in_kernel) +{ + struct call_path_block *cpb; + struct call_path *cp; + size_t n; + + if (cpr->next < cpr->sz) { + cpb = list_last_entry(&cpr->blocks, struct call_path_block, + node); + } else { + cpb = zalloc(sizeof(struct call_path_block)); + if (!cpb) + return NULL; + list_add_tail(&cpb->node, &cpr->blocks); + cpr->sz += CALL_PATH_BLOCK_SIZE; + } + + n = cpr->next++ & CALL_PATH_BLOCK_MASK; + cp = &cpb->cp[n]; + + call_path__init(cp, parent, sym, ip, in_kernel); + + return cp; +} + +struct call_path *call_path__findnew(struct call_path_root *cpr, +struct call_path *parent, +struct symbol *sym, u64 ip, u64 ks) +{ + struct rb_node **p; + struct rb_node *node_parent = NULL; + struct call_path *cp; + bool in_kernel = ip >= ks; + + if (sym) + ip = 0; + + if (!parent) + return call_path__new(cpr, parent,
[PATCH 4/6] perf script: add call path id to exported sample in db export
The exported sample now contains a reference to the call_path_id that represents its callchain. While callchains themselves are nice to have, being able to associate them with samples makes them much more useful, and can allow for such things as determining how much cumulative time is spent in a particular function. This information is normally possible to get from the call return processor. However, when doing normal sampling, call/return information is not available, thus necessitating the need for associating samples directly with call paths. This commit include changes to db-export layer to make this information available for subsequent patches in this change set, but by itself, does not make any changes visible to the user. Signed-off-by: Chris Phlipot --- tools/perf/util/db-export.c | 4 +++- tools/perf/util/db-export.h | 1 + 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c index cb96591..74f6875 100644 --- a/tools/perf/util/db-export.c +++ b/tools/perf/util/db-export.c @@ -412,8 +412,10 @@ int db_export__sample(struct db_export *dbe, union perf_event *event, struct call_path *cp = call_path_from_sample(dbe, al->machine, thread, sample, evsel); - if (cp) + if (cp) { db_export__call_path(dbe, cp); + es.call_path_id = cp->db_id; + } } if ((evsel->attr.sample_type & PERF_SAMPLE_ADDR) && diff --git a/tools/perf/util/db-export.h b/tools/perf/util/db-export.h index f5daf55..67bc6b8 100644 --- a/tools/perf/util/db-export.h +++ b/tools/perf/util/db-export.h @@ -44,6 +44,7 @@ struct export_sample { u64 addr_dso_db_id; u64 addr_sym_db_id; u64 addr_offset; /* addr offset from symbol start */ + u64 call_path_id; }; struct db_export { -- 2.7.4
[PATCH 0/6] perf script: export sampled callchains to database
This patch set contains a set of changes to allow the export of sampled callchains, and to associate them with samples, via the Python db export API and export-to-postgresql.py script. Call path information is currently only available in the database when call/return info is available, but not when doing normal sampling. These changes make this information available for normal sampling runs as well. Patches 2-6 are required to make this information available in the database. Patch 1 is needed to fix an existing issue where callchains are processed incorrectly which can cause the other patches to export incorrect call paths for a small percentage of samples (depending on the workload). Chris Phlipot (6): perf tools: fix incorrect ordering of callchain entries perf tools: refractor code to move call path handling out of thread-stack perf script: enable db export to output sampled callchains perf script: add call path id to exported sample in db export perf script: expose usage of the callchain db export via the python api perf script: update export-to-postgresql to support callchain export tools/perf/scripts/python/export-to-postgresql.py | 47 --- tools/perf/util/Build | 1 + tools/perf/util/call-path.c| 122 ++ tools/perf/util/call-path.h| 77 tools/perf/util/db-export.c| 89 + tools/perf/util/db-export.h| 3 + tools/perf/util/machine.c | 56 ++--- .../util/scripting-engines/trace-event-python.c| 36 +- tools/perf/util/thread-stack.c | 139 + tools/perf/util/thread-stack.h | 31 ++--- 10 files changed, 408 insertions(+), 193 deletions(-) create mode 100644 tools/perf/util/call-path.c create mode 100644 tools/perf/util/call-path.h -- 2.7.4
[tip:perf/core] perf script: Fix segfault when printing callchains
Commit-ID: e557b674a9470dae99916be6105e6780b3a072ca Gitweb: http://git.kernel.org/tip/e557b674a9470dae99916be6105e6780b3a072ca Author: Chris Phlipot AuthorDate: Tue, 19 Apr 2016 19:32:11 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 25 Apr 2016 12:49:17 -0300 perf script: Fix segfault when printing callchains This fixes a bug caused by an unitialized callchain cursor. The crash frist appeared in: 6f736735e30f ("perf evsel: Require that callchains be resolved before calling fprintf_{sym,callchain}") The callchain cursor is a struct that contains pointers, that when uninitialized will cause unpredictable behavior (usually a crash) when trying to append to the callchain. The existing implementation has the following issues: 1. The callchain cursor used is not initialized, resulting in unpredictable behavior when used. 2. The cursor is declared on the stack. Even if it is properly initalized, the implmentation will leak memory when the function returns, since all the references to the callchain_nodes allocated by callchain_cursor_append will be lost when the cursor goes out of scope. 3. Storing the cursor on the stack is inefficient. Even if memory is properly freed when it goes out of scope, a performance penalty will be incurred due to reallocation of callchain nodes. callchain_cursor_append is designed to avoid these reallocations when an existing cursor is reused. This patch fixes the crash by replacing cursor_callchain with a reference to the global callchain_cursor which also resolves all 3 issues mentioned above. How to reproduce the crash: $ perf record --call-graph=dwarf stress -t 1 -c 1 $ perf script > /dev/null Segfault Signed-off-by: Chris Phlipot Tested-by: Arnaldo Carvalho de Melo Cc: Peter Zijlstra Fixes: 6f736735e30f ("perf evsel: Require that callchains be resolved before calling fprintf_{sym,callchain}") Link: http://lkml.kernel.org/r/1461119531-2529-1-git-send-email-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/builtin-script.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 5099740..f43b0c6 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -570,12 +570,12 @@ static void print_sample_bts(struct perf_sample *sample, /* print branch_from information */ if (PRINT_FIELD(IP)) { unsigned int print_opts = output[attr->type].print_ip_opts; - struct callchain_cursor *cursor = NULL, cursor_callchain; + struct callchain_cursor *cursor = NULL; if (symbol_conf.use_callchain && sample->callchain && - thread__resolve_callchain(al->thread, &cursor_callchain, evsel, + thread__resolve_callchain(al->thread, &callchain_cursor, evsel, sample, NULL, NULL, scripting_max_stack) == 0) - cursor = &cursor_callchain; + cursor = &callchain_cursor; if (cursor == NULL) { putchar(' '); @@ -789,12 +789,12 @@ static void process_event(struct perf_script *script, printf("%16" PRIu64, sample->weight); if (PRINT_FIELD(IP)) { - struct callchain_cursor *cursor = NULL, cursor_callchain; + struct callchain_cursor *cursor = NULL; if (symbol_conf.use_callchain && sample->callchain && - thread__resolve_callchain(al->thread, &cursor_callchain, evsel, + thread__resolve_callchain(al->thread, &callchain_cursor, evsel, sample, NULL, NULL, scripting_max_stack) == 0) - cursor = &cursor_callchain; + cursor = &callchain_cursor; putchar(cursor ? '\n' : ' '); sample__fprintf_sym(sample, al, 0, output[attr->type].print_ip_opts, cursor, stdout);
[tip:perf/core] perf script: Fix postgresql ubuntu install instructions
Commit-ID: d6632dd59b66c89724ef28e2723586d1429382aa Gitweb: http://git.kernel.org/tip/d6632dd59b66c89724ef28e2723586d1429382aa Author: Chris Phlipot AuthorDate: Tue, 19 Apr 2016 01:56:02 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Tue, 19 Apr 2016 12:36:54 -0300 perf script: Fix postgresql ubuntu install instructions The current instructions for setting up an Ubuntu system for using the export-to-postgresql.py script are incorrect. The instructions in the script have been updated to work on newer versions of ubuntu. -Add missing dependencies to apt-get command: python-pyside.qtsql, libqt4-sql-psql -Add '-s' option to createuser command to force the user to be a superuser since the command doesn't prompt as indicated in the current instructions. Tested on: Ubuntu 14.04, Ubuntu 16.04(beta) Signed-off-by: Chris Phlipot Cc: Adrian Hunter Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1461056164-14914-3-git-send-email-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/scripts/python/export-to-postgresql.py | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py index 1b02cdc..6f0ca68 100644 --- a/tools/perf/scripts/python/export-to-postgresql.py +++ b/tools/perf/scripts/python/export-to-postgresql.py @@ -34,10 +34,9 @@ import datetime # # ubuntu: # -# $ sudo apt-get install postgresql +# $ sudo apt-get install postgresql python-pyside.qtsql libqt4-sql-psql # $ sudo su - postgres -# $ createuser -# Shall the new role be a superuser? (y/n) y +# $ createuser -s # # An example of using this script with Intel PT: #
Re: [PATCH 2/5] perf script: extend db-export api to include callchains for samples
On 04/22/2016 12:56 AM, Adrian Hunter wrote: The call_paths table already has symbol_id which belongs uniquely to a DSO, so why do we need dso_id as well? If the symbol_id is 0 because the IP could not be resolved to a symbol, this is not necessarily a valid assumption. Without a dso_id in the call_paths table, it is not possible to resolve the dso when symbol information is missing. the db_export api currently does not have enough information to match a DSO with an IP. It is often useful to still have the call path associated with a DSO even if there is no symbol, which i why i recommend keeping the dso_id in the call_paths table. Why do you need a callback? Seems like the only thing you need from thread-stack.c is the call path tree. You could move that to its own .c/.h files and then process the call chain in db-export.c My original intent was to reuse existing code with minimal changes and conform the existing design patterns they used. Thread-stack.c, for example, currently uses a callback to populate the call_return table, so I used a callback as well to populate the call_path table. I am open to making this change if it is believed it will result in a cleaner implementation. Also a list of changes like the one above heavily implies you are not obeying the one patch == one change rule. Please try to make patches that only do one thing and also run checkpatch. While i can split this into a few smaller patches there is only really justification for applying all of them all together. If this is still preferred i can resubmit this in smaller parts. If you don't mind, I'll let you respond to my comments before I comment on any other patches. Let me know if you have any additional comments. Thanks, Chris
Re: [PATCH 1/5] perf tools: fix incorrect ordering of callchain entries
Modifying perf script to print call trees in the opposite order or applying patch 2 from this series and comparing the results output from export-to-postgtresql.py are the easiest ways to see the bug, however it can still be seen in current builds using perf report. Here is how i can reproduce the bug using perf report: $ ./perf record --call-graph=dwarf stress -c 1 -t 5 when i run this command: $./perf report --call-graph=flat,0,0,callee i see this calltree contained in the output, which looks correct (callee order): gen8_irq_handler handle_irq_event_percpu handle_irq_event handle_edge_irq handle_irq do_IRQ ret_from_intr __random rand 0x558f2a04dded 0x558f2a04c774 __libc_start_main 0x558f2a04dcd9 Now I run this command: $./perf report --call-graph=flat,0,0,caller I would expect to see the exact reverse of the above when using caller order(with "0x558f2a04dcd9" at the top and "gen8_irq_handler" at the bottom) in the output, but it is nowhere to be found. instead you see this: ret_from_intr do_IRQ handle_irq handle_edge_irq handle_irq_event handle_irq_event_percpu gen8_irq_handler 0x558f2a04dcd9 __libc_start_main 0x558f2a04c774 0x558f2a04dded rand __random Notice how internally the kernel symbols are reversed and the user space symbols are reversed, but the kernel symbols still appear above the user space symbols. if you apply this patch and re-run, you will see the expected output (with "0x558f2a04dcd9" at the top and "gen8_irq_handler" at the bottom): 0x558f2a04dcd9 __libc_start_main 0x558f2a04c774 0x558f2a04dded rand __random ret_from_intr do_IRQ handle_irq handle_edge_irq handle_irq_event handle_irq_event_percpu gen8_irq_handler Thanks, Chris On 04/22/2016 12:59 AM, Adrian Hunter wrote: +Jiri since he wrote the original code On 22/04/16 10:55, Adrian Hunter wrote: On 19/04/16 11:56, Chris Phlipot wrote: The existing implentation implementation of thread__resolve_callchain, Remove 'implentation' under certain circumstanes, can assemble callchain entries in the 'circumstanes' -> 'circumstances' incorrect order. A the callchain entries are resolved incorrectly for a sample when all of the following conditions are met: 1. callchain_param.order is set to ORDER_CALLER 2. thread__resolve_callchain_sample is able to resolve callchain entries for the sample. 3. unwind__get_entries is also able to resolve callchain entries for the sample. The fix is accomplished by reversing the order in which thread__resolve_callchain_sample and unwind__get_entries are called when callchain_param.order is set to ORDER_CALLER. Can you give an example of the commands you used and what the call chain looked like before and after. Also please run ./scripts/checkpatch.pl Unwind specific code from thread__resolve_callchain is also moved into a new static function to improve readability of the fix. Signed-off-by: Chris Phlipot --- tools/perf/util/machine.c | 57 ++- 1 file changed, 41 insertions(+), 16 deletions(-) diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 0c4dabc..dd086c8 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -1806,8 +1806,6 @@ static int thread__resolve_callchain_sample(struct thread *thread, int skip_idx = -1; int first_call = 0; - callchain_cursor_reset(cursor); - if (has_branch_callstack(evsel)) { err = resolve_lbr_callchain_sample(thread, cursor, sample, parent, root_al, max_stack); @@ -1918,20 +1916,12 @@ static int unwind_entry(struct unwind_entry *entry, void *arg) entry->map, entry->sym); } -int thread__resolve_callchain(struct thread *thread, - struct callchain_cursor *cursor, - struct perf_evsel *evsel, - struct perf_sample *sample, - struct symbol **parent, - struct addr_location *root_al, - int max_stack) -{ - int ret = thread__resolve_callchain_sample(thread, cursor, evsel, - sample, parent, -
[PATCH] perf script: fix segfault when printing callchains using builtin-script
This fixes a bug caused by an unitialized callchain cursor. The crash frist appeared in: 6f736735e30f ("perf evsel: Require that callchains be resolved before calling fprintf_{sym,callchain}") The callchain cursor is a struct that contains pointers, that when uninitialized will cause unpredictable behavior (usually a crash) when trying to append to the callchain. The existing implementation has the following issues: 1. The callchain cursor used is not initialized, resulting in unpredictable behavior when used. 2. The cursor is declared on the stack. Even if it is properly initalized, the implmentation will leak memory when the function returns, since all the references to the callchain_nodes allocated by callchain_cursor_append will be lost when the cursor goes out of scope. 3. Storing the cursor on the stack is inefficient. Even if memory is properly freed when it goes out of scope, a performance penalty will be incurred due to reallocation of callchain nodes. callchain_cursor_append is designed to avoid these reallocations when an existing cursor is reused. This patch fixes the crash by replacing cursor_callchain with a reference to the global callchain_cursor which also resolves all 3 issues mentioned above. How to reproduce the crash: $ perf record --call-graph=dwarf stress -t 1 -c 1 $ perf script > /dev/null Segfault Signed-off-by: Chris Phlipot --- tools/perf/builtin-script.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 5099740..f43b0c6 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -570,12 +570,12 @@ static void print_sample_bts(struct perf_sample *sample, /* print branch_from information */ if (PRINT_FIELD(IP)) { unsigned int print_opts = output[attr->type].print_ip_opts; - struct callchain_cursor *cursor = NULL, cursor_callchain; + struct callchain_cursor *cursor = NULL; if (symbol_conf.use_callchain && sample->callchain && - thread__resolve_callchain(al->thread, &cursor_callchain, evsel, + thread__resolve_callchain(al->thread, &callchain_cursor, evsel, sample, NULL, NULL, scripting_max_stack) == 0) - cursor = &cursor_callchain; + cursor = &callchain_cursor; if (cursor == NULL) { putchar(' '); @@ -789,12 +789,12 @@ static void process_event(struct perf_script *script, printf("%16" PRIu64, sample->weight); if (PRINT_FIELD(IP)) { - struct callchain_cursor *cursor = NULL, cursor_callchain; + struct callchain_cursor *cursor = NULL; if (symbol_conf.use_callchain && sample->callchain && - thread__resolve_callchain(al->thread, &cursor_callchain, evsel, + thread__resolve_callchain(al->thread, &callchain_cursor, evsel, sample, NULL, NULL, scripting_max_stack) == 0) - cursor = &cursor_callchain; + cursor = &callchain_cursor; putchar(cursor ? '\n' : ' '); sample__fprintf_sym(sample, al, 0, output[attr->type].print_ip_opts, cursor, stdout); -- 2.7.4
[PATCH 3/5] perf script: fix postgresql ubuntu install instructions
The current instructions for setting up an Ubuntu system for using the export-to-postgresql.py script are incorrect. The instructions in the script have been updated to work on newer versions of ubuntu. -Add missing dependencies to apt-get command: python-pyside.qtsql, libqt4-sql-psql -Add '-s' option to createuser command to force the user to be a superuser since the command doesn't prompt as indicated in the current instructions. Tested on: Ubuntu 14.04, Ubuntu 16.04(beta) Signed-off-by: Chris Phlipot --- tools/perf/scripts/python/export-to-postgresql.py | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py index 1b02cdc..6f0ca68 100644 --- a/tools/perf/scripts/python/export-to-postgresql.py +++ b/tools/perf/scripts/python/export-to-postgresql.py @@ -34,10 +34,9 @@ import datetime # # ubuntu: # -# $ sudo apt-get install postgresql +# $ sudo apt-get install postgresql python-pyside.qtsql libqt4-sql-psql # $ sudo su - postgres -# $ createuser -# Shall the new role be a superuser? (y/n) y +# $ createuser -s # # An example of using this script with Intel PT: # -- 2.7.4
[PATCH 2/5] perf script: extend db-export api to include callchains for samples
The current implementation of the python database export API only includes call path information when using some form of call/return tracing, but is unable to do so when sampling. The following API extensions allow exporting of data collected by perf record when using --call-graph. The additions to the python api include the following: - add call_path_id to sample_table to allow association of samples with call paths - add dso_id to call_path_table to more closely align the data with that of a callchain_node db-export and trace-script-python were both extended to accomidate the API changes listed above. Thread-stack's functionality was expanded to allow storage and exporting of callchains that result from individual samples. - Introduced a new static function (thread_stack__process_callchain) to resolve call paths using the existing callchain resolution provided by thread__resolve_callchain - The existing call_path tree in call_return_processor is used for storing the data from the resolved callchain. - Call_return_processor was also extended to allow the ability to export full call paths in addtion to the existing individual call/return pairs, since call/return pairs are not available when doing sampling. The code was tested using call graphs from fp and dwarf. export-to-postgresqlwas utilized with intel-pt data to verify that changes did not negatively affect existing behavior of the db-export api. Signed-off-by: Chris Phlipot --- tools/perf/util/db-export.c| 21 ++- tools/perf/util/db-export.h| 2 + .../util/scripting-engines/trace-event-python.c| 20 ++- tools/perf/util/thread-stack.c | 162 + tools/perf/util/thread-stack.h | 14 +- 5 files changed, 184 insertions(+), 35 deletions(-) diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c index 049438d..69c9a9d 100644 --- a/tools/perf/util/db-export.c +++ b/tools/perf/util/db-export.c @@ -329,6 +329,13 @@ int db_export__sample(struct db_export *dbe, union perf_event *event, if (err) goto out_put; + dbe->call_path_last_seen_db_id = 0; + if(dbe->crp) { + thread_stack__process_callchain(thread, comm, evsel, + al->machine, sample, + PERF_MAX_STACK_DEPTH, dbe->crp); + } + if ((evsel->attr.sample_type & PERF_SAMPLE_ADDR) && sample_addr_correlates_sym(&evsel->attr)) { struct addr_location addr_al; @@ -346,6 +353,7 @@ int db_export__sample(struct db_export *dbe, union perf_event *event, goto out_put; } } + es.call_path_db_id = dbe->call_path_last_seen_db_id; if (dbe->export_sample) err = dbe->export_sample(dbe, &es); @@ -397,9 +405,10 @@ int db_export__branch_types(struct db_export *dbe) int db_export__call_path(struct db_export *dbe, struct call_path *cp) { int err; - - if (cp->db_id) + if (cp->db_id) { + dbe->call_path_last_seen_db_id = cp->db_id; return 0; + } if (cp->parent) { err = db_export__call_path(dbe, cp->parent); @@ -409,8 +418,14 @@ int db_export__call_path(struct db_export *dbe, struct call_path *cp) cp->db_id = ++dbe->call_path_last_db_id; - if (dbe->export_call_path) + if (dbe->export_call_path) { + if (cp->dso) + db_export__dso(dbe, cp->dso, cp->machine); + if (cp->sym && cp->dso) + db_export__symbol(dbe, cp->sym, cp->dso); + dbe->call_path_last_seen_db_id = cp->db_id; return dbe->export_call_path(dbe, cp); + } return 0; } diff --git a/tools/perf/util/db-export.h b/tools/perf/util/db-export.h index 25e22fd..40e3b07 100644 --- a/tools/perf/util/db-export.h +++ b/tools/perf/util/db-export.h @@ -43,6 +43,7 @@ struct export_sample { u64 addr_dso_db_id; u64 addr_sym_db_id; u64 addr_offset; /* addr offset from symbol start */ + u64 call_path_db_id; }; struct db_export { @@ -73,6 +74,7 @@ struct db_export { u64 symbol_last_db_id; u64 sample_last_db_id; u64 call_path_last_db_id; + u64 call_path_last_seen_db_id; /* last db_id seen(exported or not) */ u64 call_return_last_db_id; struct list_head deferred; }; diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c index 525eb49..ca3f9c6 100644 --- a/tools/perf/util/scripting-engines/trace-event-python.c
[PATCH 4/5] perf script: add option to control callchain export from python
resolving the callchain for every sample can be very expensive. This commit introduces a new option "perf_db_export_callchains" which will be enabled when it is set to True from within the python script. Signed-off-by: Chris Phlipot --- tools/perf/util/db-export.c| 2 +- tools/perf/util/db-export.h| 1 + .../util/scripting-engines/trace-event-python.c| 22 +- 3 files changed, 23 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c index 69c9a9d..3bd8247 100644 --- a/tools/perf/util/db-export.c +++ b/tools/perf/util/db-export.c @@ -330,7 +330,7 @@ int db_export__sample(struct db_export *dbe, union perf_event *event, goto out_put; dbe->call_path_last_seen_db_id = 0; - if(dbe->crp) { + if(dbe->crp && dbe->export_callchains) { thread_stack__process_callchain(thread, comm, evsel, al->machine, sample, PERF_MAX_STACK_DEPTH, dbe->crp); diff --git a/tools/perf/util/db-export.h b/tools/perf/util/db-export.h index 40e3b07..b75fea9 100644 --- a/tools/perf/util/db-export.h +++ b/tools/perf/util/db-export.h @@ -77,6 +77,7 @@ struct db_export { u64 call_path_last_seen_db_id; /* last db_id seen(exported or not) */ u64 call_return_last_db_id; struct list_head deferred; + bool export_callchains; }; int db_export__init(struct db_export *dbe); diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c index ca3f9c6..2e67b35 100644 --- a/tools/perf/util/scripting-engines/trace-event-python.c +++ b/tools/perf/util/scripting-engines/trace-event-python.c @@ -1008,8 +1008,10 @@ error: static void set_table_handlers(struct tables *tables) { const char *perf_db_export_mode = "perf_db_export_mode"; + const char *perf_db_export_callchains = "perf_db_export_callchains"; const char *perf_db_export_calls = "perf_db_export_calls"; - PyObject *db_export_mode, *db_export_calls; + PyObject *db_export_mode, *db_export_callchains, *db_export_calls; + bool export_callchains = false; bool export_calls = false; int ret; @@ -1028,6 +1030,16 @@ static void set_table_handlers(struct tables *tables) return; tables->dbe.crp = NULL; + + db_export_callchains = PyDict_GetItemString(main_dict, + perf_db_export_callchains); + if (db_export_callchains) { + ret = PyObject_IsTrue(db_export_callchains); + if (ret == -1) + handler_call_die(perf_db_export_callchains); + export_callchains = !!ret; + } + db_export_calls = PyDict_GetItemString(main_dict, perf_db_export_calls); if (db_export_calls) { ret = PyObject_IsTrue(db_export_calls); @@ -1043,9 +1055,17 @@ static void set_table_handlers(struct tables *tables) &tables->dbe); if (!tables->dbe.crp) Py_FatalError("failed to create calls processor"); + } else if (export_callchains) { + tables->dbe.crp = + call_return_processor__new(python_process_call_path, + NULL, + &tables->dbe); + if (!tables->dbe.crp) + Py_FatalError("failed to create calls processor"); } tables->db_export_mode = true; + tables->dbe.export_callchains = export_callchains; /* * Reserve per symbol space for symbol->db_id via symbol__priv() */ -- 2.7.4
[PATCH 1/5] perf tools: fix incorrect ordering of callchain entries
The existing implentation implementation of thread__resolve_callchain, under certain circumstanes, can assemble callchain entries in the incorrect order. A the callchain entries are resolved incorrectly for a sample when all of the following conditions are met: 1. callchain_param.order is set to ORDER_CALLER 2. thread__resolve_callchain_sample is able to resolve callchain entries for the sample. 3. unwind__get_entries is also able to resolve callchain entries for the sample. The fix is accomplished by reversing the order in which thread__resolve_callchain_sample and unwind__get_entries are called when callchain_param.order is set to ORDER_CALLER. Unwind specific code from thread__resolve_callchain is also moved into a new static function to improve readability of the fix. Signed-off-by: Chris Phlipot --- tools/perf/util/machine.c | 57 ++- 1 file changed, 41 insertions(+), 16 deletions(-) diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 0c4dabc..dd086c8 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -1806,8 +1806,6 @@ static int thread__resolve_callchain_sample(struct thread *thread, int skip_idx = -1; int first_call = 0; - callchain_cursor_reset(cursor); - if (has_branch_callstack(evsel)) { err = resolve_lbr_callchain_sample(thread, cursor, sample, parent, root_al, max_stack); @@ -1918,20 +1916,12 @@ static int unwind_entry(struct unwind_entry *entry, void *arg) entry->map, entry->sym); } -int thread__resolve_callchain(struct thread *thread, - struct callchain_cursor *cursor, - struct perf_evsel *evsel, - struct perf_sample *sample, - struct symbol **parent, - struct addr_location *root_al, - int max_stack) -{ - int ret = thread__resolve_callchain_sample(thread, cursor, evsel, - sample, parent, - root_al, max_stack); - if (ret) - return ret; - +static int thread__resolve_callchain_unwind(struct thread *thread, +struct callchain_cursor *cursor, + struct perf_evsel *evsel, + struct perf_sample *sample, + int max_stack) +{ /* Can we do dwarf post unwind? */ if (!((evsel->attr.sample_type & PERF_SAMPLE_REGS_USER) && (evsel->attr.sample_type & PERF_SAMPLE_STACK_USER))) @@ -1944,7 +1934,42 @@ int thread__resolve_callchain(struct thread *thread, return unwind__get_entries(unwind_entry, cursor, thread, sample, max_stack); +} + +int thread__resolve_callchain(struct thread *thread, + struct callchain_cursor *cursor, + struct perf_evsel *evsel, + struct perf_sample *sample, + struct symbol **parent, + struct addr_location *root_al, + int max_stack) +{ + int ret = 0; + callchain_cursor_reset(&callchain_cursor); + + if (callchain_param.order == ORDER_CALLEE) { +ret = thread__resolve_callchain_sample(thread, cursor, + evsel, sample, + parent, root_al, + max_stack); + if (ret) + return ret; +ret = thread__resolve_callchain_unwind(thread, cursor, + evsel, sample, + max_stack); + } else { +ret = thread__resolve_callchain_unwind(thread, cursor, + evsel, sample, + max_stack); + if (ret) + return ret; +ret = thread__resolve_callchain_sample(thread, cursor, + evsel, sample, + parent, root_al, + max_stack); + } + return ret; } int machine__for_each_thread(struct machine *machine, -- 2.7.4
[PATCH 5/5] perf script: Update export-to-postgresql to support callchains
Update the export-to-postgresql.py to support the newly introduced callchain export. callchains are added into the existing call_paths table and can now be associated with samples when the "callpaths" commandline option is used with the script. Includes the following changes: -Now set the perf_db_export_callchains global -Add "callchains" commandline option" -Add perf_db_export_callchains checks for call_path table creation and population. -Add call_path_id to samples_table -Add dso_id to call_paths_table Signed-off-by: Chris Phlipot --- tools/perf/scripts/python/export-to-postgresql.py | 60 ++- 1 file changed, 37 insertions(+), 23 deletions(-) diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py index 6f0ca68..e64009f 100644 --- a/tools/perf/scripts/python/export-to-postgresql.py +++ b/tools/perf/scripts/python/export-to-postgresql.py @@ -223,11 +223,14 @@ sys.path.append(os.environ['PERF_EXEC_PATH'] + \ perf_db_export_mode = True perf_db_export_calls = False +perf_db_export_callchains = False + def usage(): - print >> sys.stderr, "Usage is: export-to-postgresql.py [] []" + print >> sys.stderr, "Usage is: export-to-postgresql.py [] [] []" print >> sys.stderr, "where:columns 'all' or 'branches'" - print >> sys.stderr, " calls 'calls' => create calls table" + print >> sys.stderr, " calls 'calls' => create calls and call_paths table" + print >> sys.stderr, " callchains 'callchains' => create call_paths table" raise Exception("Too few arguments") if (len(sys.argv) < 2): @@ -245,12 +248,14 @@ if columns not in ("all", "branches"): branches = (columns == "branches") -if (len(sys.argv) >= 4): - if (sys.argv[3] == "calls"): +for i in range(3,len(sys.argv)): + if (sys.argv[i] == "calls"): perf_db_export_calls = True + elif (sys.argv[i] == "callchains"): + perf_db_export_callchains = True else: usage() - + output_dir_name = os.getcwd() + "/" + dbname + "-perf-data" os.mkdir(output_dir_name) @@ -358,14 +363,17 @@ else: 'transactionbigint,' 'data_src bigint,' 'branch_typeinteger,' - 'in_tx boolean)') + 'in_tx boolean,' + 'call_path_id bigint)') -if perf_db_export_calls: +if perf_db_export_calls or perf_db_export_callchains: do_query(query, 'CREATE TABLE call_paths (' 'id bigint NOT NULL,' 'parent_id bigint,' 'symbol_id bigint,' - 'ip bigint)') + 'ip bigint,' + 'dso_id bigint)') +if perf_db_export_calls: do_query(query, 'CREATE TABLE calls (' 'id bigint NOT NULL,' 'thread_id bigint,' @@ -427,7 +435,7 @@ do_query(query, 'CREATE VIEW comm_threads_view AS ' '(SELECT tid FROM threads WHERE id = thread_id) AS tid' ' FROM comm_threads') -if perf_db_export_calls: +if perf_db_export_calls or perf_db_export_callchains: do_query(query, 'CREATE VIEW call_paths_view AS ' 'SELECT ' 'c.id,' @@ -443,6 +451,7 @@ if perf_db_export_calls: '(SELECT dso_id FROM symbols WHERE id = p.symbol_id) AS parent_dso_id,' '(SELECT dso FROM symbols_view WHERE id = p.symbol_id) AS parent_dso_short_name' ' FROM call_paths c INNER JOIN call_paths p ON p.id = c.parent_id') +if perf_db_export_calls: do_query(query, 'CREATE VIEW calls_view AS ' 'SELECT ' 'calls.id,' @@ -540,8 +549,9 @@ dso_file= open_output_file("dso_table.bin") symbol_file= open_output_file("symbol_table.bin") branch_type_file = open_output_file("branch_type_table.bin") sample_file= open_output_file("sample_table.bin") -if perf_db_export_calls: +if perf_db_export_calls or perf_db_export_callchains: call_path_file = open_output_file("call_path_table.bin") +if perf_db_export_calls:
[tip:perf/core] perf tools: Fix perf script python database export crash
Commit-ID: 616df645d7238e45d3b369933a30fee4e4e305e2 Gitweb: http://git.kernel.org/tip/616df645d7238e45d3b369933a30fee4e4e305e2 Author: Chris Phlipot AuthorDate: Tue, 8 Mar 2016 21:11:54 -0800 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 9 Mar 2016 10:31:02 -0300 perf tools: Fix perf script python database export crash Remove the union in evsel so that the database id and priv pointer can be used simultainously without conflicting and crashing. Detailed Description for the fixed bug follows: perf script crashes with a segmentation fault on user space tool version 4.5.rc7.ge2857b when using the python database export API. It works properly in 4.4 and prior versions. the crash fist appeared in: cfc8874a4859 ("perf script: Process cpu/threads maps") How to reproduce the bug: Remove any temporary files left over from a previous crash (if you have already attemped to reproduce the bug): $ rm -r test_db-perf-data $ dropdb test_db $ perf record timeout 1 yes >/dev/null $ perf script -s scripts/python/export-to-postgresql.py test_db Stack Trace: Program received signal SIGSEGV, Segmentation fault. __GI___libc_free (mem=0x1) at malloc.c:2929 2929 malloc.c: No such file or directory. (gdb) bt at util/stat.c:122 argv=, prefix=) at builtin-script.c:2231 argc=argc@entry=4, argv=argv@entry=0x7fffdf70) at perf.c:390 at perf.c:451 Signed-off-by: Chris Phlipot Acked-by: Jiri Olsa Cc: Adrian Hunter Cc: Peter Zijlstra Fixes: cfc8874a4859 ("perf script: Process cpu/threads maps") Link: http://lkml.kernel.org/r/1457500314-8912-1-git-send-email-cphlip...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/evsel.h | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h index efad78f..501ea6e 100644 --- a/tools/perf/util/evsel.h +++ b/tools/perf/util/evsel.h @@ -93,10 +93,8 @@ struct perf_evsel { const char *unit; struct event_format *tp_format; off_t id_offset; - union { - void*priv; - u64 db_id; - }; + void*priv; + u64 db_id; struct cgroup_sel *cgrp; void*handler; struct cpu_map *cpus;
[PATCH 1/1] Fixes: cfc8874a485 ("perf script: Process cpu/threads maps")
fix the perf script python database export crash. Remove the union in evsel so that the database id and priv pointer can be used simultainously without conflicting and crashing. Detailed Description for the fixed bug follows: perf script crashes with a segmentaiton fault on user space tool version 4.5.rc7.ge2857b when using the python database export API. It works properly in 4.4 and prior versions. the crash fist appeared in cfc8874a485 ("perf script: Process cpu/threads maps") How to reprodcue the bug: remove any temporary files left over from a previous crash (if you have already attemped to reproduce the bug): $ rm -r test_db-perf-data $ dropdb test_db $ ./perf record timeout 1 yes >/dev/null $ ./perf script -s scripts/python/export-to-postgresql.py test_db Stack Trace: Program received signal SIGSEGV, Segmentation fault. __GI___libc_free (mem=0x1) at malloc.c:2929 2929malloc.c: No such file or directory. (gdb) bt at util/stat.c:122 argv=, prefix=) at builtin-script.c:2231 argc=argc@entry=4, argv=argv@entry=0x7fffdf70) at perf.c:390 at perf.c:451 Signed-off-by: Chris Phlipot --- tools/perf/util/evsel.h | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h index 8e75434..4d8037a 100644 --- a/tools/perf/util/evsel.h +++ b/tools/perf/util/evsel.h @@ -93,10 +93,8 @@ struct perf_evsel { const char *unit; struct event_format *tp_format; off_t id_offset; - union { - void*priv; - u64 db_id; - }; + void*priv; + u64 db_id; struct cgroup_sel *cgrp; void*handler; struct cpu_map *cpus; -- 1.9.1