Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report
Peter Zijlstra writes:

> On Wed, Dec 14, 2016 at 08:56:43AM +1300, Eric W. Biederman wrote:
>>
>> I would just make the identifier a structure containing the
>> device number and the inode number.  It didn't look like perf required
>> the identifier to be a simple integer.
>
> Right, perf doesn't care at all here, it's just a transport.

perf report?  In that case I think perf cares enough to know there is
some identifier it is reporting things by.

Eric
Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report
On Wednesday 14 December 2016 09:22 PM, Eric W. Biederman wrote:
> Peter Zijlstra writes:
>
>> On Wed, Dec 14, 2016 at 08:56:43AM +1300, Eric W. Biederman wrote:
>>>
>>> I would just make the identifier a structure containing the
>>> device number and the inode number.  It didn't look like perf required
>>> the identifier to be a simple integer.
>>
>> Right, perf doesn't care at all here, it's just a transport.
>
> perf report?  In that case I think perf cares enough to know there is
> some identifier it is reporting things by.

Let me post v4 with this change.

Thanks
Hari
Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report
On Wed, Dec 14, 2016 at 08:56:43AM +1300, Eric W. Biederman wrote:
>
> I would just make the identifier a structure containing the
> device number and the inode number.  It didn't look like perf required
> the identifier to be a simple integer.

Right, perf doesn't care at all here, it's just a transport.
Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report
Hari Bathini writes:

> Hi Eric,
>
>
> On Tuesday 13 December 2016 03:36 AM, Eric W. Biederman wrote:
>> Hari Bathini writes:
>>
>>> This patch introduces a cgroup identifier entry field in perf report to
>>> identify or distinguish data of different cgroups. It uses the unique
>>> inode number of cgroup namespace, included in perf data with the new
>>> PERF_RECORD_NAMESPACES event, as cgroup identifier. With the assumption
>>> that each container is created with its own cgroup namespace, this
>>> allows assessment/analysis of multiple containers at once.
>> In the large this sounds reasonable.
>>
>> The details are wrong.  The cgroup id needs to be device
>> number + inode number, not just inode number.
>>
>
> As the assumption that the device number is going to be the same for
> all namespaces may not stand the test of time, the inode number alone
> is not going to be unique enough to use as an identifier.
>
> I am thinking of an identifier like the below. This may be OK for now
> as dev_num & inode_num are 32bit each.
>
>     identifier = (dev_num << 32 | inode_num)
>
> But this may leave us with identifiers that are not unique if dev_num
> & inode_num are changed to 64bit. Should that be of concern? Do
> you have any alternate suggestions to come up with a unique identifier
> in such a scenario too?

Inode numbers in general are 64bit.  The namespace inodes admittedly are
currently implemented as 32bit quantities but that is not something we
want to hard code into the userspace interface.

I would just make the identifier a structure containing the
device number and the inode number.  It didn't look like perf required
the identifier to be a simple integer.

Eric
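To make the suggestion above concrete, here is a minimal sketch of an identifier that is a structure rather than a packed integer. The type name `ns_id`, the 64-bit field widths and both helper functions are assumptions chosen for illustration; they are not taken from the patch or from any later version of it. The comparison uses explicit ordering tests instead of subtraction, because the difference of two 64-bit values does not fit in a signed 64-bit result.

```c
#include <stdint.h>

/* Hypothetical identifier type: name and field widths are assumptions. */
struct ns_id {
	uint64_t dev;	/* device number of the namespace inode */
	uint64_t ino;	/* inode number of the namespace inode */
};

/* Three-way compare of two unsigned 64-bit values without subtraction. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Order by device number first, then by inode number. */
static int ns_id_cmp(const struct ns_id *l, const struct ns_id *r)
{
	int c = cmp_u64(l->dev, r->dev);

	return c ? c : cmp_u64(l->ino, r->ino);
}
```

Two identifiers are equal only when both fields match, so nothing is hard-coded about how many bits either number currently uses.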
Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report
Hi Eric,


On Tuesday 13 December 2016 03:36 AM, Eric W. Biederman wrote:
> Hari Bathini writes:
>
>> This patch introduces a cgroup identifier entry field in perf report to
>> identify or distinguish data of different cgroups. It uses the unique
>> inode number of cgroup namespace, included in perf data with the new
>> PERF_RECORD_NAMESPACES event, as cgroup identifier. With the assumption
>> that each container is created with its own cgroup namespace, this
>> allows assessment/analysis of multiple containers at once.
> In the large this sounds reasonable.
>
> The details are wrong.  The cgroup id needs to be device
> number + inode number, not just inode number.
>

As the assumption that the device number is going to be the same for
all namespaces may not stand the test of time, the inode number alone
is not going to be unique enough to use as an identifier.

I am thinking of an identifier like the below. This may be OK for now
as dev_num & inode_num are 32bit each.

    identifier = (dev_num << 32 | inode_num)

But this may leave us with identifiers that are not unique if dev_num
& inode_num are changed to 64bit. Should that be of concern? Do
you have any alternate suggestions to come up with a unique identifier
in such a scenario too?

Thanks
Hari
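The packing proposed above, and the concern raised about it, can be sketched in a few lines of C. Both function names are hypothetical and exist only for this illustration. The first function assumes 32-bit inputs and widens the device number before shifting, since shifting a 32-bit value by 32 bits is undefined in C. The second applies the same expression to 64-bit inputs to show how distinct pairs can collide.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack a 32-bit device number and a 32-bit inode
 * number into one 64-bit value. The cast happens before the shift so
 * that the shift is performed on a 64-bit operand.
 */
static uint64_t pack_id(uint32_t dev_num, uint32_t inode_num)
{
	return ((uint64_t)dev_num << 32) | inode_num;
}

/*
 * The same expression with 64-bit inputs. Inode bits above bit 31 are
 * OR-ed on top of the device number, and the upper half of the device
 * number is shifted out, so different (dev, inode) pairs can produce
 * the same result.
 */
static uint64_t pack_id_wide(uint64_t dev_num, uint64_t inode_num)
{
	return (dev_num << 32) | inode_num;
}
```

For example, `pack_id_wide(1, 0)` and `pack_id_wide(0, 1ULL << 32)` both evaluate to `0x100000000`, which is the loss of uniqueness the message asks about.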
Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report
Hari Bathini writes:

> This patch introduces a cgroup identifier entry field in perf report to
> identify or distinguish data of different cgroups. It uses the unique
> inode number of cgroup namespace, included in perf data with the new
> PERF_RECORD_NAMESPACES event, as cgroup identifier. With the assumption
> that each container is created with its own cgroup namespace, this
> allows assessment/analysis of multiple containers at once.

In the large this sounds reasonable.

The details are wrong.  The cgroup id needs to be device
number + inode number, not just inode number.

Eric

> Signed-off-by: Hari Bathini
> ---
>  tools/perf/util/hist.c |    4 ++++
>  tools/perf/util/hist.h |    1 +
>  tools/perf/util/sort.c |   22 ++++++++++++++++++++++
>  tools/perf/util/sort.h |    2 ++
>  4 files changed, 29 insertions(+)
>
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> @@ -573,9 +575,11 @@ __hists__add_entry(struct hists *hists,
>  		   bool sample_self,
>  		   struct hist_entry_ops *ops)
>  {
> +	struct namespaces *ns = thread__namespaces(al->thread);
>  	struct hist_entry entry = {
>  		.thread	= al->thread,
>  		.comm = thread__comm(al->thread),
> +		.cgroup_id = ns ? ns->inode_num[CGROUP_NS_INDEX] : 0,
>  		.ms = {
>  			.map	= al->map,
>  			.sym	= al->sym,

Eric
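As an illustration of the "device number + inode number" pair mentioned above, the following sketch reads both values from user space with `stat()`. The struct and function names are hypothetical; only `stat()`, `st_dev` and `st_ino` are standard. This is not how the patch obtains the value (it takes the inode number from the recorded namespaces data); it only shows that the two numbers travel together.

```c
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical pair type for this example only. */
struct ns_file_id {
	uint64_t dev;
	uint64_t ino;
};

/*
 * Fill *out with the device and inode number that stat() reports for
 * path. Returns 0 on success and -1 if stat() fails.
 */
static int read_ns_file_id(const char *path, struct ns_file_id *out)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;

	out->dev = (uint64_t)st.st_dev;
	out->ino = (uint64_t)st.st_ino;
	return 0;
}
```

On a Linux kernel with cgroup namespace support, calling `read_ns_file_id("/proc/self/ns/cgroup", &id)` should return the pair for the calling process's cgroup namespace; on other systems the call simply fails with -1.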
[PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report
This patch introduces a cgroup identifier entry field in perf report to
identify or distinguish data of different cgroups. It uses the unique
inode number of cgroup namespace, included in perf data with the new
PERF_RECORD_NAMESPACES event, as cgroup identifier. With the assumption
that each container is created with its own cgroup namespace, this
allows assessment/analysis of multiple containers at once.

Shown below is the output of perf report, sorted based on cgroup id, on
a system that was running three containers at the time of perf record
and clearly showing one of the containers' considerable use of kernel
memory in comparison with others:

	$ perf report -s cgroup_id,sample --stdio
	#
	# Total Lost Samples: 0
	#
	# Samples: 1K of event 'kmem:kmalloc'
	# Event count (approx.): 1828
	#
	# Overhead  cgroup id  Samples
	# ........  .........  .......
	#
	    84.74%  4026532048    1549
	     7.93%  4026531835     145
	     3.67%  4026532047      67
	     2.68%  4026532046      49
	     0.98%  0               18

Signed-off-by: Hari Bathini
---
 tools/perf/util/hist.c |    4 ++++
 tools/perf/util/hist.h |    1 +
 tools/perf/util/sort.c |   22 ++++++++++++++++++++++
 tools/perf/util/sort.h |    2 ++
 4 files changed, 29 insertions(+)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index a69f027..a6650d7 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -2,6 +2,7 @@
 #include "build-id.h"
 #include "hist.h"
 #include "session.h"
+#include "namespaces.h"
 #include "sort.h"
 #include "evlist.h"
 #include "evsel.h"
@@ -168,6 +169,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 			hists__set_unres_dso_col_len(hists, HISTC_MEM_DADDR_DSO);
 	}
 
+	hists__new_col_len(hists, HISTC_CGROUP_ID, 10);
 	hists__new_col_len(hists, HISTC_CPU, 3);
 	hists__new_col_len(hists, HISTC_SOCKET, 6);
 	hists__new_col_len(hists, HISTC_MEM_LOCKED, 6);
@@ -573,9 +575,11 @@ __hists__add_entry(struct hists *hists,
 		   bool sample_self,
 		   struct hist_entry_ops *ops)
 {
+	struct namespaces *ns = thread__namespaces(al->thread);
 	struct hist_entry entry = {
 		.thread	= al->thread,
 		.comm = thread__comm(al->thread),
+		.cgroup_id = ns ? ns->inode_num[CGROUP_NS_INDEX] : 0,
 		.ms = {
 			.map	= al->map,
 			.sym	= al->sym,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 9928fed..894c95d 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -29,6 +29,7 @@ enum hist_column {
 	HISTC_DSO,
 	HISTC_THREAD,
 	HISTC_COMM,
+	HISTC_CGROUP_ID,
 	HISTC_PARENT,
 	HISTC_CPU,
 	HISTC_SOCKET,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 452e15a..b6152df 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -536,6 +536,27 @@ struct sort_entry sort_cpu = {
 	.se_width_idx	= HISTC_CPU,
 };
 
+/* --sort cgroup_id */
+
+static int64_t
+sort__cgroup_id_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return (int64_t)right->cgroup_id - (int64_t)left->cgroup_id;
+}
+
+static int hist_entry__cgroup_id_snprintf(struct hist_entry *he, char *bf,
+					  size_t size, unsigned int width)
+{
+	return repsep_snprintf(bf, size, "%-*u", width, he->cgroup_id);
+}
+
+struct sort_entry sort_cgroup_id = {
+	.se_header	= "cgroup id",
+	.se_cmp		= sort__cgroup_id_cmp,
+	.se_snprintf	= hist_entry__cgroup_id_snprintf,
+	.se_width_idx	= HISTC_CGROUP_ID,
+};
+
 /* --sort socket */
 
 static int64_t
@@ -1418,6 +1439,7 @@ static struct sort_dimension common_sort_dimensions[] = {
 	DIM(SORT_GLOBAL_WEIGHT, "weight", sort_global_weight),
 	DIM(SORT_TRANSACTION, "transaction", sort_transaction),
 	DIM(SORT_TRACE, "trace", sort_trace),
+	DIM(SORT_CGROUP_ID, "cgroup_id", sort_cgroup_id),
 };
 
 #undef DIM
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 099c975..e8058f6 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -95,6 +95,7 @@ struct hist_entry {
 	u64			transaction;
 	s32			socket;
 	s32			cpu;
+	u32			cgroup_id;
 	u8			cpumode;
 	u8			depth;
 
@@ -211,6 +212,7 @@ enum sort_type {
 	SORT_GLOBAL_WEIGHT,
 	SORT_TRANSACTION,
 	SORT_TRACE,
+	SORT_CGROUP_ID,
 
 	/* branch stack specific sort keys */
 	__SORT_BRANCH_STACK,
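The column formatting in the patch relies on the `%-*u` conversion, which left-justifies an unsigned value in a field whose width is passed as an argument. The sketch below shows that conversion in isolation. It uses plain `snprintf()` as a stand-in, because perf's `repsep_snprintf()` is internal to the tool and is not reproduced here; the function name `format_cgroup_id` is hypothetical. Note that the `*` width is consumed as an `int`, hence the cast.

```c
#include <stdio.h>

/*
 * Stand-in for the patch's snprintf callback (assumption: plain
 * snprintf() instead of perf's repsep_snprintf()). Writes id
 * left-justified in a field of at least width characters and returns
 * the number of characters that the full output requires.
 */
static int format_cgroup_id(char *bf, size_t size, unsigned int width,
			    unsigned int id)
{
	return snprintf(bf, size, "%-*u", (int)width, id);
}
```

A width of 10, as set for HISTC_CGROUP_ID in the patch, matches the longest decimal form of a 32-bit value (4294967295), so a `u32` identifier never overflows the column; a wider identifier would need a wider column.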