Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report

2016-12-14 Thread Eric W. Biederman
Peter Zijlstra  writes:

> On Wed, Dec 14, 2016 at 08:56:43AM +1300, Eric W. Biederman wrote:
>> 
>> I would just make the identifier a structure containing the
>> device number and the inode number.  It didn't look like perf required
>> the identifier to be a simple integer.
>
> Right, perf doesn't care at all here, it's just a transport.

perf report?  In that case I think perf cares enough to know there is
some identifier it is reporting things by.

Eric



Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report

2016-12-14 Thread Hari Bathini



On Wednesday 14 December 2016 09:22 PM, Eric W. Biederman wrote:
> Peter Zijlstra  writes:
>
>> On Wed, Dec 14, 2016 at 08:56:43AM +1300, Eric W. Biederman wrote:
>>>
>>> I would just make the identifier a structure containing the
>>> device number and the inode number.  It didn't look like perf required
>>> the identifier to be a simple integer.
>>
>> Right, perf doesn't care at all here, it's just a transport.
>
> perf report?  In that case I think perf cares enough to know there is
> some identifier it is reporting things by.


Let me post v4 with this change.

Thanks
Hari



Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report

2016-12-14 Thread Peter Zijlstra
On Wed, Dec 14, 2016 at 08:56:43AM +1300, Eric W. Biederman wrote:
> 
> I would just make the identifier a structure containing the
> device number and the inode number.  It didn't look like perf required
> the identifier to be a simple integer.

Right, perf doesn't care at all here, it's just a transport.


Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report

2016-12-13 Thread Eric W. Biederman
Hari Bathini  writes:

> Hi Eric,
>
>
> On Tuesday 13 December 2016 03:36 AM, Eric W. Biederman wrote:
>> Hari Bathini  writes:
>>
>>> This patch introduces a cgroup identifier entry field in perf report to
>>> identify or distinguish data of different cgroups. It uses the unique
>>> inode number of the cgroup namespace, included in perf data with the new
>>> PERF_RECORD_NAMESPACES event, as the cgroup identifier. With the assumption
>>> that each container is created with its own cgroup namespace, this
>>> allows assessment/analysis of multiple containers at once.
>> In the large this sounds reasonable.
>>
>> The details are wrong.  The cgroup id needs to be device
>> number + inode number, not just inode number.
>>
>
> As the assumption that the device number is going to be the same for
> all namespaces may not stand the test of time, the inode number alone is
> not going to be unique enough to use as an identifier.
>
> I am thinking of an identifier like the one below. This may be OK for now,
> as dev_num & inode_num are 32 bits each.
>
> identifier = (dev_num << 32 | inode_num)
>
> But this may leave us with identifiers that are not unique if dev_num
> & inode_num are changed to 64 bits. Should that be of concern? Do
> you have any alternate suggestion for coming up with a unique identifier
> in that scenario as well?

Inode numbers in general are 64-bit.  The namespace inodes admittedly are
currently implemented as 32-bit quantities but that is not something we
want to hard-code into the userspace interface.

I would just make the identifier a structure containing the
device number and the inode number.  It didn't look like perf required
the identifier to be a simple integer.

Eric
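
A minimal C sketch of the structure Eric suggests here; the type and
helper names are hypothetical and not part of the posted patch:

/*
 * Hypothetical sketch, not from the patch: carry the device number and
 * the inode number side by side instead of collapsing them into a
 * single integer.
 */
#include <stdbool.h>
#include <stdint.h>

struct cgroup_ns_id {
	uint64_t dev;	/* device number backing the namespace file */
	uint64_t ino;	/* inode number of the cgroup namespace */
};

/* Two cgroup namespaces match only when both fields match. */
static inline bool cgroup_ns_id_equal(struct cgroup_ns_id a,
				      struct cgroup_ns_id b)
{
	return a.dev == b.dev && a.ino == b.ino;
}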


Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report

2016-12-13 Thread Hari Bathini

Hi Eric,


On Tuesday 13 December 2016 03:36 AM, Eric W. Biederman wrote:
> Hari Bathini  writes:
>
>> This patch introduces a cgroup identifier entry field in perf report to
>> identify or distinguish data of different cgroups. It uses the unique
>> inode number of the cgroup namespace, included in perf data with the new
>> PERF_RECORD_NAMESPACES event, as the cgroup identifier. With the assumption
>> that each container is created with its own cgroup namespace, this
>> allows assessment/analysis of multiple containers at once.
>
> In the large this sounds reasonable.
>
> The details are wrong.  The cgroup id needs to be device
> number + inode number, not just inode number.



As the assumption that the device number is going to be the same for
all namespaces may not stand the test of time, the inode number alone is
not going to be unique enough to use as an identifier.

I am thinking of an identifier like the one below. This may be OK for now,
as dev_num & inode_num are 32 bits each.

identifier = (dev_num << 32 | inode_num)

But this may leave us with identifiers that are not unique if dev_num
& inode_num are changed to 64 bits. Should that be of concern? Do
you have any alternate suggestion for coming up with a unique identifier
in that scenario as well?

Thanks
Hari
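
A short C illustration of the packing Hari describes above; the function
name is hypothetical. The shift/OR is collision-free only while both
values fit in 32 bits, which is exactly the limitation raised here:

#include <stdint.h>

/*
 * Hypothetical sketch: pack a 32-bit device number and a 32-bit inode
 * number into one u64. Distinct (dev, inode) pairs stay distinct only
 * as long as neither value grows beyond 32 bits.
 */
static inline uint64_t pack_cgroup_id(uint32_t dev_num, uint32_t inode_num)
{
	return ((uint64_t)dev_num << 32) | inode_num;
}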



Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report

2016-12-12 Thread Eric W. Biederman
Hari Bathini  writes:

> This patch introduces a cgroup identifier entry field in perf report to
> identify or distinguish data of different cgroups. It uses the unique
> inode number of the cgroup namespace, included in perf data with the new
> PERF_RECORD_NAMESPACES event, as the cgroup identifier. With the assumption
> that each container is created with its own cgroup namespace, this
> allows assessment/analysis of multiple containers at once.

In the large this sounds reasonable.

The details are wrong.  The cgroup id needs to be device
number + inode number, not just inode number.

Eric

> Signed-off-by: Hari Bathini 
> ---
>  tools/perf/util/hist.c |4 
>  tools/perf/util/hist.h |1 +
>  tools/perf/util/sort.c |   22 ++
>  tools/perf/util/sort.h |2 ++
>  4 files changed, 29 insertions(+)
>
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> @@ -573,9 +575,11 @@ __hists__add_entry(struct hists *hists,
>  bool sample_self,
>  struct hist_entry_ops *ops)
>  {
> + struct namespaces *ns = thread__namespaces(al->thread);
>   struct hist_entry entry = {
>   .thread = al->thread,
>   .comm = thread__comm(al->thread),
> + .cgroup_id = ns ? ns->inode_num[CGROUP_NS_INDEX] : 0,
>   .ms = {
>   .map= al->map,
>   .sym= al->sym,

Eric
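
For reference, a small userspace C sketch (not from the patch) showing
where both numbers come from: stat() on a process's /proc/<pid>/ns/cgroup
file reports the device and inode pair that identifies its cgroup
namespace.

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
	struct stat st;

	/* The calling process's own cgroup namespace file. */
	if (stat("/proc/self/ns/cgroup", &st) != 0) {
		perror("stat /proc/self/ns/cgroup");
		return 1;
	}
	/* st_dev + st_ino together identify the namespace. */
	printf("cgroup namespace: dev=%llu ino=%llu\n",
	       (unsigned long long)st.st_dev,
	       (unsigned long long)st.st_ino);
	return 0;
}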


[PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report

2016-12-12 Thread Hari Bathini
This patch introduces a cgroup identifier entry field in perf report to
identify or distinguish data of different cgroups. It uses the unique
inode number of the cgroup namespace, included in perf data with the new
PERF_RECORD_NAMESPACES event, as the cgroup identifier. With the assumption
that each container is created with its own cgroup namespace, this
allows assessment/analysis of multiple containers at once.

Shown below is the output of perf report, sorted by cgroup id, on
a system that was running three containers at the time of perf record.
It clearly shows one container's considerable use of kernel memory
in comparison with the others:


$ perf report -s cgroup_id,sample --stdio
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'kmem:kmalloc'
# Event count (approx.): 1828
#
# Overhead  cgroup id     Samples
# ........  ..........  .........
#
    84.74%  4026532048       1549
     7.93%  4026531835        145
     3.67%  4026532047         67
     2.68%  4026532046         49
     0.98%  0                  18

Signed-off-by: Hari Bathini 
---
 tools/perf/util/hist.c |    4 ++++
 tools/perf/util/hist.h |    1 +
 tools/perf/util/sort.c |   22 ++++++++++++++++++++++
 tools/perf/util/sort.h |    2 ++
 4 files changed, 29 insertions(+)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index a69f027..a6650d7 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -2,6 +2,7 @@
 #include "build-id.h"
 #include "hist.h"
 #include "session.h"
+#include "namespaces.h"
 #include "sort.h"
 #include "evlist.h"
 #include "evsel.h"
@@ -168,6 +169,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
hists__set_unres_dso_col_len(hists, HISTC_MEM_DADDR_DSO);
}
 
+   hists__new_col_len(hists, HISTC_CGROUP_ID, 10);
hists__new_col_len(hists, HISTC_CPU, 3);
hists__new_col_len(hists, HISTC_SOCKET, 6);
hists__new_col_len(hists, HISTC_MEM_LOCKED, 6);
@@ -573,9 +575,11 @@ __hists__add_entry(struct hists *hists,
   bool sample_self,
   struct hist_entry_ops *ops)
 {
+   struct namespaces *ns = thread__namespaces(al->thread);
struct hist_entry entry = {
.thread = al->thread,
.comm = thread__comm(al->thread),
+   .cgroup_id = ns ? ns->inode_num[CGROUP_NS_INDEX] : 0,
.ms = {
.map= al->map,
.sym= al->sym,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 9928fed..894c95d 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -29,6 +29,7 @@ enum hist_column {
HISTC_DSO,
HISTC_THREAD,
HISTC_COMM,
+   HISTC_CGROUP_ID,
HISTC_PARENT,
HISTC_CPU,
HISTC_SOCKET,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 452e15a..b6152df 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -536,6 +536,27 @@ struct sort_entry sort_cpu = {
.se_width_idx   = HISTC_CPU,
 };
 
+/* --sort cgroup_id */
+
+static int64_t
+sort__cgroup_id_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+   return (int64_t)right->cgroup_id - (int64_t)left->cgroup_id;
+}
+
+static int hist_entry__cgroup_id_snprintf(struct hist_entry *he, char *bf,
+ size_t size, unsigned int width)
+{
+   return repsep_snprintf(bf, size, "%-*u", width, he->cgroup_id);
+}
+
+struct sort_entry sort_cgroup_id = {
+   .se_header  = "cgroup id",
+   .se_cmp = sort__cgroup_id_cmp,
+   .se_snprintf= hist_entry__cgroup_id_snprintf,
+   .se_width_idx   = HISTC_CGROUP_ID,
+};
+
 /* --sort socket */
 
 static int64_t
@@ -1418,6 +1439,7 @@ static struct sort_dimension common_sort_dimensions[] = {
DIM(SORT_GLOBAL_WEIGHT, "weight", sort_global_weight),
DIM(SORT_TRANSACTION, "transaction", sort_transaction),
DIM(SORT_TRACE, "trace", sort_trace),
+   DIM(SORT_CGROUP_ID, "cgroup_id", sort_cgroup_id),
 };
 
 #undef DIM
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 099c975..e8058f6 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -95,6 +95,7 @@ struct hist_entry {
u64 transaction;
s32 socket;
s32 cpu;
+   u32 cgroup_id;
u8  cpumode;
u8  depth;
 
@@ -211,6 +212,7 @@ enum sort_type {
SORT_GLOBAL_WEIGHT,
SORT_TRANSACTION,
SORT_TRACE,
+   SORT_CGROUP_ID,
 
/* branch stack specific sort keys */
__SORT_BRANCH_STACK,


