Re: [RESEND PATCH V2 3/4] perf/x86/intel: drain PEBS buffer in event read

2018-01-10 Thread Liang, Kan



On 1/10/2018 5:39 AM, Jiri Olsa wrote:

On Mon, Jan 08, 2018 at 07:15:15AM -0800, kan.li...@intel.com wrote:

From: Kan Liang 

When the PEBS interrupt threshold is larger than one, there is no way to
get the exact auto-reload times and values needed for the event update
unless the PEBS buffer is flushed.

Drain the PEBS buffer in event read when large PEBS is enabled.

When the threshold is one, no special handling is needed even if
auto-reload is enabled, because auto-reload only takes effect on event
overflow, and there is no overflow in event read.

Signed-off-by: Kan Liang 
---
  arch/x86/events/intel/core.c |  9 +
  arch/x86/events/intel/ds.c   | 10 ++
  arch/x86/events/perf_event.h |  2 ++
  3 files changed, 21 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 09c26a4..bdc35f8 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2060,6 +2060,14 @@ static void intel_pmu_del_event(struct perf_event *event)
intel_pmu_pebs_del(event);
  }
  
+static void intel_pmu_read_event(struct perf_event *event)
+{
+	if (event->attr.precise_ip)
+		return intel_pmu_pebs_read(event);


check for (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
would be more accurate?



It would narrow down the events.
But for readability, I think it is better to use precise_ip. The exposed
functions in ds.c should be generic functions for all PEBS events, not
for a specific case.

I think _AUTO_RELOAD looks too specific.
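
For reference, a minimal sketch of the two candidate checks being
discussed (the _AUTO_RELOAD variant follows Jiri's suggestion and is
hypothetical here, not the posted patch):

	static void intel_pmu_read_event(struct perf_event *event)
	{
		/* Posted patch: route every PEBS event through the PEBS read path. */
		if (event->attr.precise_ip)
			return intel_pmu_pebs_read(event);

		/*
		 * The narrower alternative would only divert events that
		 * actually use auto-reload:
		 *
		 *	if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
		 *		return intel_pmu_pebs_read(event);
		 */
		x86_perf_event_update(event, 0, 0);
	}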

Thanks,
Kan



RE: [PATCH V3 04/12] perf mmap: introduce perf_mmap__read_done

2018-01-09 Thread Liang, Kan
> > > > >
> > > > > Also I guess the current code might miss some events since the
> > > > > head can be different between _read_init() and _read_done(), no?
> > > > >
> > > >
> > > > The overwrite mode requires the ring buffer to be paused during
> > > > processing.
> > > > The head is unchanged between __read_init() and __read_done().
> > >
> > > Ah, ok then.  Maybe we could read the head once, and use it during
> > > processing.
> >
> > Yes, the head only needs to be read once for overwrite mode.
> > But for non-overwrite, we have to read the head in every
> > perf_mmap__read_event(), because the head is floating.
> > The non-overwrite case is specially handled in patch 5/12 as well.
> 
> Right, I understand it for the non-overwrite mode.
> 
> But, for the overwrite mode, my concern was that it might be possible
> that it reads a stale head in __read_init() (even after it paused the
> ring buffer) and reads an updated head in __read_done().  Then it's
> gonna miss some records.  I'm not sure whether it reads the same head
> in __read_init() and __read_done() by the pause.
>

The only scenario which could cause a different 'head' is as below.
The 'rb->head' is updated in __perf_output_begin(), but hasn't yet been
assigned to 'pc->data_head' for the perf tool. During this window,
'paused' is set and __read_init() reads the head.
But this scenario never happens because of the ring buffer lock.

Otherwise, I cannot imagine any other scenario which could cause a
different 'head' in __read_init() and __read_done() with the ring buffer
paused. Please let me know if there is an example.

Some records may be missed, but only because the ring buffer is paused.
The head should stay the same.

I also did some tests and dumped the 'head' in __read_init() and
__read_done() with the ring buffer paused. I didn't see any difference either.
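
A sketch of the kind of debug check used in those tests (hypothetical
code, not part of the series):

	u64 head_init, head_done;

	head_init = perf_mmap__read_head(map);	/* at __read_init() time */
	/* ... read and process all records in the snapshot ... */
	head_done = perf_mmap__read_head(map);	/* at __read_done() time */

	/* With the ring buffer paused, the two values must match. */
	if (head_init != head_done)
		fprintf(stderr, "head moved while paused\n");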

Thanks,
Kan
 
> Thanks,
> Namhyung
> 
> 
> > +   /* non-overwrite doesn't pause the ringbuffer */
> > +   if (!overwrite)
> > +   end = perf_mmap__read_head(map);


RE: [PATCH V2] perf script: add script to profile and resolve physical mem type

2018-01-04 Thread Liang, Kan

> Dear Kan,
> 
> On Wed, Jan 3, 2018 at 9:20 PM, Liang, Kan <kan.li...@intel.com> wrote:
> > Hi Stephane and Andi,
> >
> > Could you please review the script?
> >
> > If it's OK for you, could you please Ack/Review this?
> >
> > Thanks,
> > Kan
> >
> >>
> >> From: Kan Liang <kan.li...@intel.com>
> >>
> >> There could be different types of memory in the system, e.g. normal
> >> System Memory and Persistent Memory. To understand how the workload
> >> maps to those memories, it's important to know their I/O statistics.
> >> Perf can collect physical addresses, but those are raw data.
> >> It still needs extra work to resolve the physical addresses.
> >> Provide a script to facilitate resolving the physical addresses and
> >> gathering the I/O statistics.
> >>
> >> Profile with the MEM_INST_RETIRED.ALL_LOADS or
> >> MEM_UOPS_RETIRED.ALL_LOADS event, whichever is available.
> >> Look up /proc/iomem and resolve the physical address.
> >> Provide a memory type summary.
> >>
> >> Here is an example output.
> >>  #perf script report mem-phys-addr
> >> Event: mem_inst_retired.all_loads:P
> >> Memory type                              count   percentage
> >> ----------------------------------------  -----   ----------
> >> System RAM                                  74        53.2%
> >> Persistent Memory                           55        39.6%
> >> N/A                                         10         7.2%
> >>
> >> Signed-off-by: Kan Liang <kan.li...@intel.com>
> >> ---
> >>
> >> Changes since V1:
> >>  - Do not mix DLA and Load Latency. Do not compare the loads and stores.
> >>Only profile the loads.
> >>  - Use event name to replace the RAW event
> >>
> >>  tools/perf/scripts/python/bin/mem-phys-addr-record | 19 +
> >>  tools/perf/scripts/python/bin/mem-phys-addr-report |  3 +
> >>  tools/perf/scripts/python/mem-phys-addr.py | 97 ++
> >>  .../util/scripting-engines/trace-event-python.c    |  2 +
> >>  4 files changed, 121 insertions(+)
> >>  create mode 100644 tools/perf/scripts/python/bin/mem-phys-addr-record
> >>  create mode 100644 tools/perf/scripts/python/bin/mem-phys-addr-report
> >>  create mode 100644 tools/perf/scripts/python/mem-phys-addr.py
> >>
> >> diff --git a/tools/perf/scripts/python/bin/mem-phys-addr-record
> >> b/tools/perf/scripts/python/bin/mem-phys-addr-record
> >> new file mode 100644
> >> index 000..5a87512
> >> --- /dev/null
> >> +++ b/tools/perf/scripts/python/bin/mem-phys-addr-record
> >> @@ -0,0 +1,19 @@
> >> +#!/bin/bash
> >> +
> >> +#
> >> +# Profiling physical memory by all retired load instructions/uops event
> >> +# MEM_INST_RETIRED.ALL_LOADS or MEM_UOPS_RETIRED.ALL_LOADS
> >> +#
> >> +
> >> +load=`perf list | grep mem_inst_retired.all_loads`
> >> +if [ -z "$load" ]; then
> >> + load=`perf list | grep mem_uops_retired.all_loads`
> >> +fi
> >> +if [ -z "$load" ]; then
> >> + echo "There is no event to count all retired load instructions/uops."
> >> + exit 1
> >> +fi
> >> +
> >> +arg=$(echo $load | tr -d ' ')
> >> +arg="$arg:P"
> >> +perf record --phys-data -e $arg $@
> >> diff --git a/tools/perf/scripts/python/bin/mem-phys-addr-report
> >> b/tools/perf/scripts/python/bin/mem-phys-addr-report
> >> new file mode 100644
> >> index 000..3f2b847
> >> --- /dev/null
> >> +++ b/tools/perf/scripts/python/bin/mem-phys-addr-report
> >> @@ -0,0 +1,3 @@
> >> +#!/bin/bash
> >> +# description: resolve physical address samples
> >> +perf script $@ -s "$PERF_EXEC_PATH"/scripts/python/mem-phys-addr.py
> >> diff --git a/tools/perf/scripts/python/mem-phys-addr.py
> >> b/tools/perf/scripts/python/mem-phys-addr.py
> >> new file mode 100644
> >> index 000..1d1f757
> >> --- /dev/null
> >> +++ b/tools/perf/scripts/python/mem-phys-addr.py
> >> @@ -0,0 +1,97 @@
> >> +# mem-phys-addr.py: Resolve physical address samples
> >> +# Copyright (c) 2017, Intel Corporation.
> >> +#
> >> +# This program is free software; you can redistribute it and/o
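
The script resolves each sampled physical address by an interval lookup
against the ranges parsed from /proc/iomem. A minimal C sketch of the
same idea (a hypothetical helper, not part of the patch, which does this
in Python with bisect):

	struct mem_range {
		unsigned long long start, end;	/* bounds parsed from /proc/iomem */
		const char *type;	/* "System RAM", "Persistent Memory", ... */
	};

	/* Return the type of the range containing phys_addr, or "N/A". */
	static const char *resolve_phys_addr(const struct mem_range *ranges,
					     int nr, unsigned long long phys_addr)
	{
		int i;

		for (i = 0; i < nr; i++) {
			if (phys_addr >= ranges[i].start && phys_addr <= ranges[i].end)
				return ranges[i].type;
		}
		return "N/A";
	}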

RE: [PATCH V3 04/12] perf mmap: introduce perf_mmap__read_done

2018-01-04 Thread Liang, Kan
> Hi Kan,
> 
> On Wed, Jan 03, 2018 at 02:15:38PM +0000, Liang, Kan wrote:
> > > Hello,
> > >
> > > On Thu, Dec 21, 2017 at 10:08:46AM -0800, kan.li...@intel.com wrote:
> > > > From: Kan Liang <kan.li...@intel.com>
> > > >
> > > > The direction of overwrite mode is backward. The last
> > > > mmap__read_event will set tail to map->prev. The map->prev needs to
> > > > be corrected to head, which is the end of the next read.
> > >
> > > Why do you update the map->prev needlessly then?  I think we don't
> > > need it for overwrite/backward mode, right?
> >
> > The map->prev is needless only when an overwrite really happens in the
> > ring buffer.
> > In a lightly loaded system or with a big ring buffer, the unprocessed
> > data will not be overwritten. So it's necessary to keep a pointer to
> > indicate the last position.
> >
> > Overwrite mode is backward, but the event processing is always forward.
> > So map->prev has to be updated in __read_done().
> 
> Yep, I meant that updating map->prev in every perf_mmap__read_event()
> is unnecessary for the overwrite mode.  It only needs to be set in
> perf_mmap__read_done(), right?

Right, for overwrite, only updating the map->prev in perf_mmap__read_done()
is enough.

But for non-overwrite, we have to update map->prev.
It will be used by perf_mmap__consume() later to write the ring buffer tail.
So I specially handle the non-overwrite case as below in patch 5/12.
+   event = perf_mmap__read(map, start, end);
+
+   if (!overwrite)
+   map->prev = *start;

> >
> > >
> > > Also I guess the current code might miss some events since the head
> > > can be different between _read_init() and _read_done(), no?
> > >
> >
> > The overwrite mode requires the ring buffer to be paused during
> > processing.
> > The head is unchanged between __read_init() and __read_done().
> 
> Ah, ok then.  Maybe we could read the head once, and use it during
> processing.

Yes, the head only needs to be read once for overwrite mode.
But for non-overwrite, we have to read the head in every
perf_mmap__read_event(), because the head is floating.
The non-overwrite case is specially handled in patch 5/12 as well.
+   /* non-overwrite doesn't pause the ringbuffer */
+   if (!overwrite)
+   end = perf_mmap__read_head(map);
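
Putting the pieces together, the intended overwrite-mode consumer loop
looks roughly like this (a sketch assembled from the snippets above;
exact signatures may differ in the final revision of the series):

	union perf_event *event;
	u64 start, end;

	/* Ring buffer paused: the head is read once, in read_init(). */
	if (perf_mmap__read_init(map, true /* overwrite */, &start, &end) < 0)
		return;

	while ((event = perf_mmap__read_event(map, true, &start, end)) != NULL) {
		/* ... deliver the event ... */
		perf_mmap__consume(map, true);
	}

	/* Backward read done: point map->prev at head for the next round. */
	perf_mmap__read_done(map);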



Thanks,
Kan

> 
> Thanks,
> Namhyung
> 
> 
> >
> > Events during the pause will be missed. But I think it has little
> > impact on the accuracy of the snapshot and can be tolerated for perf
> > top.
> > I mentioned it in the change log of patch 11/12.
> > I also removed the lost events checking for perf top.



RE: [PATCH V2] perf script: add script to profile and resolve physical mem type

2018-01-03 Thread Liang, Kan
Hi Stephane and Andi,

Could you please review the script?

If it's OK for you, could you please Ack/Review this?

Thanks,
Kan

> 
> From: Kan Liang 
> 
> There could be different types of memory in the system, e.g. normal
> System Memory and Persistent Memory. To understand how the workload
> maps to those memories, it's important to know their I/O statistics.
> Perf can collect physical addresses, but those are raw data.
> It still needs extra work to resolve the physical addresses.
> Provide a script to facilitate resolving the physical addresses and
> gathering the I/O statistics.
> 
> Profile with the MEM_INST_RETIRED.ALL_LOADS or
> MEM_UOPS_RETIRED.ALL_LOADS event, whichever is available.
> Look up /proc/iomem and resolve the physical address.
> Provide a memory type summary.
> 
> Here is an example output.
>  #perf script report mem-phys-addr
> Event: mem_inst_retired.all_loads:P
> Memory type                              count   percentage
> ----------------------------------------  -----   ----------
> System RAM                                  74        53.2%
> Persistent Memory                           55        39.6%
> N/A                                         10         7.2%
> 
> Signed-off-by: Kan Liang 
> ---
> 
> Changes since V1:
>  - Do not mix DLA and Load Latency. Do not compare the loads and stores.
>Only profile the loads.
>  - Use event name to replace the RAW event
> 
>  tools/perf/scripts/python/bin/mem-phys-addr-record | 19 +
>  tools/perf/scripts/python/bin/mem-phys-addr-report |  3 +
>  tools/perf/scripts/python/mem-phys-addr.py | 97 ++
>  .../util/scripting-engines/trace-event-python.c|  2 +
>  4 files changed, 121 insertions(+)
>  create mode 100644 tools/perf/scripts/python/bin/mem-phys-addr-record
>  create mode 100644 tools/perf/scripts/python/bin/mem-phys-addr-report
>  create mode 100644 tools/perf/scripts/python/mem-phys-addr.py
> 
> diff --git a/tools/perf/scripts/python/bin/mem-phys-addr-record
> b/tools/perf/scripts/python/bin/mem-phys-addr-record
> new file mode 100644
> index 000..5a87512
> --- /dev/null
> +++ b/tools/perf/scripts/python/bin/mem-phys-addr-record
> @@ -0,0 +1,19 @@
> +#!/bin/bash
> +
> +#
> +# Profiling physical memory by all retired load instructions/uops event
> +# MEM_INST_RETIRED.ALL_LOADS or MEM_UOPS_RETIRED.ALL_LOADS
> +#
> +
> +load=`perf list | grep mem_inst_retired.all_loads`
> +if [ -z "$load" ]; then
> + load=`perf list | grep mem_uops_retired.all_loads`
> +fi
> +if [ -z "$load" ]; then
> + echo "There is no event to count all retired load instructions/uops."
> + exit 1
> +fi
> +
> +arg=$(echo $load | tr -d ' ')
> +arg="$arg:P"
> +perf record --phys-data -e $arg $@
> diff --git a/tools/perf/scripts/python/bin/mem-phys-addr-report
> b/tools/perf/scripts/python/bin/mem-phys-addr-report
> new file mode 100644
> index 000..3f2b847
> --- /dev/null
> +++ b/tools/perf/scripts/python/bin/mem-phys-addr-report
> @@ -0,0 +1,3 @@
> +#!/bin/bash
> +# description: resolve physical address samples
> +perf script $@ -s "$PERF_EXEC_PATH"/scripts/python/mem-phys-addr.py
> diff --git a/tools/perf/scripts/python/mem-phys-addr.py
> b/tools/perf/scripts/python/mem-phys-addr.py
> new file mode 100644
> index 000..1d1f757
> --- /dev/null
> +++ b/tools/perf/scripts/python/mem-phys-addr.py
> @@ -0,0 +1,97 @@
> +# mem-phys-addr.py: Resolve physical address samples
> +# Copyright (c) 2017, Intel Corporation.
> +#
> +# This program is free software; you can redistribute it and/or modify it
> +# under the terms and conditions of the GNU General Public License,
> +# version 2, as published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope it will be useful, but WITHOUT
> +# ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> for
> +# more details.
> +
> +from __future__ import division
> +import os
> +import sys
> +import struct
> +import re
> +import bisect
> +import collections
> +
> +sys.path.append(os.environ['PERF_EXEC_PATH'] + \
> + '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
> +
> +system_ram = []
> +pmem = []
> +f = None
> +load_mem_type_cnt = collections.Counter()
> +event_name = None
> +
> +def parse_iomem():
> + global f
> + f = open('/proc/iomem', 'r')
> + for i, j in enumerate(f):
> + m = re.split('-|:',j,2)
> + if m[2].strip() == 'System RAM':
> + system_ram.append(long(m[0], 16))
> + system_ram.append(long(m[1], 16))
> + if m[2].strip() == 'Persistent Memory':
> + pmem.append(long(m[0], 16))
> + pmem.append(long(m[1], 16))
> +
> +def print_memory_type():
> + print "Event: %s" % (event_name)
> + print "%-40s  %10s  %10s\n" % ("Memory type", "count",
> 

RE: [PATCH V3 04/12] perf mmap: introduce perf_mmap__read_done

2018-01-03 Thread Liang, Kan
> Hello,
> 
> On Thu, Dec 21, 2017 at 10:08:46AM -0800, kan.li...@intel.com wrote:
> > From: Kan Liang 
> >
> > The direction of overwrite mode is backward. The last mmap__read_event
> > will set tail to map->prev. The map->prev needs to be corrected to head,
> > which is the end of the next read.
> 
> Why do you update the map->prev needlessly then?  I think we don't need it
> for overwrite/backward mode, right?

The map->prev is needless only when an overwrite really happens in the
ring buffer.
In a lightly loaded system or with a big ring buffer, the unprocessed data
will not be overwritten. So it's necessary to keep a pointer to indicate
the last position.

Overwrite mode is backward, but the event processing is always forward.
So map->prev has to be updated in __read_done().

> 
> Also I guess the current code might miss some events since the head can be
> different between _read_init() and _read_done(), no?
> 

The overwrite mode requires the ring buffer to be paused during processing.
The head is unchanged between __read_init() and __read_done().

Events during the pause will be missed. But I think it has little impact
on the accuracy of the snapshot and can be tolerated for perf top.
I mentioned it in the change log of patch 11/12.
I also removed the lost events checking for perf top.

Thanks,
Kan

> Thanks,
> Namhyung
> 
> 
> >
> > It will be used later.
> >
> > Signed-off-by: Kan Liang 
> > ---
> >  tools/perf/util/mmap.c | 11 +++  tools/perf/util/mmap.h |  1
> > +
> >  2 files changed, 12 insertions(+)
> >
> > diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c index
> > a844a2f..4aaeb64 100644
> > --- a/tools/perf/util/mmap.c
> > +++ b/tools/perf/util/mmap.c
> > @@ -343,3 +343,14 @@ int perf_mmap__push(struct perf_mmap *md,
> bool
> > overwrite,
> >  out:
> > return rc;
> >  }
> > +
> > +/*
> > + * Mandatory for overwrite mode
> > + * The direction of overwrite mode is backward.
> > + * The last mmap__read_event will set tail to map->prev.
> > + * Need to correct the map->prev to head which is the end of next read.
> > + */
> > +void perf_mmap__read_done(struct perf_mmap *map)
> > +{
> > +	map->prev = perf_mmap__read_head(map);
> > +}
> > diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h index
> > abe9b9f..2df27c1 100644
> > --- a/tools/perf/util/mmap.h
> > +++ b/tools/perf/util/mmap.h
> > @@ -96,4 +96,5 @@ size_t perf_mmap__mmap_len(struct perf_mmap
> *map);
> >
> >  int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
> >  u64 *start, u64 *end);
> > +void perf_mmap__read_done(struct perf_mmap *map);
> >  #endif /*__PERF_MMAP_H */
> > --
> > 2.5.5
> >


Re: [PATCH V2 4/4] perf/x86: fix: disable userspace RDPMC usage for large PEBS

2017-12-20 Thread Liang, Kan



On 12/20/2017 4:41 PM, Andi Kleen wrote:

On Wed, Dec 20, 2017 at 11:42:51AM -0800, kan.li...@linux.intel.com wrote:

From: Kan Liang 

The userspace RDPMC usage has never worked with large PEBS since large
PEBS was introduced by
commit b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt
handling (large PEBS interrupt threshold)")

When the PEBS interrupt threshold is larger than one, there is no way for
userspace RDPMC to get the exact auto-reload times and values.

Disable the userspace RDPMC usage when large PEBS is enabled.


The only drawback is that the event wraps quickly. 


It's broken. The value read by the RDPMC instruction will always be
between -reload_val and 0. Without the reload times, I think it's
impossible to calculate the total count.



I suspect in many
cases it's still usable for measuring short regions.



Only if the short region is less than reload_val.
Otherwise, the reload times are needed.
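
To make the arithmetic concrete, this is roughly what user space would
have to compute (a sketch with assumed names; n_reload is only known to
the kernel, which recovers it while draining the PEBS buffer):

	/* Under auto-reload the counter runs from -reload_val up toward 0. */
	static inline u64 total_count(u64 n_reload, u64 reload_val, s64 raw)
	{
		/* raw = value from RDPMC, in (-reload_val, 0] */
		return n_reload * reload_val + (raw + reload_val);
	}

Since user space only sees 'raw', the total is unrecoverable once the
counter has wrapped, which is why the patch disables userspace RDPMC for
large PEBS.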

Thanks,
Kan


I wouldn't force disable it, just let the users deal with it.

-Andi



Re: [PATCH 2/4] perf/x86/intel: fix event update for auto-reload

2017-12-19 Thread Liang, Kan



On 12/19/2017 5:07 PM, Peter Zijlstra wrote:

On Tue, Dec 19, 2017 at 03:08:58PM -0500, Liang, Kan wrote:

This all looks very wrong... In auto reload we should never call
intel_pmu_save_and_restart() in the first place I think.

Things like x86_perf_event_update() and x86_perf_event_set_period()
simply _cannot_ do the right thing when we auto reload the counter.



I think it should be OK to call it in the first place.
For x86_perf_event_update(), the reload_times will tell if it's auto reload.
Both period_left and event->count are carefully recalculated for auto
reload.


How does prev_count make sense when we've already reloaded a bunch of
times?


Same as non-auto-reload, it's the 'left' (unfinished) period from last time.
The period for the first record should always be the 'left' period, no
matter which case.
For auto-reload, prev_count doesn't need to be increased with the reload,
because for later records the period is exactly the same as the reload
value.


To calculate the event->count:
For auto-reload, the event->count = prev_count + (reload times - 1) * 
reload value + gap between PMI trigger and PMI handler.


For non-auto-reload, the event->count = prev_count + gap between PMI 
trigger and PMI handler.


The 'prev_count' is same for both auto-reload and non-auto-reload.

The gap is a little bit tricky for auto-reload, because it starts from
-reload_value, while for non-auto-reload it starts from 0.

"delta += (reload_val << shift);" is used to correct it.




For x86_perf_event_set_period(), there is nothing special needed for auto
reload. The period is fixed. The period_left from x86_perf_event_update() is
already handled.


Hurm.. I see. But rather than make an ever bigger trainwreck of things,
I'd rather you just write a special purpose intel_pmu_save_and_restart()
just for AUTO_RELOAD.


OK. I will do it in V2.

Thanks,
Kan


Re: [PATCH 2/4] perf/x86/intel: fix event update for auto-reload

2017-12-19 Thread Liang, Kan



On 12/19/2017 3:08 PM, Liang, Kan wrote:



On 12/19/2017 1:58 PM, Peter Zijlstra wrote:
On Mon, Dec 18, 2017 at 03:34:49AM -0800, kan.li...@linux.intel.com 
wrote:

  arch/x86/events/core.c | 14 ++
  arch/x86/events/intel/ds.c |  8 +++-
  2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 35552ea..f74e21d 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -100,6 +100,20 @@ u64 x86_perf_event_update(struct perf_event *event,
   * of the count.
   */
  delta = (new_raw_count << shift) - (prev_raw_count << shift);
+
+    /*
+     * Take auto-reload into account
+     * For the auto-reload before the last time, it went through the
+     * whole period (reload_val) every time.
+     * Just simply add period * times to the event.
+     *
+     * For the last load, the elapsed delta (event-)time need to be
+     * corrected by adding the period. Because the start point is -period.
+     */
+    if (reload_times > 0) {
+        delta += (reload_val << shift);
+        local64_add(reload_val * (reload_times - 1), &event->count);
+    }
     delta >>= shift;
     local64_add(delta, &event->count);
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 0b693b7..f0f6026 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1256,11 +1256,17 @@ static void __intel_pmu_pebs_event(struct 
perf_event *event,

 void *base, void *top,
 int bit, int count)
  {
+    struct hw_perf_event *hwc = &event->hw;
     struct perf_sample_data data;
     struct pt_regs regs;
     void *at = get_next_pebs_record_by_bit(base, top, bit);

-    if (!intel_pmu_save_and_restart(event, 0, 0) &&
+    /*
+     * Now, auto-reload is only enabled in fixed period mode.
+     * The reload value is always hwc->sample_period.
+     * May need to change it, if auto-reload is enabled in freq mode later.
+     */
+    if (!intel_pmu_save_and_restart(event, hwc->sample_period, count - 1) &&
         !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
         return;


This all looks very wrong... In auto reload we should never call
intel_pmu_save_and_restart() in the first place I think.

Things like x86_perf_event_update() and x86_perf_event_set_period()
simply _cannot_ do the right thing when we auto reload the counter.



I think it should be OK to call it in the first place.
For x86_perf_event_update(), the reload_times will tell if it's auto 
reload. Both period_left and event->count are carefully recalculated for 
auto reload.


Thinking about it a bit more, you are right. We cannot rely on the count
to tell us if it's auto reload.

The count could also be 1 if auto reload is enabled.

I will fix it in V2.

Thanks,
Kan

For x86_perf_event_set_period(), there is nothing special needed for 
auto reload. The period is fixed. The period_left from 
x86_perf_event_update() is already handled.



BTW: It should be 'count' not 'count - 1' which is passed to
intel_pmu_save_and_restart(). I just found the issue. I will fix it in
V2 with other improvements if there are any.

Thanks,
Kan




Re: [PATCH 4/4] perf/x86/intel: drain PEBS buffer in event read

2017-12-19 Thread Liang, Kan



On 12/19/2017 2:02 PM, Peter Zijlstra wrote:

On Mon, Dec 18, 2017 at 03:34:51AM -0800, kan.li...@linux.intel.com wrote:

--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -926,6 +926,16 @@ void intel_pmu_pebs_del(struct perf_event *event)
pebs_update_state(needed_cb, cpuc, event->ctx->pmu);
  }
  
+void intel_pmu_pebs_read(struct perf_event *event)

+{
+   struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+   if (pebs_needs_sched_cb(cpuc))
+   return intel_pmu_drain_pebs_buffer();
+
+   x86_perf_event_update(event, 0, 0);
+}


This is completely broken.. what if @event isn't a pebs event, but we
do have an auto-reloading pebs event configured?



precise_ip will be checked before intel_pmu_pebs_read is called.
So @event must be a pebs event.

@@ -2060,6 +2060,14 @@ static void intel_pmu_del_event(struct perf_event 
*event)

intel_pmu_pebs_del(event);
 }

+static void intel_pmu_read_event(struct perf_event *event)
+{
+   if (event->attr.precise_ip)
+   return intel_pmu_pebs_read(event);
+
+   x86_perf_event_update(event, 0, 0);
+}


Re: [PATCH 2/4] perf/x86/intel: fix event update for auto-reload

2017-12-19 Thread Liang, Kan



On 12/19/2017 1:58 PM, Peter Zijlstra wrote:

On Mon, Dec 18, 2017 at 03:34:49AM -0800, kan.li...@linux.intel.com wrote:

  arch/x86/events/core.c | 14 ++
  arch/x86/events/intel/ds.c |  8 +++-
  2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 35552ea..f74e21d 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -100,6 +100,20 @@ u64 x86_perf_event_update(struct perf_event *event,
 * of the count.
 */
delta = (new_raw_count << shift) - (prev_raw_count << shift);
+
+   /*
+* Take auto-reload into account
+* For the auto-reload before the last time, it went through the
+* whole period (reload_val) every time.
+* Just simply add period * times to the event.
+*
+* For the last load, the elapsed delta (event-)time need to be
+* corrected by adding the period. Because the start point is -period.
+*/
+   if (reload_times > 0) {
+   delta += (reload_val << shift);
+   local64_add(reload_val * (reload_times - 1), &event->count);
+   }
	delta >>= shift;

	local64_add(delta, &event->count);

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 0b693b7..f0f6026 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1256,11 +1256,17 @@ static void __intel_pmu_pebs_event(struct perf_event 
*event,
   void *base, void *top,
   int bit, int count)
  {
+   struct hw_perf_event *hwc = &event->hw;
struct perf_sample_data data;
struct pt_regs regs;
void *at = get_next_pebs_record_by_bit(base, top, bit);
  
-	if (!intel_pmu_save_and_restart(event, 0, 0) &&

+   /*
+* Now, auto-reload is only enabled in fixed period mode.
+* The reload value is always hwc->sample_period.
+* May need to change it, if auto-reload is enabled in freq mode later.
+*/
+   if (!intel_pmu_save_and_restart(event, hwc->sample_period, count - 1) &&
!(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
return;
  


This all looks very wrong... In auto reload we should never call
intel_pmu_save_and_restart() in the first place I think.

Things like x86_perf_event_update() and x86_perf_event_set_period()
simply _cannot_ do the right thing when we auto reload the counter.



I think it should be OK to call it in the first place.
For x86_perf_event_update(), the reload_times will tell if it's auto 
reload. Both period_left and event->count are carefully recalculated for 
auto reload.
For x86_perf_event_set_period(), there is nothing special needed for 
auto reload. The period is fixed. The period_left from 
x86_perf_event_update() is already handled.



BTW: It should be 'count' not 'count - 1' which is passed to
intel_pmu_save_and_restart(). I just found the issue. I will fix it in
V2 with other improvements if there are any.

Thanks,
Kan




RE: [PATCH V2 2/8] perf tools: rewrite perf mmap read for overwrite

2017-12-19 Thread Liang, Kan
> >>> +int perf_mmap__read_catchup(struct perf_mmap *map,
> >>> + bool overwrite,
> >>> + u64 *start, u64 *end,
> >>> + unsigned long *size)
> >>>{
> >>> - u64 head;
> >>> + u64 head = perf_mmap__read_head(map);
> >>> + u64 old = map->prev;
> >>> + unsigned char *data = map->base + page_size;
> >>>
> >>> - if (!refcount_read(>refcnt))
> >>> - return;
> >>> + *start = overwrite ? head : old;
> >>> + *end = overwrite ? old : head;
> >>>
> >>> - head = perf_mmap__read_head(map);
> >>> - map->prev = head;
> >>> + if (*start == *end)
> >>> + return 0;
> >>> +
> >>> + *size = *end - *start;
> >>> + if (*size > (unsigned long)(map->mask) + 1) {
> >>> + if (!overwrite) {
> >>> + WARN_ONCE(1, "failed to keep up with mmap data.
> >> (warn only once)\n");
> >>> +
> >>> + map->prev = head;
> >>> + perf_mmap__consume(map, overwrite);
> >>> + return 0;
> >>> + }
> >>> +
> >>> + /*
> >>> +  * Backward ring buffer is full. We still have a chance to read
> >>> +  * most of data from it.
> >>> +  */
> >>> + if (overwrite_rb_find_range(data, map->mask, head, start,
> >> end))
> >>> + return -1;
> >>> + }
> >>> +
> >>> + return 1;
> >> Coding suggestion: this function returns 2 different values (1 and -1)
> >> on two fail paths. It should return 0 for success and -1 for failure.
> >>
> > For perf top, -1 is enough for failure.
> > But for perf_mmap__push, it looks like two fail paths are needed, doesn't it?
> >
> 
> I see. I think this is not a good practice. Please try to avoid returning
> 3 states. For example, compare start and end outside? If it can't be avoided,
> how about returning an enum to tell the reader about the difference?

OK. Will avoid it in V3.
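
For illustration only, the enum suggestion could be shaped like this (the type
and constant names are invented here, not taken from any merged perf code):

enum perf_mmap_read_state {
	PERF_MMAP_READ_FAILED = -1,	/* backward buffer content unusable */
	PERF_MMAP_READ_EMPTY  =  0,	/* nothing to read between start/end */
	PERF_MMAP_READ_OK     =  1,	/* start/end delimit readable data */
};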

> >> 2. Don't update map->prev in perf_mmap__read. Let perf_mmap__read()
> >> update the start pointer, so it becomes stateless and suits both backward
> >> and forward reading.
> >>
> >> 3. Update md->prev for forward reading. Merge it into
> >> perf_mmap__consume?
> > It looks like we have to pass the updated 'start' to perf_mmap__consume.
> > More interfaces like perf_evlist__mmap_read need to be changed.
> > That would impact other tools which only support non-overwrite mode.
> >
> > How about updating both 'md->prev' and 'start' in perf_mmap__read()
> > like the patch below?
> 
> What about making perf_mmap__read() totally stateless? Don't
> touch md->prev there, and make forward reading do an extra
> md->prev update before consuming?

We can move the update in the new interface perf_mmap__read_event.
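
A rough sketch of how such a perf_mmap__read_event() wrapper could look,
assuming the stateless three-argument perf_mmap__read() discussed above (the
exact signature is a guess from this thread, not necessarily what was merged):

union perf_event *perf_mmap__read_event(struct perf_mmap *map, bool overwrite,
					u64 *startp, u64 end)
{
	union perf_event *event = perf_mmap__read(map, startp, end);

	/*
	 * Only the forward (non-overwrite) path keeps reading state in
	 * map->prev; mirror the advanced start position into it so the
	 * legacy one-by-one readers keep working.
	 */
	if (event && !overwrite)
		map->prev = *startp;

	return event;
}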

> 
> > Introducing a new perf_mmap__read_event to get rid of
> > the perf_mmap__read_backward/forward.
> >
> > perf_mmap__read_backward will be removed later.
> > Keep perf_mmap__read_forward since other tools still use it.
> 
> 
> OK. For any reader which doesn't care about backward or forward, it should
> use perf_mmap__read() and maintain the start position on its own, and give
> it a chance to adjust map->prev if the ring buffer is forward.
> 
> perf_mmap__read_forward() borrows map->prev to maintain the start position
> like this:
> 
> union perf_event *perf_mmap__read_forward(struct perf_mmap *map)
> {
>  
>  return perf_mmap__read(map, &map->prev, head);
> }
> 
> 

Yes, we can do that for the legacy interface. 

> > It has to update the 'end' for non-overwrite mode in each read since the
> > ringbuffer is not paused.
> >
> > diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> > index 484a394..74f33fd 100644
> > --- a/tools/perf/util/mmap.c
> > +++ b/tools/perf/util/mmap.c
> > @@ -22,16 +22,16 @@ size_t perf_mmap__mmap_len(struct perf_mmap
> *map)
> >
> >   /* When check_messup is true, 'end' must points to a good entry */
> >   static union perf_event *perf_mmap__read(struct perf_mmap *map,
> > -u64 start, u64 end, u64 *prev)
> > +u64 *start, u64 end, u64 *prev)
> 
> 
> Don't need *prev because it can be replaced by *start.
> 
> >   {
> > unsigned char *data = map->base + page_size;
> > union perf_event *event = NULL;
> > -   int diff = end - start;
> > +   int diff = end - *start;
> >
> > if (diff >= (int)sizeof(event->header)) {
> > size_t size;
> >
> > -   event = (union perf_event *)&data[start & map->mask];
> > +   event = (union perf_event *)&data[*start & map->mask];
> > size = event->header.size;
> >
> > if (size < sizeof(event->header) || diff < (int)size) {
> > @@ -43,8 +43,8 @@ static union perf_event *perf_mmap__read(struct
> perf_mmap *map,
> >  * Event straddles the mmap boundary -- header should
> always
> >  * be inside due to u64 alignment of output.
> >  */
> > -   if ((start & map->mask) + size != ((start + size) & map->mask)) {
> > -   unsigned int offset = 

RE: [PATCH V2 6/8] perf top: add overwrite fall back

2017-12-18 Thread Liang, Kan
> 
> On 2017/12/7 7:33, kan.li...@intel.com wrote:
> > From: Kan Liang 
> >
> > Switch to non-overwrite mode if the kernel does not support the overwrite
> > ringbuffer.
> >
> > It only takes effect when overwrite mode is supported.
> > No change to current behavior.
> >
> > Signed-off-by: Kan Liang 
> > ---
> >   tools/perf/builtin-top.c | 35 +++
> >   1 file changed, 35 insertions(+)
> >
> > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> > index 5e15d27..7c462d6 100644
> > --- a/tools/perf/builtin-top.c
> > +++ b/tools/perf/builtin-top.c
> > @@ -931,6 +931,27 @@ static int perf_top_overwrite_check(struct
> perf_top *top)
> > return 0;
> >   }
> >
> > +static int perf_top_overwrite_fallback(struct perf_top *top,
> > +  struct perf_evsel *evsel)
> > +{
> > +   struct record_opts *opts = &top->record_opts;
> > +   struct perf_evlist *evlist = top->evlist;
> > +   struct perf_evsel *counter;
> > +
> > +   if (!opts->overwrite)
> > +   return 0;
> > +
> > +   /* only fall back when first event fails */
> > +   if (evsel != perf_evlist__first(evlist))
> > +   return 0;
> > +
> > +   evlist__for_each_entry(evlist, counter)
> > +   counter->attr.write_backward = false;
> > +   opts->overwrite = false;
> > +   ui__warning("fall back to non-overwrite mode\n");
> > +   return 1;
> 
> You can check perf_missing_features.write_backward here. Only do the
> fallback
> when we know the problem is caused by the missing write_backward.
>

perf_missing_features is only defined in evsel.c
I will add an inline function in evsel.c to do the check.

Thanks,
Kan
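
A minimal sketch of such a helper, assuming it lives next to the file-local
perf_missing_features definition in evsel.c (the function name here is
invented; the one actually added may differ):

/* evsel.c: expose the flag to callers like builtin-top.c */
bool perf_evsel__write_backward_is_missing(void)
{
	return perf_missing_features.write_backward;
}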
 
> > +}
> > +
> >   static int perf_top__start_counters(struct perf_top *top)
> >   {
> > char msg[BUFSIZ];
> > @@ -954,6 +975,20 @@ static int perf_top__start_counters(struct
> perf_top *top)
> >   try_again:
> > if (perf_evsel__open(counter, top->evlist->cpus,
> >  top->evlist->threads) < 0) {
> > +
> > +   /*
> > +* Specially handle overwrite fall back.
> > +* Because perf top is the only tool which has
> > +* overwrite mode by default, support
> > +* both overwrite and non-overwrite mode, and
> > +* require consistent mode for all events.
> > +*
> > +* May move it to generic code with more tools
> > +* have similar attribute.
> > +*/
> > +   if (perf_top_overwrite_fallback(top, counter))
> > +   goto try_again;
> > +
> 
> It will unconditionally remove the backward bit even if the problem
> is caused by some other missing feature.
> 
> Thank you.



RE: [PATCH V2 2/8] perf tools: rewrite perf mmap read for overwrite

2017-12-18 Thread Liang, Kan

> > -union perf_event *perf_mmap__read_backward(struct perf_mmap *map)
> > +union perf_event *perf_mmap__read_backward(struct perf_mmap *map,
> > +  u64 *start, u64 end)
> >   {
> > -   u64 head, end;
> > -   u64 start = map->prev;
> > -
> > -   /*
> > -* Check if event was unmapped due to a POLLHUP/POLLERR.
> > -*/
> > -   if (!refcount_read(&map->refcnt))
> > -   return NULL;
> > -
> 
> Is removing this checking safe?

It was a duplicate of the one in perf_mmap__read_catchup.
I planned to remove one of them, but it looks like I removed both accidentally.
Will keep one in V3.

> 
> > -   head = perf_mmap__read_head(map);
> > -   if (!head)
> > -   return NULL;
> > +   union perf_event *event = NULL;
> >
> > -   /*
> > -* 'head' pointer starts from 0. Kernel minus sizeof(record) form
> > -* it each time when kernel writes to it, so in fact 'head' is
> > -* negative. 'end' pointer is made manually by adding the size of
> > -* the ring buffer to 'head' pointer, means the validate data can
> > -* read is the whole ring buffer. If 'end' is positive, the ring
> > -* buffer has not fully filled, so we must adjust 'end' to 0.
> > -*
> > -* However, since both 'head' and 'end' is unsigned, we can't
> > -* simply compare 'end' against 0. Here we compare '-head' and
> > -* the size of the ring buffer, where -head is the number of bytes
> > -* kernel write to the ring buffer.
> > -*/
> > -   if (-head < (u64)(map->mask + 1))
> > -   end = 0;
> > -   else
> > -   end = head + map->mask + 1;
> > +   event = perf_mmap__read(map, *start, end, &map->prev);
> > +   *start = map->prev;
> >
> > -   return perf_mmap__read(map, start, end, >prev);
> > +   return event;
> >   }
> 
> perf_mmap__read_backward() and perf_mmap__read_forward() become
> asymmetrical,
> but their names are symmetrical.
> 
> 
> >
> > -void perf_mmap__read_catchup(struct perf_mmap *map)
> > +static int overwrite_rb_find_range(void *buf, int mask, u64 head,
> > +  u64 *start, u64 *end);
> > +
> > +int perf_mmap__read_catchup(struct perf_mmap *map,
> > +   bool overwrite,
> > +   u64 *start, u64 *end,
> > +   unsigned long *size)
> >   {
> > -   u64 head;
> > +   u64 head = perf_mmap__read_head(map);
> > +   u64 old = map->prev;
> > +   unsigned char *data = map->base + page_size;
> >
> > -   if (!refcount_read(&map->refcnt))
> > -   return;
> > +   *start = overwrite ? head : old;
> > +   *end = overwrite ? old : head;
> >
> > -   head = perf_mmap__read_head(map);
> > -   map->prev = head;
> > +   if (*start == *end)
> > +   return 0;
> > +
> > +   *size = *end - *start;
> > +   if (*size > (unsigned long)(map->mask) + 1) {
> > +   if (!overwrite) {
> > +   WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
> > +
> > +   map->prev = head;
> > +   perf_mmap__consume(map, overwrite);
> > +   return 0;
> > +   }
> > +
> > +   /*
> > +* Backward ring buffer is full. We still have a chance to read
> > +* most of data from it.
> > +*/
> > +   if (overwrite_rb_find_range(data, map->mask, head, start, end))
> > +   return -1;
> > +   }
> > +
> > +   return 1;
> 
> Coding suggestion: this function returns 2 different values (1 and -1) in
> two fail paths. Should return 0 for success and -1 for failure.
>

For perf top, -1 is enough for failure.
But for perf_mmap__push, it looks like two fail paths are needed, doesn't it?
 
> You totally redefine perf_mmap__read_catchup(). The original meaning of
> this function is to adjust md->prev so a following perf_mmap__read() can read
> events one by one safely.  Only the backward ring buffer needs catchup:
> the kernel's 'head' pointer keeps moving (backward), while md->prev keeps
> unchanged. For the forward ring buffer, md->prev will catch up each time when
> reading from the ring buffer.
>
> Your patch totally changes its meaning. Now its meaning is finding the
> available data from a ring buffer (report start and end). At least we
> need a better name.

Sure, I will introduce a new function to do it in V3.

> In addition, if I understand your code correctly, we
> don't need to report 'size' to caller, because size is determined by
> start and end.

Sure, I will remove the 'size' in V3.
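
Putting those review points together, the V3 helper could plausibly be shaped
like this (a sketch only: the name perf_mmap__read_init, the -EAGAIN
convention, and the omission of the buffer-full handling are all assumptions):

static int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
				u64 *startp, u64 *endp)
{
	u64 head = perf_mmap__read_head(map);
	u64 old  = map->prev;

	*startp = overwrite ? head : old;
	*endp   = overwrite ? old  : head;

	if (*startp == *endp)
		return -EAGAIN;		/* nothing new to read */

	/* the "ring buffer is full" recovery path is omitted here */
	return 0;
}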

> 
> In addition, since the concepts of backward and overwrite are now
> unified, we can avoid updating map->prev during one-by-one reading
> (because backward reading doesn't need the consume operation). I think we can
> decouple the updating of map->prev from these two basic read functions.
> This can make the operations on map->prev clearer.  Now the moving
> direction of map->prev is confusing for the backward ring buffer: it moves
> forward during one-by-one reading, and moves backward when block
> 

RE: [PATCH V4 1/8] perf/x86/intel/uncore: customized event_read for client IMC uncore

2017-12-11 Thread Liang, Kan
Hi Thomas,

Did you get a chance to review the patch series?

Thanks,
Kan
> 
> On Fri, 17 Nov 2017, Liang, Kan wrote:
> 
> > Hi Thomas,
> >
> > Any comments for this patch series?
> 
> it's on my todo list.



RE: [PATCH V2] perf script: add script to profile and resolve physical mem type

2017-12-08 Thread Liang, Kan
Hi Arnaldo,

Ping.
Any comments for the script?

Thanks,
Kan

> >
> > From: Kan Liang 
> >
> > There could be different types of memory in the system, e.g. normal
> > System Memory and Persistent Memory. To understand how the workload
> > maps to those memories, it's important to know their I/O statistics.
> > Perf can collect physical addresses, but those are raw data.
> > It still needs extra work to resolve the physical addresses.
> > Provide a script to facilitate resolving the physical addresses and
> > gathering the I/O statistics.
> >
> > Profile with MEM_INST_RETIRED.ALL_LOADS or
> MEM_UOPS_RETIRED.ALL_LOADS
> > event if any of them is available.
> > Look up the /proc/iomem and resolve the physical address.
> > Provide memory type summary.
> >
> > Here is an example output.
> >  #perf script report mem-phys-addr
> > Event: mem_inst_retired.all_loads:P
> > Memory type                         count   percentage
> > -----------------                   -----   ----------
> > System RAM                             74        53.2%
> > Persistent Memory                      55        39.6%
> > N/A                                    10         7.2%
> >
> > Signed-off-by: Kan Liang 
> > ---
> >
> > Changes since V1:
> >  - Do not mix DLA and Load Latency. Do not compare the loads and stores.
> >Only profile the loads.
> >  - Use event name to replace the RAW event
> >
> >  tools/perf/scripts/python/bin/mem-phys-addr-record | 19 +
> > tools/perf/scripts/python/bin/mem-phys-addr-report |  3 +
> >  tools/perf/scripts/python/mem-phys-addr.py | 97
> > ++
> >  .../util/scripting-engines/trace-event-python.c|  2 +
> >  4 files changed, 121 insertions(+)
> >  create mode 100644 tools/perf/scripts/python/bin/mem-phys-addr-
> record
> >  create mode 100644 tools/perf/scripts/python/bin/mem-phys-addr-report
> >  create mode 100644 tools/perf/scripts/python/mem-phys-addr.py
> >
> > diff --git a/tools/perf/scripts/python/bin/mem-phys-addr-record
> > b/tools/perf/scripts/python/bin/mem-phys-addr-record
> > new file mode 100644
> > index 000..5a87512
> > --- /dev/null
> > +++ b/tools/perf/scripts/python/bin/mem-phys-addr-record
> > @@ -0,0 +1,19 @@
> > +#!/bin/bash
> > +
> > +#
> > +# Profiling physical memory by all retired load instructions/uops event
> > +# MEM_INST_RETIRED.ALL_LOADS or MEM_UOPS_RETIRED.ALL_LOADS
> > +#
> > +
> > +load=`perf list | grep mem_inst_retired.all_loads`
> > +if [ -z "$load" ]; then
> > +	load=`perf list | grep mem_uops_retired.all_loads`
> > +fi
> > +if [ -z "$load" ]; then
> > +   echo "There is no event to count all retired load instructions/uops."
> > +   exit 1
> > +fi
> > +
> > +arg=$(echo $load | tr -d ' ')
> > +arg="$arg:P"
> > +perf record --phys-data -e $arg $@
> > diff --git a/tools/perf/scripts/python/bin/mem-phys-addr-report
> > b/tools/perf/scripts/python/bin/mem-phys-addr-report
> > new file mode 100644
> > index 000..3f2b847
> > --- /dev/null
> > +++ b/tools/perf/scripts/python/bin/mem-phys-addr-report
> > @@ -0,0 +1,3 @@
> > +#!/bin/bash
> > +# description: resolve physical address samples perf script $@ -s
> > +"$PERF_EXEC_PATH"/scripts/python/mem-phys-addr.py
> > diff --git a/tools/perf/scripts/python/mem-phys-addr.py
> > b/tools/perf/scripts/python/mem-phys-addr.py
> > new file mode 100644
> > index 000..1d1f757
> > --- /dev/null
> > +++ b/tools/perf/scripts/python/mem-phys-addr.py
> > @@ -0,0 +1,97 @@
> > +# mem-phys-addr.py: Resolve physical address samples
> > +# Copyright (c) 2017, Intel Corporation.
> > +#
> > +# This program is free software; you can redistribute it and/or modify it
> > +# under the terms and conditions of the GNU General Public License,
> > +# version 2, as published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope it will be useful, but WITHOUT
> > +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> > +# more details.
> > +
> > +from __future__ import division
> > +import os
> > +import sys
> > +import struct
> > +import re
> > +import bisect
> > +import collections
> > +
> > +sys.path.append(os.environ['PERF_EXEC_PATH'] + \
> > +   '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
> > +
> > +system_ram = []
> > +pmem = []
> > +f = None
> > +load_mem_type_cnt = collections.Counter()
> > +event_name = None
> > +
> > +def parse_iomem():
> > +   global f
> > +   f = open('/proc/iomem', 'r')
> > +   for i, j in enumerate(f):
> > +   m = re.split('-|:',j,2)
> > +   if m[2].strip() == 'System RAM':
> > +   system_ram.append(long(m[0], 16))
> > +   system_ram.append(long(m[1], 16))
> > +   if m[2].strip() == 'Persistent Memory':
> > +   pmem.append(long(m[0], 16))
> > +   

RE: [PATCH V2] perf script: add script to profile and resolve physical mem type

2017-11-27 Thread Liang, Kan
Hi all,

Any comments for the patch?

Thanks,
Kan
> 
> From: Kan Liang 
> 
> There could be different types of memory in the system, e.g. normal
> System Memory and Persistent Memory. To understand how the workload maps
> to those memories, it's important to know their I/O statistics.
> Perf can collect physical addresses, but those are raw data.
> It still needs extra work to resolve the physical addresses.
> Provide a script to facilitate resolving the physical addresses and
> gathering the I/O statistics.
> 
> Profile with MEM_INST_RETIRED.ALL_LOADS or
> MEM_UOPS_RETIRED.ALL_LOADS
> event if any of them is available.
> Look up the /proc/iomem and resolve the physical address.
> Provide memory type summary.
> 
> Here is an example output.
>  #perf script report mem-phys-addr
> Event: mem_inst_retired.all_loads:P
> Memory type                         count   percentage
> -----------------                   -----   ----------
> System RAM                             74        53.2%
> Persistent Memory                      55        39.6%
> N/A                                    10         7.2%
> 
> Signed-off-by: Kan Liang 
> ---
> 
> Changes since V1:
>  - Do not mix DLA and Load Latency. Do not compare the loads and stores.
>Only profile the loads.
>  - Use event name to replace the RAW event
> 
>  tools/perf/scripts/python/bin/mem-phys-addr-record | 19 +
>  tools/perf/scripts/python/bin/mem-phys-addr-report |  3 +
>  tools/perf/scripts/python/mem-phys-addr.py | 97
> ++
>  .../util/scripting-engines/trace-event-python.c|  2 +
>  4 files changed, 121 insertions(+)
>  create mode 100644 tools/perf/scripts/python/bin/mem-phys-addr-record
>  create mode 100644 tools/perf/scripts/python/bin/mem-phys-addr-report
>  create mode 100644 tools/perf/scripts/python/mem-phys-addr.py
> 
> diff --git a/tools/perf/scripts/python/bin/mem-phys-addr-record
> b/tools/perf/scripts/python/bin/mem-phys-addr-record
> new file mode 100644
> index 000..5a87512
> --- /dev/null
> +++ b/tools/perf/scripts/python/bin/mem-phys-addr-record
> @@ -0,0 +1,19 @@
> +#!/bin/bash
> +
> +#
> +# Profiling physical memory by all retired load instructions/uops event
> +# MEM_INST_RETIRED.ALL_LOADS or MEM_UOPS_RETIRED.ALL_LOADS
> +#
> +
> +load=`perf list | grep mem_inst_retired.all_loads`
> +if [ -z "$load" ]; then
> + load=`perf list | grep mem_uops_retired.all_loads`
> +fi
> +if [ -z "$load" ]; then
> + echo "There is no event to count all retired load instructions/uops."
> + exit 1
> +fi
> +
> +arg=$(echo $load | tr -d ' ')
> +arg="$arg:P"
> +perf record --phys-data -e $arg $@
> diff --git a/tools/perf/scripts/python/bin/mem-phys-addr-report
> b/tools/perf/scripts/python/bin/mem-phys-addr-report
> new file mode 100644
> index 000..3f2b847
> --- /dev/null
> +++ b/tools/perf/scripts/python/bin/mem-phys-addr-report
> @@ -0,0 +1,3 @@
> +#!/bin/bash
> +# description: resolve physical address samples
> +perf script $@ -s "$PERF_EXEC_PATH"/scripts/python/mem-phys-addr.py
> diff --git a/tools/perf/scripts/python/mem-phys-addr.py
> b/tools/perf/scripts/python/mem-phys-addr.py
> new file mode 100644
> index 000..1d1f757
> --- /dev/null
> +++ b/tools/perf/scripts/python/mem-phys-addr.py
> @@ -0,0 +1,97 @@
> +# mem-phys-addr.py: Resolve physical address samples
> +# Copyright (c) 2017, Intel Corporation.
> +#
> +# This program is free software; you can redistribute it and/or modify it
> +# under the terms and conditions of the GNU General Public License,
> +# version 2, as published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope it will be useful, but WITHOUT
> +# ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> for
> +# more details.
> +
> +from __future__ import division
> +import os
> +import sys
> +import struct
> +import re
> +import bisect
> +import collections
> +
> +sys.path.append(os.environ['PERF_EXEC_PATH'] + \
> + '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
> +
> +system_ram = []
> +pmem = []
> +f = None
> +load_mem_type_cnt = collections.Counter()
> +event_name = None
> +
> +def parse_iomem():
> + global f
> + f = open('/proc/iomem', 'r')
> + for i, j in enumerate(f):
> + m = re.split('-|:',j,2)
> + if m[2].strip() == 'System RAM':
> + system_ram.append(long(m[0], 16))
> + system_ram.append(long(m[1], 16))
> + if m[2].strip() == 'Persistent Memory':
> + pmem.append(long(m[0], 16))
> + pmem.append(long(m[1], 16))
> +
> +def print_memory_type():
> + print "Event: %s" % (event_name)
> +	print "%-40s  %10s  %10s\n" % ("Memory type", "count", "percentage"),
> + print "%-40s  %10s  %10s\n" % 
> 

RE: [PATCH V4 1/8] perf/x86/intel/uncore: customized event_read for client IMC uncore

2017-11-16 Thread Liang, Kan
Hi Thomas,

Any comments for this patch series?

Thanks,
Kan

> 
> From: Kan Liang 
> 
> There are two free running counters for client IMC uncore. The custom
> event_init() function hardcodes their index to 'UNCORE_PMC_IDX_FIXED' and
> 'UNCORE_PMC_IDX_FIXED + 1'. To support the 'UNCORE_PMC_IDX_FIXED + 1'
> case, the generic uncore_perf_event_update is obscurely hacked.
> The code quality issue will bring problems when a new counter index is
> introduced into the generic code. For example, a free running counter index.
> 
> Introduce a customized event_read function for client IMC uncore.
> The customized function is exactly copied from the previous generic
> uncore_pmu_event_read.
> The 'UNCORE_PMC_IDX_FIXED + 1' case will be isolated for client IMC uncore
> only.
> 
> Signed-off-by: Kan Liang 
> ---
> 
> Change since V3:
>  - Use the customized read function to replace uncore_perf_event_update.
>  - Move generic code change to patch 3/8.
> 
>  arch/x86/events/intel/uncore_snb.c | 33
> +++--
>  1 file changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/events/intel/uncore_snb.c
> b/arch/x86/events/intel/uncore_snb.c
> index db1127c..b6d0d72 100644
> --- a/arch/x86/events/intel/uncore_snb.c
> +++ b/arch/x86/events/intel/uncore_snb.c
> @@ -449,6 +449,35 @@ static void snb_uncore_imc_event_start(struct perf_event *event, int flags)
>   uncore_pmu_start_hrtimer(box);
>  }
> 
> +static void snb_uncore_imc_event_read(struct perf_event *event) {
> + struct intel_uncore_box *box = uncore_event_to_box(event);
> + u64 prev_count, new_count, delta;
> + int shift;
> +
> + /*
> +  * There are two free running counters in IMC.
> +  * The index for the second one is hardcoded to
> +  * UNCORE_PMC_IDX_FIXED + 1.
> +  */
> + if (event->hw.idx >= UNCORE_PMC_IDX_FIXED)
> + shift = 64 - uncore_fixed_ctr_bits(box);
> + else
> + shift = 64 - uncore_perf_ctr_bits(box);
> +
> + /* the hrtimer might modify the previous event value */
> +again:
> + prev_count = local64_read(&event->hw.prev_count);
> + new_count = uncore_read_counter(box, event);
> + if (local64_xchg(&event->hw.prev_count, new_count) != prev_count)
> + goto again;
> +
> + delta = (new_count << shift) - (prev_count << shift);
> + delta >>= shift;
> +
> + local64_add(delta, &event->count);
> +}
> +
>  static void snb_uncore_imc_event_stop(struct perf_event *event, int flags)
>  {
>  	struct intel_uncore_box *box = uncore_event_to_box(event);
> @@ -471,7 +500,7 @@ static void snb_uncore_imc_event_stop(struct perf_event *event, int flags)
>* Drain the remaining delta count out of a event
>* that we are disabling:
>*/
> - uncore_perf_event_update(box, event);
> + snb_uncore_imc_event_read(event);
>   hwc->state |= PERF_HES_UPTODATE;
>   }
>  }
> @@ -533,7 +562,7 @@ static struct pmu snb_uncore_imc_pmu = {
>   .del= snb_uncore_imc_event_del,
>   .start  = snb_uncore_imc_event_start,
>   .stop   = snb_uncore_imc_event_stop,
> - .read   = uncore_pmu_event_read,
> + .read   = snb_uncore_imc_event_read,
>  };
> 
>  static struct intel_uncore_ops snb_uncore_imc_ops = {
> --
> 2.7.4
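
The double shift in snb_uncore_imc_event_read() above is what makes counter
wraparound come out right for counters narrower than 64 bits. A standalone
demo of the idiom (the 48-bit width and the tick values are made up):

#include <assert.h>
#include <stdint.h>

int main(void)
{
	int shift = 64 - 48;			/* 48-bit hardware counter */
	uint64_t prev_raw = 0xFFFFFFFFFFFFULL;	/* one tick before wrapping */
	uint64_t new_raw  = 0x000000000009ULL;	/* wrapped, then 9 more ticks */

	/* shift to the top bits, subtract, arithmetic-shift back down:
	 * this sign-extends the 48-bit difference across the wrap */
	int64_t delta = (int64_t)((new_raw << shift) -
				  (prev_raw << shift)) >> shift;

	assert(delta == 10);
	return 0;
}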



RE: [PATCH 6/7] perf tools: Remove 'overwrite' concept from code level

2017-11-13 Thread Liang, Kan
> Since all 'overwrite' usages are cleaned up and no one really uses a readonly
> main ringbuffer, remove 'overwrite' from function arguments and evlist. The
> concepts of 'overwrite' and 'write_backward' are cleaner than before:
> 
>   1. At the code level, there's no 'overwrite' concept. Each evlist has two
>      ringbuffer groups. One is read-write/forward, another is
>      readonly/backward.
>      Don't support read-write/backward and readonly/forward.
> 
>   2. In the user interface, we keep '--overwrite' and translate it into
>      write_backward in each event.
> 
> Signed-off-by: Wang Nan 
> ---
>  tools/perf/arch/x86/tests/perf-time-to-tsc.c |  2 +-
>  tools/perf/builtin-kvm.c |  2 +-
>  tools/perf/builtin-record.c  |  4 ++--
>  tools/perf/builtin-top.c |  2 +-
>  tools/perf/builtin-trace.c   |  2 +-
>  tools/perf/tests/backward-ring-buffer.c  |  2 +-
>  tools/perf/tests/bpf.c   |  2 +-
>  tools/perf/tests/code-reading.c  |  2 +-
>  tools/perf/tests/keep-tracking.c |  2 +-
>  tools/perf/tests/mmap-basic.c|  2 +-
>  tools/perf/tests/openat-syscall-tp-fields.c  |  2 +-
>  tools/perf/tests/perf-record.c   |  2 +-
>  tools/perf/tests/sw-clock.c  |  2 +-
>  tools/perf/tests/switch-tracking.c   |  2 +-
>  tools/perf/tests/task-exit.c |  2 +-
>  tools/perf/util/evlist.c | 14 ++
>  tools/perf/util/evlist.h |  6 ++
>  tools/perf/util/mmap.c   |  6 +++---
>  tools/perf/util/mmap.h   |  2 +-
>  tools/perf/util/python.c | 10 +-
>  20 files changed, 33 insertions(+), 37 deletions(-)
> 
> diff --git a/tools/perf/arch/x86/tests/perf-time-to-tsc.c
> b/tools/perf/arch/x86/tests/perf-time-to-tsc.c
> index 5dd7efb..c7ea843 100644
> --- a/tools/perf/arch/x86/tests/perf-time-to-tsc.c
> +++ b/tools/perf/arch/x86/tests/perf-time-to-tsc.c
> @@ -83,7 +83,7 @@ int test__perf_time_to_tsc(struct test *test
> __maybe_unused, int subtest __maybe
> 
>   CHECK__(perf_evlist__open(evlist));
> 
> - CHECK__(perf_evlist__mmap(evlist, UINT_MAX, false));
> + CHECK__(perf_evlist__mmap(evlist, UINT_MAX));
> 
>   pc = evlist->mmap[0].base;
>   ret = perf_read_tsc_conversion(pc, &tc);
> diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
> index 0af4c09..e3e2a80 100644
> --- a/tools/perf/builtin-kvm.c
> +++ b/tools/perf/builtin-kvm.c
> @@ -1043,7 +1043,7 @@ static int kvm_live_open_events(struct
> perf_kvm_stat *kvm)
>   goto out;
>   }
> 
> - if (perf_evlist__mmap(evlist, kvm->opts.mmap_pages, false) < 0) {
> + if (perf_evlist__mmap(evlist, kvm->opts.mmap_pages) < 0) {
>   ui__error("Failed to mmap the events: %s\n",
> str_error_r(errno, sbuf, sizeof(sbuf)));
>   perf_evlist__close(evlist);
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index f4d9fc5..b3ef33f 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -300,7 +300,7 @@ static int record__mmap_evlist(struct record *rec,
>   struct record_opts *opts = >opts;
>   char msg[512];
> 
> - if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, false,
> + if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
>opts->auxtrace_mmap_pages,
>opts->auxtrace_snapshot_mode) < 0) {
>   if (errno == EPERM) {
> @@ -481,7 +481,7 @@ static int record__mmap_read_evlist(struct record
> *rec, struct perf_evlist *evli
>   struct auxtrace_mmap *mm = &maps[i].auxtrace_mmap;
> 
>   if (maps[i].base) {
> - if (perf_mmap__push(&maps[i], evlist->overwrite, backward, rec, record__pushfn) != 0) {
> + if (perf_mmap__push(&maps[i], backward, rec, record__pushfn) != 0) {
>   rc = -1;
>   goto out;
>   }
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 477a869..83d2ae2 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -902,7 +902,7 @@ static int perf_top__start_counters(struct perf_top
> *top)
>   }
>   }
> 
> - if (perf_evlist__mmap(evlist, opts->mmap_pages, false) < 0) {
> + if (perf_evlist__mmap(evlist, opts->mmap_pages) < 0) {
>   ui__error("Failed to mmap with %d (%s)\n",
>   errno, str_error_r(errno, msg, sizeof(msg)));
>   goto out_err;
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index c373f9a..c3f2f98 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -2404,7 +2404,7 @@ static int trace__run(struct trace *trace, int argc,
> const char 
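
As a compact restatement of the rule this series converges on (an
illustration only; the real mmap setup code takes more parameters than this):
the mapping protection can be derived from write_backward alone.

#include <stdbool.h>
#include <sys/mman.h>

/* readonly + backward, read-write + forward: nothing else is supported */
static inline int ringbuffer_mmap_prot(bool write_backward)
{
	return write_backward ? PROT_READ : (PROT_READ | PROT_WRITE);
}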

RE: [PATCH 0/7] perf tools: Clarify overwrite and backward, bugfix

2017-11-13 Thread Liang, Kan
> Based on previous discussion, perf needs to support only two types
> of ringbuffer: read-write + forward, readonly + backward. This patchset
> completly removes the concept of 'overwrite' from code level, controls
> mapping permission using write_backward instead.

I think I suggested removing the concept of backward/forward and keeping
the overwrite. That's from the user's perspective.
Here you use read-write + forward and readonly + backward, which
is from the kernel's perspective. Either is OK for me.
However, we should connect the two sets of concepts with clear
documentation, not just in the changelog.

I suggest creating a new file, ringbuffer.txt, in Documentation to
explain the types of ring buffer, how they work, the connection
between overwrite and readonly + backward, and so on.
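
To illustrate the kind of content I mean, here is a rough read-side
sketch for the readonly + backward (overwrite) case that such a
document could walk through. This is only my illustration, not code
from this series; 'fd' is the event fd, 'pg' the mmap'ed control page,
and record-wrap handling is omitted:

	#include <sys/ioctl.h>
	#include <linux/perf_event.h>

	static void snapshot_overwrite_rb(int fd, struct perf_event_mmap_page *pg,
					  unsigned char *data, __u64 size)
	{
		__u64 head, off, end;

		/* stop the kernel from overwriting while we read */
		ioctl(fd, PERF_EVENT_IOC_PAUSE_OUTPUT, 1);

		head = pg->data_head;
		__sync_synchronize();	/* pairs with the kernel's barrier */

		/* records sit at increasing offsets from head, newest first */
		for (off = head, end = head + size; off < end;) {
			struct perf_event_header *hdr;

			hdr = (struct perf_event_header *)
				(data + (off & (size - 1)));
			if (!hdr->size)
				break;
			/* ... process one record ... */
			off += hdr->size;
		}

		/* resume overwriting */
		ioctl(fd, PERF_EVENT_IOC_PAUSE_OUTPUT, 0);
	}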


> 
> Wang Nan (7):
>   perf mmap: Fix perf backward recording
>   perf tests: Set evlist of test__backward_ring_buffer() to !overwrite
>   perf tests: Set evlist of test__sw_clock_freq() to !overwrite
>   perf tests: Set evlist of test__basic_mmap() to !overwrite
>   perf tests: Set evlist of test__task_exit() to !overwrite
>   perf tools: Remove 'overwrite' concept from code level
>   perf tools: Remove prot field in mmap param
> 

You had another related fix "perf tool: Don't discard prev in backward
mode" https://lkml.org/lkml/2017/10/12/408
It has not been merged yet and needs to be backported.
I think you may want to add it to this series as well.


Thanks,
Kan

>  tools/perf/arch/x86/tests/perf-time-to-tsc.c |  2 +-
>  tools/perf/builtin-kvm.c |  2 +-
>  tools/perf/builtin-record.c  |  4 ++--
>  tools/perf/builtin-top.c |  2 +-
>  tools/perf/builtin-trace.c   |  2 +-
>  tools/perf/tests/backward-ring-buffer.c  |  2 +-
>  tools/perf/tests/bpf.c   |  2 +-
>  tools/perf/tests/code-reading.c  |  2 +-
>  tools/perf/tests/keep-tracking.c |  2 +-
>  tools/perf/tests/mmap-basic.c|  2 +-
>  tools/perf/tests/openat-syscall-tp-fields.c  |  2 +-
>  tools/perf/tests/perf-record.c   |  2 +-
>  tools/perf/tests/sw-clock.c  |  2 +-
>  tools/perf/tests/switch-tracking.c   |  2 +-
>  tools/perf/tests/task-exit.c |  2 +-
>  tools/perf/util/evlist.c | 21 ++---
>  tools/perf/util/evlist.h |  6 ++
>  tools/perf/util/mmap.c   | 10 +-
>  tools/perf/util/mmap.h   |  6 +++---
>  tools/perf/util/python.c | 10 +-
>  20 files changed, 41 insertions(+), 44 deletions(-)
> 
> --
> 2.10.1



RE: [PATCH V2] perf script: add script to profile and resolve physical mem type

2017-11-06 Thread Liang, Kan
Hi Stephane,

Any comments on the script?

Thanks,
Kan
> 
> From: Kan Liang 
> 
> There could be different types of memory in the system. E.g normal
> System Memory, Persistent Memory. To understand how the workload maps
> to
> those memories, it's important to know the I/O statistics of them.
> Perf can collect physical addresses, but those are raw data.
> It still needs extra work to resolve the physical addresses.
> Provide a script to facilitate the physical addresses resolving and
> I/O statistics.
> 
> Profile with MEM_INST_RETIRED.ALL_LOADS or
> MEM_UOPS_RETIRED.ALL_LOADS
> event if any of them is available.
> Look up the /proc/iomem and resolve the physical address.
> Provide memory type summary.
> 
> Here is an example output.
>  #perf script report mem-phys-addr
> Event: mem_inst_retired.all_loads:P
> Memory typecount   percentage
>   ---  ---
> System RAM7453.2%
> Persistent Memory 5539.6%
> N/A   10 7.2%
> 
> Signed-off-by: Kan Liang 
> ---
> 
> Changes since V1:
>  - Do not mix DLA and Load Latency. Do not compare the loads and stores.
>Only profile the loads.
>  - Use event name to replace the RAW event
> 
>  tools/perf/scripts/python/bin/mem-phys-addr-record | 19 +
>  tools/perf/scripts/python/bin/mem-phys-addr-report |  3 +
>  tools/perf/scripts/python/mem-phys-addr.py | 97
> ++
>  .../util/scripting-engines/trace-event-python.c|  2 +
>  4 files changed, 121 insertions(+)
>  create mode 100644 tools/perf/scripts/python/bin/mem-phys-addr-record
>  create mode 100644 tools/perf/scripts/python/bin/mem-phys-addr-report
>  create mode 100644 tools/perf/scripts/python/mem-phys-addr.py
> 
> diff --git a/tools/perf/scripts/python/bin/mem-phys-addr-record
> b/tools/perf/scripts/python/bin/mem-phys-addr-record
> new file mode 100644
> index 000..5a87512
> --- /dev/null
> +++ b/tools/perf/scripts/python/bin/mem-phys-addr-record
> @@ -0,0 +1,19 @@
> +#!/bin/bash
> +
> +#
> +# Profiling physical memory by all retired load instructions/uops event
> +# MEM_INST_RETIRED.ALL_LOADS or MEM_UOPS_RETIRED.ALL_LOADS
> +#
> +
> +load=`perf list | grep mem_inst_retired.all_loads`
> +if [ -z "$load" ]; then
> + load=`perf list | grep mem_uops_retired.all_loads`
> +fi
> +if [ -z "$load" ]; then
> + echo "There is no event to count all retired load instructions/uops."
> + exit 1
> +fi
> +
> +arg=$(echo $load | tr -d ' ')
> +arg="$arg:P"
> +perf record --phys-data -e $arg $@
> diff --git a/tools/perf/scripts/python/bin/mem-phys-addr-report
> b/tools/perf/scripts/python/bin/mem-phys-addr-report
> new file mode 100644
> index 000..3f2b847
> --- /dev/null
> +++ b/tools/perf/scripts/python/bin/mem-phys-addr-report
> @@ -0,0 +1,3 @@
> +#!/bin/bash
> +# description: resolve physical address samples
> +perf script $@ -s "$PERF_EXEC_PATH"/scripts/python/mem-phys-addr.py
> diff --git a/tools/perf/scripts/python/mem-phys-addr.py
> b/tools/perf/scripts/python/mem-phys-addr.py
> new file mode 100644
> index 000..1d1f757
> --- /dev/null
> +++ b/tools/perf/scripts/python/mem-phys-addr.py
> @@ -0,0 +1,97 @@
> +# mem-phys-addr.py: Resolve physical address samples
> +# Copyright (c) 2017, Intel Corporation.
> +#
> +# This program is free software; you can redistribute it and/or modify it
> +# under the terms and conditions of the GNU General Public License,
> +# version 2, as published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope it will be useful, but WITHOUT
> +# ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> for
> +# more details.
> +
> +from __future__ import division
> +import os
> +import sys
> +import struct
> +import re
> +import bisect
> +import collections
> +
> +sys.path.append(os.environ['PERF_EXEC_PATH'] + \
> + '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
> +
> +system_ram = []
> +pmem = []
> +f = None
> +load_mem_type_cnt = collections.Counter()
> +event_name = None
> +
> +def parse_iomem():
> + global f
> + f = open('/proc/iomem', 'r')
> + for i, j in enumerate(f):
> + m = re.split('-|:',j,2)
> + if m[2].strip() == 'System RAM':
> + system_ram.append(long(m[0], 16))
> + system_ram.append(long(m[1], 16))
> + if m[2].strip() == 'Persistent Memory':
> + pmem.append(long(m[0], 16))
> + pmem.append(long(m[1], 16))
> +
> +def print_memory_type():
> + print "Event: %s" % (event_name)
> + print "%-40s  %10s  %10s\n" % ("Memory type", "count",
> "percentage"),
> + print "%-40s  %10s  %10s\n" % 
> ("", \
> +

RE: [PATCH V3 1/5] perf/x86/intel/uncore: customized pmu event read for client IMC uncore

2017-11-02 Thread Liang, Kan
> On Thu, 2 Nov 2017, Liang, Kan wrote:
> > > On Thu, 2 Nov 2017, Liang, Kan wrote:
> > > > > On Thu, 2 Nov 2017, Thomas Gleixner wrote:
> > > > > But then you have this in uncore_perf_event_update():
> > > > >
> > > > > -   if (event->hw.idx >= UNCORE_PMC_IDX_FIXED)
> > > > > +   if (event->hw.idx == UNCORE_PMC_IDX_FIXED)
> > > > >
> > > > > So how is that supposed to work?
> > > >
> > > > This is for generic code.
> > > >
> > > > In previous code, the event_read function for IMC use generic code.
> > > > So we have to deal with >= in generic code.
> > > >
> > > > Now, customized event_read function 'snb_uncore_imc_event_read'
> > > >  is introduced for IMC. So IMC does not touch the generic code.
> > > > The generic code is corrected here.
> > >
> > > The customized read function does not help at all.
> > >
> > > uncore_perf_event_update() is used from snb_uncore_imc_event_stop().
> So
> > > it's broken with this patch.
> >
> > Right, need to use the customized read function to replace it as well.
> > I will fix it in next version.
> >
> > >
> > > This is a complete and utter mess to review.
> >
> > Most of the customized functions will be clean up after the series is 
> > applied.
> 
> Correct, but please try to split 2/5 into parts as well. It's a hodgepodge
> of new things and modifications to the code. The logical split is to add
> the new data structures and struct members along with the inline helpers
> and then in the next patch make use of it.
>

Sure, I will do that.

Also, I just noticed that '>=' is used in nhmex_uncore_msr_enable_event
as well, which does not make sense and is unnecessary.
For Nehalem and Westmere, there is only one fixed counter, for the W-Box.
It's an nhm-specific issue, not related to the generic code.
I will also clean it up in the next version.
 
Thanks,
Kan



RE: [PATCH V3 1/5] perf/x86/intel/uncore: customized pmu event read for client IMC uncore

2017-11-02 Thread Liang, Kan
> On Thu, 2 Nov 2017, Liang, Kan wrote:
> > > On Thu, 2 Nov 2017, Thomas Gleixner wrote:
> > > But then you have this in uncore_perf_event_update():
> > >
> > > -   if (event->hw.idx >= UNCORE_PMC_IDX_FIXED)
> > > +   if (event->hw.idx == UNCORE_PMC_IDX_FIXED)
> > >
> > > So how is that supposed to work?
> >
> > This is for generic code.
> >
> > In previous code, the event_read function for IMC use generic code.
> > So we have to deal with >= in generic code.
> >
> > Now, customized event_read function 'snb_uncore_imc_event_read'
> >  is introduced for IMC. So IMC does not touch the generic code.
> > The generic code is corrected here.
> 
> The customized read function does not help at all.
> 
> uncore_perf_event_update() is used from snb_uncore_imc_event_stop(). So
> it's broken with this patch.

Right, the customized read function needs to be used there as well.
I will fix it in the next version.

> 
> This is a complete and utter mess to review.

Most of the customized functions will be cleaned up after the series is applied.

Thanks,
Kan



RE: [PATCH V3 1/5] perf/x86/intel/uncore: customized pmu event read for client IMC uncore

2017-11-02 Thread Liang, Kan
> On Thu, 2 Nov 2017, Thomas Gleixner wrote:
> > On Thu, 2 Nov 2017, Liang, Kan wrote:
> > > > On Thu, 2 Nov 2017, Thomas Gleixner wrote:
> > > > > On Thu, 2 Nov 2017, Liang, Kan wrote:
> > > > > > Patch 5/5 will clean up the client IMC uncore.
> > > > > > Before that, we still need it to make client IMC uncore work.
> > > > > >
> > > > > > This patch isolates the >= case for client IMC uncore.
> > > > >
> > > > > Fair enough. A comment to that effect (even when removed later)
> > > > > would have avoided that question.
> > > >
> > > > Thinking more about it. The current code only supports the fixed one,
> right?
> > > > So why would it deal with anything > FIXED?
> > > >
> > >
> > > There are two free running counters in IMC.
> > > To support the second one, the previous code implicitly do
> > > UNCORE_PMC_IDX_FIXED + 1.
> > > So it has to deal with > FIXED case.
> > >
> > >   case SNB_UNCORE_PCI_IMC_DATA_READS:
> > >   base = SNB_UNCORE_PCI_IMC_DATA_READS_BASE;
> > >   idx = UNCORE_PMC_IDX_FIXED;
> > >   break;
> > >   case SNB_UNCORE_PCI_IMC_DATA_WRITES:
> > >   base = SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE;
> > >   idx = UNCORE_PMC_IDX_FIXED + 1;
> > >   break;
> > >   default:
> > >   return -EINVAL;
> >
> > Fugly that is, but as its cleaned up later
> 
> But then you have this in uncore_perf_event_update():
> 
> -   if (event->hw.idx >= UNCORE_PMC_IDX_FIXED)
> +   if (event->hw.idx == UNCORE_PMC_IDX_FIXED)
> 
> So how is that supposed to work?

This is for the generic code.

In the previous code, the event_read function for IMC used the generic
code, so we had to deal with '>=' there.

Now a customized event_read function, 'snb_uncore_imc_event_read',
is introduced for IMC, so IMC no longer touches the generic code.
The generic code is corrected here.

Thanks,
Kan
> 
> I think your patch order is wrong and breaks bisectability all over the place 
> as
> you fixup the UNCORE_PMC_IDX_FIXED + 1 hackery in 5/5.
> 
> Thanks,
> 
>   tglx


RE: [PATCH V3 1/5] perf/x86/intel/uncore: customized pmu event read for client IMC uncore

2017-11-02 Thread Liang, Kan
> On Thu, 2 Nov 2017, Thomas Gleixner wrote:
> > On Thu, 2 Nov 2017, Liang, Kan wrote:
> > > > On Tue, 24 Oct 2017, kan.li...@intel.com wrote:
> > > > > - if (event->hw.idx >= UNCORE_PMC_IDX_FIXED)
> > > > > + if (event->hw.idx == UNCORE_PMC_IDX_FIXED)
> > > > >   shift = 64 - uncore_fixed_ctr_bits(box);
> > > > >   else
> > > > >   shift = 64 - uncore_perf_ctr_bits(box); diff --git
> > > > > a/arch/x86/events/intel/uncore_snb.c
> > > > > b/arch/x86/events/intel/uncore_snb.c
> > > > > index db1127c..9d5cd3f 100644
> > > > > --- a/arch/x86/events/intel/uncore_snb.c
> > > > > +++ b/arch/x86/events/intel/uncore_snb.c
> > > > > @@ -498,6 +498,30 @@ static void snb_uncore_imc_event_del(struct
> > > > perf_event *event, int flags)
> > > > >   snb_uncore_imc_event_stop(event, PERF_EF_UPDATE);  }
> > > > >
> > > > > +static void snb_uncore_imc_event_read(struct perf_event *event) {
> > > > > + struct intel_uncore_box *box = uncore_event_to_box(event);
> > > > > + u64 prev_count, new_count, delta;
> > > > > + int shift;
> > > > > +
> > > > > + if (event->hw.idx >= UNCORE_PMC_IDX_FIXED)
> > > >
> > > > And this needs to be >= because?
> > >
> > > Patch 5/5 will clean up the client IMC uncore.
> > > Before that, we still need it to make client IMC uncore work.
> > >
> > > This patch isolates the >= case for client IMC uncore.
> >
> > Fair enough. A comment to that effect (even when removed later) would
> > have avoided that question.
> 
> Thinking more about it. The current code only supports the fixed one, right?
> So why would it deal with anything > FIXED?
> 

There are two free-running counters in the IMC.
To support the second one, the previous code implicitly does
UNCORE_PMC_IDX_FIXED + 1, so it has to deal with the '> FIXED' case.

case SNB_UNCORE_PCI_IMC_DATA_READS:
base = SNB_UNCORE_PCI_IMC_DATA_READS_BASE;
idx = UNCORE_PMC_IDX_FIXED;
break;
case SNB_UNCORE_PCI_IMC_DATA_WRITES:
base = SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE;
idx = UNCORE_PMC_IDX_FIXED + 1;
break;
default:
return -EINVAL;
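
Just to illustrate what 'free-running' implies for the read path (my
own sketch, not the patch; my_read_counter() and
uncore_freerunning_bits() are hypothetical stand-ins): the counter can
never be stopped, so an update is only a width-adjusted delta between
two reads, whatever idx the event got:

	static void free_running_counter_update(struct intel_uncore_box *box,
						struct perf_event *event)
	{
		int shift = 64 - uncore_freerunning_bits(box, event);
		u64 prev_count, new_count, delta;

	again:
		prev_count = local64_read(&event->hw.prev_count);
		new_count = my_read_counter(box, event);
		/* the hrtimer may have updated prev_count concurrently */
		if (local64_xchg(&event->hw.prev_count, new_count) != prev_count)
			goto again;

		/* sign-extend the delta to the real counter width */
		delta = (new_count << shift) - (prev_count << shift);
		delta >>= shift;
		local64_add(delta, &event->count);
	}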

Thanks,
Kan


RE: [PATCH V3 1/5] perf/x86/intel/uncore: customized pmu event read for client IMC uncore

2017-11-02 Thread Liang, Kan
> On Tue, 24 Oct 2017, kan.li...@intel.com wrote:
> > -   if (event->hw.idx >= UNCORE_PMC_IDX_FIXED)
> > +   if (event->hw.idx == UNCORE_PMC_IDX_FIXED)
> > shift = 64 - uncore_fixed_ctr_bits(box);
> > else
> > shift = 64 - uncore_perf_ctr_bits(box); diff --git
> > a/arch/x86/events/intel/uncore_snb.c
> > b/arch/x86/events/intel/uncore_snb.c
> > index db1127c..9d5cd3f 100644
> > --- a/arch/x86/events/intel/uncore_snb.c
> > +++ b/arch/x86/events/intel/uncore_snb.c
> > @@ -498,6 +498,30 @@ static void snb_uncore_imc_event_del(struct
> perf_event *event, int flags)
> > snb_uncore_imc_event_stop(event, PERF_EF_UPDATE);  }
> >
> > +static void snb_uncore_imc_event_read(struct perf_event *event) {
> > +   struct intel_uncore_box *box = uncore_event_to_box(event);
> > +   u64 prev_count, new_count, delta;
> > +   int shift;
> > +
> > +   if (event->hw.idx >= UNCORE_PMC_IDX_FIXED)
> 
> And this needs to be >= because?

Patch 5/5 will clean up the client IMC uncore.
Before that, we still need it to keep the client IMC uncore working.

This patch isolates the '>=' case for the client IMC uncore.

Thanks,
Kan



RE: [PATCH 1/2] perf mmap: Fix perf backward recording

2017-11-02 Thread Liang, Kan
Hi Namhyung,

> On Wed, Nov 01, 2017 at 04:22:53PM +0000, Liang, Kan wrote:
> > > On 2017/11/1 21:57, Liang, Kan wrote:
> > > >> On 2017/11/1 20:00, Namhyung Kim wrote:
> > > >>> On Wed, Nov 01, 2017 at 06:32:50PM +0800, Wangnan (F) wrote:
> > > > There are only four test cases which set overwrite,
> > > > sw-clock,task-exit, mmap-basic, backward-ring-buffer.
> > > > Only backward-ring-buffer is 'backward overwrite'.
> > > > The rest three are all 'forward overwrite'. We just need to set
> > > > write_backward to convert them to 'backward overwrite'.
> > > > I think it's not hard to clean up.
> > >
> > > If we add a new rule that overwrite ring buffers are always backward
> > > then it is not hard to cleanup. However, the support of forward
> > > overwrite ring buffer has a long history and the code is not written
> > > by me. I'd like to check if there is some reason to keep support this
> configuration?
> > >
> >
> > As my observation, currently, there are no perf tools which support
> > 'forward overwrite'.
> > There are only three test cases (sw-clock, task-exit, mmap-basic)
> > which is set to 'forward overwrite'. I don’t see any reason it cannot
> > be changed to 'backward overwrite'
> >
> > Arnaldo? Jirka? Kim?
> >
> > What do you think?
> 
> I think sw-clock, task-exit and mmap-basic test cases can be changed to use
> the forward non-overwrite mode.
> 
> The forward overwrite might be used by externel applications accessing the
> ring buffer directly but not needed for perf tools IMHO.  

The proposal is only for the perf tool, not the kernel. So external
applications can still use forward overwrite to access the ring buffer.

> Let's keep the code simpler as much as possible.

Agree.
Now there are too many options for accessing the ring buffer, and not
all of them are supported.
I think we should only keep the crucial options (overwrite/non-overwrite),
clearly define them in the documentation, and clean up the code.

Also, perf record doesn't use the generic interface (e.g.
perf_evlist__mmap*) to access the ring buffer as the other tools do,
because the generic interface is hardcoded to support only forward
non-overwrite. We should clean it up as well, but that could be done
separately later.

Thanks,
Kan



RE: [PATCH 1/2] perf mmap: Fix perf backward recording

2017-11-01 Thread Liang, Kan
> On 2017/11/1 21:57, Liang, Kan wrote:
> >> On 2017/11/1 20:00, Namhyung Kim wrote:
> >>> On Wed, Nov 01, 2017 at 06:32:50PM +0800, Wangnan (F) wrote:
> >>>> On 2017/11/1 17:49, Namhyung Kim wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On Wed, Nov 01, 2017 at 05:53:26AM +, Wang Nan wrote:
> >>>>>> perf record backward recording doesn't work as we expected: it
> >>>>>> never overwrite when ring buffer full.
> >>>>>>
> >>>>>> Test:
> >>>>>>
> >>>>>> (Run a busy printing python task background like this:
> >>>>>>
> >>>>>> while True:
> >>>>>> print 123
> >>>>>>
> >>>>>> send SIGUSR2 to perf to capture snapshot.)
> >>>> [SNIP]
> >>>>
> >>>>>> Signed-off-by: Wang Nan <wangn...@huawei.com>
> >>>>>> ---
> >>>>>> tools/perf/util/evlist.c | 8 +++-
> >>>>>> 1 file changed, 7 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> >>>>>> index c6c891e..4c5daba 100644
> >>>>>> --- a/tools/perf/util/evlist.c
> >>>>>> +++ b/tools/perf/util/evlist.c
> >>>>>> @@ -799,22 +799,28 @@ perf_evlist__should_poll(struct perf_evlist
> >> *evlist __maybe_unused,
> >>>>>> }
> >>>>>> static int perf_evlist__mmap_per_evsel(struct perf_evlist
> >>>>>> *evlist, int
> >> idx,
> >>>>>> - struct mmap_params *mp, int
> cpu_idx,
> >>>>>> + struct mmap_params *_mp, int
> cpu_idx,
> >>>>>>   int thread, int *_output, int
> >> *_output_backward)
> >>>>>> {
> >>>>>>struct perf_evsel *evsel;
> >>>>>>int revent;
> >>>>>>int evlist_cpu = cpu_map__cpu(evlist->cpus, cpu_idx);
> >>>>>> +  struct mmap_params *mp;
> >>>>>>evlist__for_each_entry(evlist, evsel) {
> >>>>>>struct perf_mmap *maps = evlist->mmap;
> >>>>>> +  struct mmap_params rdonly_mp;
> >>>>>>int *output = _output;
> >>>>>>int fd;
> >>>>>>int cpu;
> >>>>>> +  mp = _mp;
> >>>>>>if (evsel->attr.write_backward) {
> >>>>>>output = _output_backward;
> >>>>>>maps = evlist->backward_mmap;
> >>>>>> +  rdonly_mp = *_mp;
> >>>>>> +  rdonly_mp.prot &= ~PROT_WRITE;
> >>>>>> +  mp = &rdonly_mp;
> >>>>>>if (!maps) {
> >>>>>>maps =
> perf_evlist__alloc_mmap(evlist);
> >>>>>> --
> >>>>> What about this instead (not tested)?
> >>>>>
> >>>>>
> >>>>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> >>>>> index c6c891e154a6..27ebe355e794 100644
> >>>>> --- a/tools/perf/util/evlist.c
> >>>>> +++ b/tools/perf/util/evlist.c
> >>>>> @@ -838,6 +838,11 @@ static int
> perf_evlist__mmap_per_evsel(struct
> >> perf_evlist *evlist, int idx,
> >>>>>if (*output == -1) {
> >>>>>*output = fd;
> >>>>> +   if (evsel->attr.write_backward)
> >>>>> +   mp->prot = PROT_READ;
> >>>>> +   else
> >>>>> +   mp->prot = PROT_READ | PROT_WRITE;
> >>>>> +
> >>>> If evlist->overwrite is true, PROT_WRITE should be unset even if
> >>>> write_backward is not set. If you want to delay the setting of
> >>>> mp->prot, you need to consider both evlist->overwrite and
> >>>> evsel->attr.write_backward.

RE: [PATCH 2/2] perf record: Replace 'overwrite' by 'flightrecorder' for better naming

2017-11-01 Thread Liang, Kan
 > On 2017/11/1 23:04, Liang, Kan wrote:
> >> On 2017/11/1 22:22, Liang, Kan wrote:
> >>>> On 2017/11/1 21:26, Liang, Kan wrote:
> >>>>>> The meaning of perf record's "overwrite" option and many
> "overwrite"
> >>>>>> in source code are not clear. In perf's code, the 'overwrite' has
> >>>>>> 2
> >> meanings:
> >>>>>> 1. Make ringbuffer readonly (perf_evlist__mmap_ex's argument).
> >>>>>> 2. Set evsel's "backward" attribute (in apply_config_terms).
> >>>>>>
> >>>>>> perf record doesn't use meaning 1 at all, but have a overwrite
> >>>>>> option, its real meaning is setting backward.
> >>>>>>
> >>>>> I don't understand here.
> >>>>> 'overwrite' has 2 meanings. perf record only support 1.
> >>>>> It should be a bug, and need to be fixed.
> >>>> Not a bug, but ambiguous.
> >>>>
> >>>> Perf record doesn't need overwrite main channel (we have two
> channels:
> >>>> evlist->mmap is main channel and evlist->backward_mmap is backward
> >>>> evlist->channel),
> >>>> but some testcases require it, and your new patchset may require it.
> >>>> 'perf record --overwrite' doesn't set main channel overwrite. What
> >>>> it does
> >> is
> >>>> moving all evsels to backward channel, and we can move some evsels
> >> back to
> >>>> the main channel by /no-overwrite/ setting. This behavior is hard
> >>>> to understand.
> >>>>
> >>> As my understanding, the 'main channel' should depends on what user
> sets.
> >>> If --overwrite is applied, then evlist->backward_mmap should be the
> >>> 'main channel'. evlist->overwrite should be set to true as well.
> >> Then it introduces a main channel switching mechanism, and we need
> >> checking evlist->overwrite and another factor to determine which one
> >> is the main channel. Make things more complex.
> > We should check the evlist->overwrite.
> > Now, all perf tools force evlist->overwrite = false. I think it doesn’t make
> sense.
> >
> > What is another factor?
> 
> If we support mixed channel as well as forward overwrite ring buffer,
> evlist->overwrite is not enough.

I think you have a detailed analysis of the weaknesses of 'forward
overwrite' in commit 9ecda41acb97 ("perf/core: Add ::write_backward
attribute to perf event").

There are no perf tools that use the 'forward overwrite' mode.

The only users are three perf test cases. We can change them to
'backward overwrite'.

I think it's OK to discard the 'forward overwrite' mode.
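
For reference, the user-side setup for 'readonly + backward' would look
roughly like this. This is only my sketch to illustrate the mode (it
assumes 4K pages), not code from any of the patches:

	#include <linux/perf_event.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int open_overwrite_rb(struct perf_event_attr *attr,
				     size_t pages, void **base)
	{
		size_t len = (pages + 1) * 4096;	/* +1 control page */
		int fd;

		attr->write_backward = 1;	/* kernel writes back to front */
		fd = syscall(__NR_perf_event_open, attr, 0, -1, -1, 0);
		if (fd < 0)
			return -1;

		/* a readonly mapping is what makes it an overwrite buffer */
		*base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
		if (*base == MAP_FAILED) {
			close(fd);
			return -1;
		}
		return fd;
	}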

> 
> > I don't think it will be too complex.
> >
> > In perf_evlist__mmap_ex, we just need to add a check.
> > If (!overwrite)
> > evlist->mmap = perf_evlist__alloc_mmap(evlist); else
> > evlist->backward_mmap = perf_evlist__alloc_mmap(evlist);
> >
> > In perf_evlist__mmap_per_evsel, we already handle per-event overwrite.
> > It just need to add some similar codes to handler per-event nonoverwrite.
> 
> Then the logic becomes:
> 
>   if (check write_backward) {
>  maps = evlist->backward_mmap;
>  if (!maps) {
>maps = perf_evlist__alloc_mmap(...);
>if (!maps) {
>/* error processing */
>}
>evlist->backward_mmap = maps;
>if (evlist->bkw_mmap_state == BKW_MMAP_NOTREADY)
>  perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_RUNNING);
>  }
>   } else {
>  maps = evlist->mmap;
>  if (!maps) {
>maps = perf_evlist__alloc_mmap(...);
>if (!maps) {
>/* error processing */
>}
>evlist->mmap = maps;
>
>  }
>   }
> 

Thanks.
It looks good to me.

Kan

> > For other codes, they should already check evlist->mmap and evlist-
> >backward_mmap.
> > So they probably don't need to change.
> >
> >
> > Thanks,
> > Kan
> >
> >
> 



RE: [PATCH 2/2] perf record: Replace 'overwrite' by 'flightrecorder' for better naming

2017-11-01 Thread Liang, Kan
> On 2017/11/1 22:22, Liang, Kan wrote:
> >> On 2017/11/1 21:26, Liang, Kan wrote:
> >>>> The meaning of perf record's "overwrite" option and many "overwrite"
> >>>> in source code are not clear. In perf's code, the 'overwrite' has 2
> meanings:
> >>>>1. Make ringbuffer readonly (perf_evlist__mmap_ex's argument).
> >>>>2. Set evsel's "backward" attribute (in apply_config_terms).
> >>>>
> >>>> perf record doesn't use meaning 1 at all, but have a overwrite
> >>>> option, its real meaning is setting backward.
> >>>>
> >>> I don't understand here.
> >>> 'overwrite' has 2 meanings. perf record only support 1.
> >>> It should be a bug, and need to be fixed.
> >> Not a bug, but ambiguous.
> >>
> >> Perf record doesn't need overwrite main channel (we have two channels:
> >> evlist->mmap is main channel and evlist->backward_mmap is backward
> >> evlist->channel),
> >> but some testcases require it, and your new patchset may require it.
> >> 'perf record --overwrite' doesn't set main channel overwrite. What it does
> is
> >> moving all evsels to backward channel, and we can move some evsels
> back to
> >> the main channel by /no-overwrite/ setting. This behavior is hard to
> >> understand.
> >>
> > As my understanding, the 'main channel' should depends on what user sets.
> > If --overwrite is applied, then evlist->backward_mmap should be the
> > 'main channel'. evlist->overwrite should be set to true as well.
> 
> Then it introduces a main channel switching mechanism, and we need
> checking evlist->overwrite and another factor to determine which
> one is the main channel. Make things more complex.

We should check evlist->overwrite.
Right now, all perf tools force evlist->overwrite = false, which I
don't think makes sense.

What is the other factor?

I don't think it will be too complex.

In perf_evlist__mmap_ex, we just need to add a check:

	if (!overwrite)
		evlist->mmap = perf_evlist__alloc_mmap(evlist);
	else
		evlist->backward_mmap = perf_evlist__alloc_mmap(evlist);

In perf_evlist__mmap_per_evsel, we already handle the per-event overwrite.
We just need to add some similar code to handle the per-event non-overwrite.
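
Roughly like this, inside the perf_evlist__mmap_per_evsel loop (my
simplified sketch, following the same pattern as the snippets quoted
above, not a finished patch):

	evlist__for_each_entry(evlist, evsel) {
		struct perf_mmap *maps;

		if (evsel->attr.write_backward) {
			/* overwrite: backward map, mapped readonly */
			maps = evlist->backward_mmap;
			mp->prot = PROT_READ;
		} else {
			maps = evlist->mmap;
			mp->prot = PROT_READ | PROT_WRITE;
		}
		/* then mmap and bind the event to maps[idx] as today */
	}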

The other code should already check evlist->mmap and
evlist->backward_mmap, so it probably doesn't need to change.


Thanks,
Kan




RE: [PATCH 2/2] perf record: Replace 'overwrite' by 'flightrecorder' for better naming

2017-11-01 Thread Liang, Kan
> On 2017/11/1 21:26, Liang, Kan wrote:
> >> The meaning of perf record's "overwrite" option and many "overwrite"
> >> in source code are not clear. In perf's code, the 'overwrite' has 2 
> >> meanings:
> >>   1. Make ringbuffer readonly (perf_evlist__mmap_ex's argument).
> >>   2. Set evsel's "backward" attribute (in apply_config_terms).
> >>
> >> perf record doesn't use meaning 1 at all, but have a overwrite
> >> option, its real meaning is setting backward.
> >>
> > I don't understand here.
> > 'overwrite' has 2 meanings. perf record only support 1.
> > It should be a bug, and need to be fixed.
> 
> Not a bug, but ambiguous.
> 
> Perf record doesn't need overwrite main channel (we have two channels:
> evlist->mmap is main channel and evlist->backward_mmap is backward
> evlist->channel),
> but some testcases require it, and your new patchset may require it.
> 'perf record --overwrite' doesn't set main channel overwrite. What it does is
> moving all evsels to backward channel, and we can move some evsels back to
> the main channel by /no-overwrite/ setting. This behavior is hard to
> understand.
> 

In my understanding, the 'main channel' should depend on what the user
sets. If --overwrite is applied, then evlist->backward_mmap should be
the 'main channel', and evlist->overwrite should be set to true as well.

The /no-overwrite/ setting is a per-event setting.
Only after the global setting is finished is the per-event setting
considered. You may refer to apply_config_terms.
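
Something along these lines (a rough sketch of mine modeled on
apply_config_terms(); I'm assuming the global flag ends up in
opts->overwrite, which is not in the current code):

	struct perf_evsel_config_term *term;
	bool backward = opts->overwrite;	/* global default */

	list_for_each_entry(term, &evsel->config_terms, list) {
		/* a per-event /overwrite/ or /no-overwrite/ term wins */
		if (term->type == PERF_EVSEL__CONFIG_TERM_OVERWRITE)
			backward = term->val.overwrite;
	}
	evsel->attr.write_backward = backward;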

Thanks,
Kan






RE: [PATCH 1/2] perf mmap: Fix perf backward recording

2017-11-01 Thread Liang, Kan
> On 2017/11/1 20:00, Namhyung Kim wrote:
> > On Wed, Nov 01, 2017 at 06:32:50PM +0800, Wangnan (F) wrote:
> >>
> >> On 2017/11/1 17:49, Namhyung Kim wrote:
> >>> Hi,
> >>>
> >>> On Wed, Nov 01, 2017 at 05:53:26AM +, Wang Nan wrote:
>  perf record backward recording doesn't work as we expected: it never
>  overwrite when ring buffer full.
> 
>  Test:
> 
>  (Run a busy printing python task background like this:
> 
> while True:
> print 123
> 
>  send SIGUSR2 to perf to capture snapshot.)
> >> [SNIP]
> >>
>  Signed-off-by: Wang Nan 
>  ---
> tools/perf/util/evlist.c | 8 +++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
> 
>  diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>  index c6c891e..4c5daba 100644
>  --- a/tools/perf/util/evlist.c
>  +++ b/tools/perf/util/evlist.c
>  @@ -799,22 +799,28 @@ perf_evlist__should_poll(struct perf_evlist
> *evlist __maybe_unused,
> }
> static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int
> idx,
>  -   struct mmap_params *mp, int 
>  cpu_idx,
>  +   struct mmap_params *_mp, int 
>  cpu_idx,
>  int thread, int *_output, int
> *_output_backward)
> {
>   struct perf_evsel *evsel;
>   int revent;
>   int evlist_cpu = cpu_map__cpu(evlist->cpus, cpu_idx);
>  +struct mmap_params *mp;
>   evlist__for_each_entry(evlist, evsel) {
>   struct perf_mmap *maps = evlist->mmap;
>  +struct mmap_params rdonly_mp;
>   int *output = _output;
>   int fd;
>   int cpu;
>  +mp = _mp;
>   if (evsel->attr.write_backward) {
>   output = _output_backward;
>   maps = evlist->backward_mmap;
>  +rdonly_mp = *_mp;
>  +rdonly_mp.prot &= ~PROT_WRITE;
>  +mp = _mp;
>   if (!maps) {
>   maps = perf_evlist__alloc_mmap(evlist);
>  --
> >>> What about this instead (not tested)?
> >>>
> >>>
> >>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> >>> index c6c891e154a6..27ebe355e794 100644
> >>> --- a/tools/perf/util/evlist.c
> >>> +++ b/tools/perf/util/evlist.c
> >>> @@ -838,6 +838,11 @@ static int perf_evlist__mmap_per_evsel(struct
> perf_evlist *evlist, int idx,
> >>>   if (*output == -1) {
> >>>   *output = fd;
> >>> +   if (evsel->attr.write_backward)
> >>> +   mp->prot = PROT_READ;
> >>> +   else
> >>> +   mp->prot = PROT_READ | PROT_WRITE;
> >>> +
> >> If evlist->overwrite is true, PROT_WRITE should be unset even if
> >> write_backward is
> >> not set. If you want to delay the setting of mp->prot, you need to consider
> >> both evlist->overwrite and evsel->attr.write_backward.
> > I thought evsel->attr.write_backward should be set when
> > evlist->overwrite is set.  Do you mean following case?
> >
> >perf record --overwrite -e 'cycles/no-overwrite/'
> >
> 
> No. evlist->overwrite is unrelated to '--overwrite'. This is why I
> said the concept of 'overwrite' and 'backward' is ambiguous.
>

Yes, I think we should make it clear.

As we discussed previously, there are four possible combinations
to access ring buffer , 'forward non-overwrite', 'forward overwrite',
'backward non-overwrite' and 'backward overwrite'.

Actually, not all of the combinations are necessary.
- 'forward overwrite' mode brings some problems which were mentioned
  in commit ID 9ecda41acb97 ("perf/core: Add ::write_backward attribute
  to perf event").
- 'backward non-overwrite' mode is very similar as 'forward non-overwrite'.
   There is no extra advantage. Only keep one non-overwrite mode is enough.
So 'forward non-overwrite' and 'backward overwrite' are enough for all perf 
tools.

Furthermore, 'forward' and 'backward' only indicate the direction of the
ring buffer. They don't impact the result and performance. It is not
important as the concept of overwrite/non-overwrite.

To simplify the concept, only 'non-overwrite' mode and 'overwrite' mode should
be kept. 'non-overwrite' mode indicates the forward ring buffer. 'overwrite' 
mode
indicates the backward ring buffer.

> perf record never sets 'evlist->overwrite'. What '--overwrite' actually
> does is setting write_backward. Some testcases needs overwrite evlist.
> 

There are only four test cases which set overwrite, sw-clock,task-exit,
mmap-basic, backward-ring-buffer.
Only backward-ring-buffer is 'backward 

RE: [PATCH 1/2] perf mmap: Fix perf backward recording

2017-11-01 Thread Liang, Kan
> On 2017/11/1 20:00, Namhyung Kim wrote:
> > On Wed, Nov 01, 2017 at 06:32:50PM +0800, Wangnan (F) wrote:
> >>
> >> On 2017/11/1 17:49, Namhyung Kim wrote:
> >>> Hi,
> >>>
> >>> On Wed, Nov 01, 2017 at 05:53:26AM +, Wang Nan wrote:
>  perf record backward recording doesn't work as we expected: it never
>  overwrite when ring buffer full.
> 
>  Test:
> 
>  (Run a busy printing python task background like this:
> 
> while True:
> print 123
> 
>  send SIGUSR2 to perf to capture snapshot.)
> >> [SNIP]
> >>
>  Signed-off-by: Wang Nan 
>  ---
> tools/perf/util/evlist.c | 8 +++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
> 
>  diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>  index c6c891e..4c5daba 100644
>  --- a/tools/perf/util/evlist.c
>  +++ b/tools/perf/util/evlist.c
>  @@ -799,22 +799,28 @@ perf_evlist__should_poll(struct perf_evlist
> *evlist __maybe_unused,
> }
> static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int
> idx,
>  -   struct mmap_params *mp, int 
>  cpu_idx,
>  +   struct mmap_params *_mp, int 
>  cpu_idx,
>  int thread, int *_output, int
> *_output_backward)
> {
>   struct perf_evsel *evsel;
>   int revent;
>   int evlist_cpu = cpu_map__cpu(evlist->cpus, cpu_idx);
>  +struct mmap_params *mp;
>   evlist__for_each_entry(evlist, evsel) {
>   struct perf_mmap *maps = evlist->mmap;
>  +struct mmap_params rdonly_mp;
>   int *output = _output;
>   int fd;
>   int cpu;
>  +mp = _mp;
>   if (evsel->attr.write_backward) {
>   output = _output_backward;
>   maps = evlist->backward_mmap;
>  +rdonly_mp = *_mp;
>  +rdonly_mp.prot &= ~PROT_WRITE;
>  +mp = &rdonly_mp;
>   if (!maps) {
>   maps = perf_evlist__alloc_mmap(evlist);
>  --
> >>> What about this instead (not tested)?
> >>>
> >>>
> >>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> >>> index c6c891e154a6..27ebe355e794 100644
> >>> --- a/tools/perf/util/evlist.c
> >>> +++ b/tools/perf/util/evlist.c
> >>> @@ -838,6 +838,11 @@ static int perf_evlist__mmap_per_evsel(struct
> perf_evlist *evlist, int idx,
> >>>   if (*output == -1) {
> >>>   *output = fd;
> >>> +   if (evsel->attr.write_backward)
> >>> +   mp->prot = PROT_READ;
> >>> +   else
> >>> +   mp->prot = PROT_READ | PROT_WRITE;
> >>> +
> >> If evlist->overwrite is true, PROT_WRITE should be unset even if
> >> write_backward is
> >> not set. If you want to delay the setting of mp->prot, you need to consider
> >> both evlist->overwrite and evsel->attr.write_backward.
> > I thought evsel->attr.write_backward should be set when
> > evlist->overwrite is set.  Do you mean following case?
> >
> >perf record --overwrite -e 'cycles/no-overwrite/'
> >
> 
> No. evlist->overwrite is unrelated to '--overwrite'. This is why I
> said the concept of 'overwrite' and 'backward' is ambiguous.
>

Yes, I think we should make it clear.

As we discussed previously, there are four possible combinations for
accessing the ring buffer: 'forward non-overwrite', 'forward overwrite',
'backward non-overwrite' and 'backward overwrite'.

Actually, not all of the combinations are necessary.
- 'forward overwrite' mode brings some problems which were mentioned
  in commit ID 9ecda41acb97 ("perf/core: Add ::write_backward attribute
  to perf event").
- 'backward non-overwrite' mode is very similar to 'forward non-overwrite'.
  There is no extra advantage, and keeping one non-overwrite mode is enough.
So 'forward non-overwrite' and 'backward overwrite' are enough for all perf
tools.

Furthermore, 'forward' and 'backward' only indicate the direction of the
ring buffer. They don't impact the result or the performance, so they are
not as important as the concept of overwrite/non-overwrite.

To simplify the concept, only 'non-overwrite' mode and 'overwrite' mode
should be kept. 'non-overwrite' mode indicates the forward ring buffer;
'overwrite' mode indicates the backward ring buffer.
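
In code, the mapping would be something like this (untested sketch; the
helper name is made up):

	static void setup_mmap_mode(struct perf_event_attr *attr,
				    struct mmap_params *mp, bool overwrite)
	{
		if (overwrite) {
			/* backward ring buffer, kernel may overwrite */
			attr->write_backward = 1;
			mp->prot = PROT_READ;
		} else {
			/* forward ring buffer, tool moves the tail */
			attr->write_backward = 0;
			mp->prot = PROT_READ | PROT_WRITE;
		}
	}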

> perf record never sets 'evlist->overwrite'. What '--overwrite' actually
> does is setting write_backward. Some testcases needs overwrite evlist.
> 

There are only four test cases which set overwrite: sw-clock, task-exit,
mmap-basic, and backward-ring-buffer.
Only backward-ring-buffer is 'backward overwrite'.
The rest 

RE: [PATCH 2/2] perf record: Replace 'overwrite' by 'flightrecorder' for better naming

2017-11-01 Thread Liang, Kan
> The meaning of perf record's "overwrite" option and many "overwrite" in
> source code are not clear. In perf's code, the 'overwrite' has 2 meanings:
>  1. Make ringbuffer readonly (perf_evlist__mmap_ex's argument).
>  2. Set evsel's "backward" attribute (in apply_config_terms).
> 
> perf record doesn't use meaning 1 at all, but have a overwrite option, its
> real meaning is setting backward.
>

I don't understand here.
'overwrite' has 2 meanings, and perf record only supports 1.
That should be a bug, and it needs to be fixed.
Why do we need a new mode?

I think a new mode just makes the code more complex.
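
For reference, the 2 existing meanings, simplified (the mmap_ex argument
list is abbreviated from memory):

	/* meaning 1: map the ring buffer read-only
	 * (the 'overwrite' argument of perf_evlist__mmap_ex)
	 */
	perf_evlist__mmap_ex(evlist, opts->mmap_pages, true /* overwrite */,
			     auxtrace_pages, auxtrace_overwrite);

	/* meaning 2: per-event backward attribute, set in
	 * apply_config_terms()
	 */
	attr->write_backward = term->val.overwrite ? 1 : 0;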

Thanks,
Kan

> This patch separates these two concepts, introduce 'flightrecorder' mode
> which is what we really want. It combines these 2 concept together, wraps
> them into a record mode. In flight recorder mode, perf only dumps data
> before
> something happen.
> 
> Signed-off-by: Wang Nan 
> ---
>  tools/perf/Documentation/perf-record.txt |  8 
>  tools/perf/builtin-record.c  |  4 ++--
>  tools/perf/perf.h|  2 +-
>  tools/perf/util/evsel.c  |  6 +++---
>  tools/perf/util/evsel.h  |  4 ++--
>  tools/perf/util/parse-events.c   | 20 ++--
>  tools/perf/util/parse-events.h   |  4 ++--
>  tools/perf/util/parse-events.l   |  4 ++--
>  8 files changed, 26 insertions(+), 26 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-record.txt
> b/tools/perf/Documentation/perf-record.txt
> index 5a626ef..463c2d3 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -467,19 +467,19 @@ the beginning of record, collect them during
> finalizing an output file.
>  The collected non-sample events reflects the status of the system when
>  record is finished.
> 
> ---overwrite::
> +--flight-recorder::
>  Makes all events use an overwritable ring buffer. An overwritable ring
>  buffer works like a flight recorder: when it gets full, the kernel will
>  overwrite the oldest records, that thus will never make it to the
>  perf.data file.
> 
> -When '--overwrite' and '--switch-output' are used perf records and drops
> +When '--flight-recorder' and '--switch-output' are used perf records and
> drops
>  events until it receives a signal, meaning that something unusual was
>  detected that warrants taking a snapshot of the most current events,
>  those fitting in the ring buffer at that moment.
> 
> -'overwrite' attribute can also be set or canceled for an event using
> -config terms. For example: 'cycles/overwrite/' and 'instructions/no-
> overwrite/'.
> +'flightrecorder' attribute can also be set or canceled separately for an 
> event
> using
> +config terms. For example: 'cycles/flightrecorder/' and 'instructions/no-
> flightrecorder/'.
> 
>  Implies --tail-synthesize.
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index f4d9fc5..315ea09 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1489,7 +1489,7 @@ static struct option __record_options[] = {
>   "child tasks do not inherit counters"),
>   OPT_BOOLEAN(0, "tail-synthesize", _synthesize,
>   "synthesize non-sample events at the end of output"),
> - OPT_BOOLEAN(0, "overwrite", &record.opts.overwrite, "use
> overwrite mode"),
> + OPT_BOOLEAN(0, "flight-recoder", &record.opts.flight_recorder,
> "use flight recoder mode"),
>   OPT_UINTEGER('F', "freq", _freq, "profile at this
> frequency"),
>   OPT_CALLBACK('m', "mmap-pages", , "pages[,pages]",
>"number of mmap data pages and AUX area tracing mmap
> pages",
> @@ -1733,7 +1733,7 @@ int cmd_record(int argc, const char **argv)
>   }
>   }
> 
> - if (record.opts.overwrite)
> + if (record.opts.flight_recorder)
>   record.opts.tail_synthesize = true;
> 
>   if (rec->evlist->nr_entries == 0 &&
> diff --git a/tools/perf/perf.h b/tools/perf/perf.h
> index fbb0a9c..a7f7618 100644
> --- a/tools/perf/perf.h
> +++ b/tools/perf/perf.h
> @@ -57,7 +57,7 @@ struct record_opts {
>   bool all_kernel;
>   bool all_user;
>   bool tail_synthesize;
> - bool overwrite;
> + bool flight_recorder;
>   bool ignore_missing_thread;
>   unsigned int freq;
>   unsigned int mmap_pages;
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index f894893..0e1e8e8 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -772,8 +772,8 @@ static void apply_config_terms(struct perf_evsel
> *evsel,
>*/
>   attr->inherit = term->val.inherit ? 1 : 0;
>   break;
> - case PERF_EVSEL__CONFIG_TERM_OVERWRITE:
> - attr->write_backward = term->val.overwrite ? 1 : 0;
> + case PERF_EVSEL__CONFIG_TERM_FLIGHTRECORDER:
> +

RE: [PATCH V3 0/6] event synthesization multithreading for perf record

2017-10-24 Thread Liang, Kan
> On Tue, Oct 24, 2017 at 11:22:00AM +0200, Ingo Molnar wrote:
> >
> > * Liang, Kan <kan.li...@intel.com> wrote:
> >
> > > For 'all', do you mean the whole process?
> >
> > Yeah.
> >
> > > I think that's the ultimate goal.  Eventually there will be per-CPU
> > > recording threads created at the beginning of perf record and go through
> the whole process.
> > > The plan is to do the multithreading step by step from the simplest case.
> > > Synthesizing stage is just a start.
> >
> > So, why not do it like the kernel did: add all the threads, create the
> > percpu files, and introduce a 'big perf lock' (big mutex) that is
> > taken for all the current non-threaded perf functionality. This should
> > be fairly straightforward to do and should be 'obviously correct'.
> >
> > _Then_ start doing the hard threading work on top of this, like
> > threading the synthesizing phase.
> >
> > Doing the whole per CPU thread setup/teardown for just the
> > synthesizing part of it looks like the wrong design.
> >
> > I.e. what I'm suggesting is no extra threading work, just organizing
> > it in a different fashion and increasing the life-time of the per CPU
> > threads from 'perf startup' to 'perf shutdown'.
> 
> I recently made some changes on threaded record, which are based on
> Namhyungs time* API, which is needed to read/sort the data afterwards
> 
> but I wasn't able to get any substantial and constant reduce of LOST events
> and then I got sidetracked and did not finish, but it's in here:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git perf/data
> 
> I'll try to rebase and send it out for comments
> 

I think I will wait for your patches, and rebase this series. :) 

Thanks,
Kan



RE: [PATCH] perf vendor events: Add Goldmont Plus V1 event file

2017-10-23 Thread Liang, Kan
> Em Thu, Oct 19, 2017 at 01:37:55PM -0700, Andi Kleen escreveu:
> > On Thu, Oct 19, 2017 at 04:58:33PM -0300, Arnaldo Carvalho de Melo
> wrote:
> > > Em Wed, Oct 18, 2017 at 06:05:07AM -0700, kan.li...@intel.com
> escreveu:
> > > > From: Kan Liang 
> > > >
> > > > Add a Intel event file for perf.
> > >
> > > Andi, can I have your Acked-by or reviewed-by?
> >
> > Acked-by: Andi Kleen 
> 
> Thanks, applied.
> 
Hi Arnaldo,

Will this patch be pushed to perf/urgent for 4.14?

Thanks,
Kan



RE: [PATCH V3 0/6] event synthesization multithreading for perf record

2017-10-23 Thread Liang, Kan
> Em Mon, Oct 23, 2017 at 01:43:39PM +0000, Liang, Kan escreveu:
> > The plan is to do the multithreading step by step from the simplest case.
> > Synthesizing stage is just a start.
> 
> > Only for synthesizing stage, I think the patch series should already
> > cover all the 'synthesizing' steps which can do multithreading. For
> > the rest 'synthesizing' steps, it only need to be done by single thread.
> 
> > Since there is only multithreading for 'synthesizing' step, the
> > threads creation related code is event.c for now. It's better to move
> > it to a dedicate file and make it generic for recording threads. I think we 
> > can
> do it later separately.
> 
> Yes, and I'm happy that you plan to continue working in this area, right? :-)
> 

After reviewing my schedule, I'm afraid I will not have the bandwidth
this quarter to continue supporting full multithreading.
I still have some other high-priority items that are not finished yet.
I may revisit it after that. I'm sorry about that.

Thanks,
Kan 



RE: [PATCH V3 4/6] perf tools: add perf_data_file__open_tmp

2017-10-23 Thread Liang, Kan
> SNIP
> 
> > > >  ssize_t perf_data_file__write(struct perf_data_file *file, diff
> > > > --git a/tools/perf/util/data.h b/tools/perf/util/data.h index
> > > > ae510ce..892b3d5 100644
> > > > --- a/tools/perf/util/data.h
> > > > +++ b/tools/perf/util/data.h
> > > > @@ -10,6 +10,7 @@ enum perf_data_mode {
> > > >
> > > >  struct perf_data_file {
> > > > const char  *path;
> > > > +   char*tmp_path;
> > >
> > > could we add is_tmp instead of new path pointer and keep the path
> > > for the name..?
> >
> > The 'path' is a const char *. I think that's not good for a tmp file,
> > which generates its file name at run time.
> 
> then change path to 'char *' ? I just dont think having
> 2 name pointers for path will keep this simple
>

I tried, but it will impact almost all the perf tools.
The input/output file name is const char *, which means that the file
name should not be changed.
I just don't think it's right to cast it to char *.

How about introducing a new dedicated struct for tmp files only?

	struct perf_data_tmp_file {
		char *tmp_path;
		int fd;
	};

And three interfaces to open, write and close:

	perf_data_file__open_tmp()
	perf_data_file__write_tmp()
	perf_data_file__close_tmp()
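
A rough, untested sketch of the open/close pair (write_tmp would just
wrap the normal write path):

	int perf_data_file__open_tmp(struct perf_data_tmp_file *tmp)
	{
		/* mkstemp() fills in the trailing XXXXXX */
		if (asprintf(&tmp->tmp_path, "perf.tmp.XXXXXX") < 0)
			return -1;

		tmp->fd = mkstemp(tmp->tmp_path);
		if (tmp->fd < 0) {
			free(tmp->tmp_path);
			tmp->tmp_path = NULL;
			return -1;
		}
		return 0;
	}

	void perf_data_file__close_tmp(struct perf_data_tmp_file *tmp)
	{
		close(tmp->fd);
		/* tmp files never outlive the run */
		unlink(tmp->tmp_path);
		free(tmp->tmp_path);
		tmp->tmp_path = NULL;
	}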

Thanks,
Kan


RE: [PATCH V3 4/6] perf tools: add perf_data_file__open_tmp

2017-10-23 Thread Liang, Kan
> > From: Kan Liang 
> >
> > And an interface for perf_data_file to open tmp file.
> > Automatically generate the tmp file name if it's not assigned. The
> > name cannot be const char. Introduce char *tmp_path for it.
> > The tmp file will be deleted after close.
> >
> > It will be used as per-thread file to temporarily store event
> > synthesizd result in the following patch.
> >
> > Signed-off-by: Kan Liang 
> > ---
> >  tools/perf/util/data.c | 26 ++
> > tools/perf/util/data.h |  2 ++
> >  2 files changed, 28 insertions(+)
> >
> > diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c index
> > 1123b30..cd6fdf9 100644
> > --- a/tools/perf/util/data.c
> > +++ b/tools/perf/util/data.c
> > @@ -139,9 +139,35 @@ int perf_data_file__open(struct perf_data_file
> *file)
> > return open_file(file);
> >  }
> >
> > +int perf_data_file__open_tmp(struct perf_data_file *file) {
> > +   int fd;
> > +
> > +   if (!file->tmp_path &&
> > +   (asprintf(&file->tmp_path, "perf.tmp.XXXXXX") < 0))
> > +   goto err;
> > +
> > +   fd = mkstemp(file->tmp_path);
> > +   if (fd < 0) {
> > +   free(file->tmp_path);
> > +   goto err;
> > +   }
> > +
> > +   file->fd = fd;
> > +   return 0;
> > +err:
> > +   file->tmp_path = NULL;
> > +   return -1;
> > +}
> > +
> >  void perf_data_file__close(struct perf_data_file *file)  {
> > close(file->fd);
> > +   if (file->tmp_path) {
> > +   unlink(file->tmp_path);
> > +   free(file->tmp_path);
> > +   file->tmp_path = NULL;
> > +   }
> >  }
> >
> >  ssize_t perf_data_file__write(struct perf_data_file *file, diff --git
> > a/tools/perf/util/data.h b/tools/perf/util/data.h index
> > ae510ce..892b3d5 100644
> > --- a/tools/perf/util/data.h
> > +++ b/tools/perf/util/data.h
> > @@ -10,6 +10,7 @@ enum perf_data_mode {
> >
> >  struct perf_data_file {
> > const char  *path;
> > +   char*tmp_path;
> 
> could we add is_tmp instead of new path pointer and keep the path for the
> name..?

The 'path' is a const char *. I think that's not good for a tmp file,
which generates its file name at run time.

Thanks,
Kan


RE: [PATCH V3 0/6] event synthesization multithreading for perf record

2017-10-23 Thread Liang, Kan
> * kan.li...@intel.com  wrote:
> 
> > The latency is the time cost of __machine__synthesize_threads or its
> > multithreading replacement, record__multithread_synthesize.
> >
> > Original:  original single thread synthesize
> > With patch(not merge): multithread synthesize without final file merge
> >(intermediate results for scalability measurement)
> > With patch(merge): multithread synthesize with file merge
> >(final result)
> >
> > - Latency on Knights Mill (272 CPUs)
> >
> > Original(s) With patch(not merge)(s)With patch(merge)(s)
> > 12.76.6 7.76
> >
> > - Latency on Skylake server (192 CPUs)
> >
> > Original(s) With patch(not merge)(s)With patch(merge)(s)
> > 0.340.210.23
> 
> Ok, I think I mis-understood some aspects of the patch series.
> 
> It multi-threads a certain stage of processing (synthesizing), but not the
> _whole_ process of recording events, right?

Right.

> 
> So I'm wondering, in the context of 'perf record -a' and 'perf top' CPU-
> granular profiling at least (but maybe also in the context of inherited
> workload 'perf record' profiling), could we simply record with per-CPU
> recording threads created early on, which would record into the percpu files
> quite naturally, which would also offer natural multithreading of any
> 'synthesizing' steps later on?
> 
> I.e. instead of multithreading perf record piecemeal wise, why not
> multithread it all - and win big in terms of scalable, low overhead profiling?
>

For 'all', do you mean the whole process?
I think that's the ultimate goal.  Eventually there will be per-CPU recording
threads created at the beginning of perf record that go through the whole
process.
The plan is to do the multithreading step by step, starting from the simplest
case. The synthesizing stage is just a start.

For the synthesizing stage alone, I think the patch series already covers all
the 'synthesizing' steps which can be multithreaded. The remaining
'synthesizing' steps only need to be done by a single thread.

Since only the 'synthesizing' step is multithreaded, the thread-creation
related code is in event.c for now. It would be better to move it to a
dedicated file and make it generic for recording threads. I think we can do
that later, separately.


Thanks,
Kan


RE: [PATCH V2 0/5] event synthesization multithreading for perf record

2017-10-20 Thread Liang, Kan
> 
> * kan.li...@intel.com  wrote:
> 
> > From: Kan Liang 
> >
> > The event synthesization multithreading is introduced in ("perf top
> > optimization") https://lkml.org/lkml/2017/9/29/269
> > But it was not enabled for perf record. Because the process function
> > process_synthesized_event was not multithreading friendly.
> >
> > The patch series temporarily stores the process result in per-thread
> > file, which make the processing in parallel. Then it dumps the file
> > one by one to the perf.data at the end of event synthesization.
> >
> > The source code is also available at
> > https://github.com/kliang2/perf.git perf_record_opt
> >
> > Usually, the event synthesization only happens once on either start or end.
> > With the snapshotting code, we synthesize events multiple times, once
> > per each new perf.data file. Both of the cases are verified.
> >
> > Here are the latency test result on Knights Mill and Skylake server
> >
> > The workload is to compile Linux kernel as below "sudo nice make
> > -j$(grep -c '^processor' /proc/cpuinfo)"
> > Then, "sudo perf record -e cycles -a -- sleep 1"
> >
> > The latency is the time cost of __machine__synthesize_threads or its
> > multithreading replacement, record__multithread_synthesize.
> >
> > - Latency on Knights Mill (272 CPUs)
> >
> > Original(s) With patch(s)   Speedup
> > 12.74   5.542.3X
> >
> > - Latency on Skylake server (192 CPUs)
> >
> > Original(s) With patch(s)   Speedup
> > 0.360.251.47X
> 
> Btw., just as an interesting experiment, could you try to measure how it
> performs to create just the per-CPU files, and *not* dump them into a single
> file?
>

Sure, please find the experiment results in the cover letter of the V3 patch series.

Thanks,
Kan
 
> I.e. how much faster will it get if the serialization at the end is avoided?
> 
> Of course nothing can read such per-CPU files yet, so this is just for 
> scalability
> measurement.
> 
> Thanks,
> 
>   Ingo


RE: [PATCH V2 1/4] perf/x86/intel/uncore: use same idx for clinet IMC uncore events

2017-10-20 Thread Liang, Kan
> > To specially handle it, event->hw.idx >= UNCORE_PMC_IDX_FIXED is used
> to
> > check fixed counters in the generic uncore_perf_event_update.
> > It does not have problem in current code.
> 
> I disagree. While it has no functional problem, it's a obscure hack which
> means it is a code quality problem.
> 
> > Because there are no counters whose idx is larger than fixed
> > counters. However, it will have problem if new counter type is introduced
> > in generic code. For example, freerunning counters.
> >
> > Actually, the 'fixed counters' in the clinet IMC uncore is not
> > traditional fixed counter. They are freerunning counters, which don't
> > need the idx to indicate which counter is assigned. They also have same
> > bits wide. So it's OK to let them use the same idx. event_base is good
> 
> s/wide/width/
> 
> > enough to select the proper freerunning counter.
> 
> So why are they named fixed counters in the first place? If they are not
> fixed, but freerunning then please clean that up as well.
>

Sure, I will clean it up and make it part of the new free-running
infrastructure.
I will also modify all the changelogs according to your comments.
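
For reference, the check being discussed, roughly (simplified from the
generic update path; untested, and the helper name is made up):

	/* anything at or above the fixed index is treated as a
	 * "fixed" (really free-running) counter
	 */
	static int uncore_ctr_bits(struct intel_uncore_box *box,
				   struct perf_event *event)
	{
		if (event->hw.idx >= UNCORE_PMC_IDX_FIXED)
			return uncore_fixed_ctr_bits(box);
		return uncore_perf_ctr_bits(box);
	}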

Thank you for the detailed review and your patience.
Kan

 
> I pointed out to you that this is crap. So please don't try to justify this
> crap. Just fix it up.
> 
> > There is no traditional fixed counter in clinet IMC uncore. Let them use
> > the same idx as fixed event for clinet IMC uncore events.
> 
> I have no idea what's traditional about counters, but that's a nit pick.
> 
> > The following patch will remove the special codes in generic
> > uncore_perf_event_update.
> 
> I told you more than once, that 'The ... patch', 'This patch' is not part
> of a proper changelog.
> 
> See Documentation/process/submitting-patches.rst:
> 
> Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
> instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
> to do frotz", as if you are giving orders to the codebase to change
> its behaviour.
> 
> along with the rest of the document.
> 
> Thanks,
> 
>   tglx


RE: [PATCH V2 4/5] perf record: synthesize event multithreading support

2017-10-18 Thread Liang, Kan
> On Wed, Oct 18, 2017 at 07:29:32AM -0700, kan.li...@intel.com wrote:
> 
> SNIP
> 
> > +   rec->synthesized_file = calloc(nr_thread, sizeof(struct
> perf_data_file));
> > +   if (rec->synthesized_file == NULL) {
> > +   pr_debug("Could not do multithread synthesize."
> > +"Roll back to single thread\n");
> > +   nr_thread = 1;
> > +   } else {
> > +   perf_set_multithreaded();
> > +   for (i = 0; i < nr_thread; i++) {
> > +   snprintf(name, sizeof(name), "%s.%d",
> > +SYNTHESIZED_PATH, i);
> 
> hum, I think we want some uniq temp names in here..

Any suggestions for the name?
perf.tmp.X?

> 
> > +   rec->synthesized_file[i].path = name;
> > +   err = perf_data_file__open(&rec->synthesized_file[i]);
> > +   if (err) {
> > +   pr_err("Failed to open file %s\n",
> > +  rec->synthesized_file[i].path);
> > +   goto free;
> > +   }
> > +   }
> > +   }
> > +
> > +   err = __machine__synthesize_threads(machine, tool, &opts->target,
> > +   rec->evlist->threads,
> > +   process_synthesized_event,
> > +   opts->sample_address,
> > +   opts->proc_map_timeout,
> nr_thread);
> > +   if (err < 0)
> > +   goto free;
> > +
> > +   if (nr_thread > 1) {
> > +   int fd_from, fd_to;
> > +
> > +   fd_to = rec->session->file->fd;
> > +   for (i = 0; i < nr_thread; i++) {
> > +   fd_from = rec->synthesized_file[i].fd;
> > +
> > +   fstat(fd_from, &st);
> > +   if (st.st_size == 0)
> > +   continue;
> > +   err = copyfile_offset(fd_from, 0, fd_to,
> > + lseek(fd_to, 0, SEEK_END),
> > + st.st_size);
> > +   update_bytes_written(rec, st.st_size);
> > +   }
> > +   }
> > +
> > +free:
> > +   if (nr_thread > 1) {
> > +   for (i = 0; i < nr_thread; i++) {
> > +   if (rec->synthesized_file[i].fd > 0)
> > +   perf_data_file__close(
> > &rec->synthesized_file[i]);
> 
> also those files should be removed

Sure.
I will change it to perf_data_file__close(file, bool remove) to do that.
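
Something like this (untested):

	void perf_data_file__close(struct perf_data_file *file, bool remove)
	{
		close(file->fd);
		/* per-thread tmp files get deleted here */
		if (remove)
			unlink(file->path);
	}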

Thanks,
Kan



RE: [PATCH] perf script: add script to profile and resolve physical mem type

2017-10-17 Thread Liang, Kan
 > >
> > Right, it doesn’t need load latency. 0x81d0 should be a better choice.
> > I will use 0x81d0 and 0x82d0 as default event for V2.
> 
> That's model specific. You would need to check the model number if you do
> that.
> 
> Also with modern perf you can use the correct event names of course.

The event names are model specific as well.
For mem load events,
MEM_INST_RETIRED.ALL_LOADS is the event name for Skylake, and
MEM_UOPS_RETIRED.ALL_LOADS is the event name for the rest of the platforms.
I think we still need to check the model number.
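
E.g. the lookup would be something like this (hypothetical helper; the
model numbers are assumptions and the list is incomplete):

	static const char *mem_load_event_name(unsigned int model)
	{
		switch (model) {
		case 0x4e:	/* Skylake mobile  -- assumption */
		case 0x5e:	/* Skylake desktop -- assumption */
		case 0x55:	/* Skylake server  -- assumption */
			return "MEM_INST_RETIRED.ALL_LOADS";
		default:
			return "MEM_UOPS_RETIRED.ALL_LOADS";
		}
	}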

Thanks,
Kan


RE: [PATCH] perf script: add script to profile and resolve physical mem type

2017-10-17 Thread Liang, Kan
> On Mon, Oct 16, 2017 at 3:26 PM,   wrote:
> > From: Kan Liang 
> >
> > There could be different types of memory in the system, e.g. normal
> > System Memory and Persistent Memory. To understand how the workload
> > maps to those memories, it's important to know the I/O statistics on
> > the different types of memory. Perf can collect address maps with
> > physical addresses, but those are raw data. It still needs extra work
> > to resolve the physical addresses.
> > Provide a script to facilitate resolving the physical addresses and
> > gathering the I/O statistics.
> >
> > Profiling with mem-loads and mem-stores if they are available.
> > Looking up the physical address samples in /proc/iomem
> > Providing memory type summary
> >
> > Here is an example
> >  #perf script record mem-phys-addr -- pmem_test_kernel
> >  [ perf record: Woken up 32 times to write data ]
> >  [ perf record: Captured and wrote 7.797 MB perf.data (101995 samples) ]
> >  #perf script report mem-phys-addr
> > Memory type summary
> >
> > Event: mem-loads
> > Memory type            count  percentage
> > -----------------      -----  ----------
> > Persistent Memory      43740       60.6%
> > System RAM             27179       37.7%
> > N/A                     1268        1.8%
> >
> > Event: mem-stores
> > Memory type            count  percentage
> > -----------------      -----  ----------
> > System RAM             24508       82.2%
> > N/A                     5140       17.2%
> > Persistent Memory        160        0.5%
> >
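For reference, the /proc/iomem resolution described above amounts to an
interval lookup over the top-level resource ranges; a minimal C sketch of
that lookup (function name and buffer sizes are illustrative, not from the
patch, and reading real addresses from /proc/iomem requires root):

  #include <stdio.h>
  #include <string.h>

  /* Map one physical address to its top-level /proc/iomem region name.
   * Top-level regions are the lines without leading spaces.
   * Returns 1 and copies the region name on a hit, 0 otherwise.
   */
  static int resolve_phys_addr(unsigned long long addr, char *name, size_t len)
  {
      unsigned long long start, end;
      char line[256];
      int off = 0;
      FILE *f = fopen("/proc/iomem", "r");

      if (!f)
          return 0;
      while (fgets(line, sizeof(line), f)) {
          if (line[0] == ' ')    /* skip nested (child) resources */
              continue;
          if (sscanf(line, "%llx-%llx : %n", &start, &end, &off) != 2)
              continue;
          if (addr >= start && addr <= end) {
              line[strcspn(line, "\n")] = '\0';
              snprintf(name, len, "%s", line + off);
              fclose(f);
              return 1;
          }
      }
      fclose(f);
      return 0;
  }

The script itself parses the ranges once and bisects per sample, which is
the right trade-off when resolving a hundred thousand samples.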
> > Signed-off-by: Kan Liang 
> > ---
> >  tools/perf/scripts/python/bin/mem-phys-addr-record |  30 ++
> >  tools/perf/scripts/python/bin/mem-phys-addr-report |   3 +
> >  tools/perf/scripts/python/mem-phys-addr.py | 109 +
> >  .../util/scripting-engines/trace-event-python.c |   2 +
> >  4 files changed, 144 insertions(+)
> >  create mode 100644 tools/perf/scripts/python/bin/mem-phys-addr-record
> >  create mode 100644 tools/perf/scripts/python/bin/mem-phys-addr-report
> >  create mode 100644 tools/perf/scripts/python/mem-phys-addr.py
> >
> > diff --git a/tools/perf/scripts/python/bin/mem-phys-addr-record b/tools/perf/scripts/python/bin/mem-phys-addr-record
> > new file mode 100644
> > index 0000000..395b256
> > --- /dev/null
> > +++ b/tools/perf/scripts/python/bin/mem-phys-addr-record
> > @@ -0,0 +1,30 @@
> > +#!/bin/bash
> > +
> > +#
> > +# Profiling physical memory accesses
> > +#
> > +
> > +load=`perf list pmu | grep mem-loads`
> > +store=`perf list pmu | grep mem-stores`
> > +if [ -z "$load" ] && [ -z "$store" ] ; then
> > +   echo "There is no mem-loads or mem-stores support"
> > +   exit 1
> > +fi
> > +
> > +arg="-e"
> > +if [ ! -z "$store" ] ; then
> > +   arg="$arg mem-stores:P"
> > +fi
> > +
> > +if [ ! -z "$load" ] ; then
> > +   if [ ! -z "$store" ] ; then
> > +   arg="$arg,mem-loads:P"
> > +   else
> > +   arg="$arg mem-loads:P"
> > +   fi
> > +   arg="$arg -W"
> > +fi
> > +
> > +arg="$arg -d --phys-data"
> > +
> > +perf record $arg $@
> > diff --git a/tools/perf/scripts/python/bin/mem-phys-addr-report b/tools/perf/scripts/python/bin/mem-phys-addr-report
> > new file mode 100644
> > index 0000000..3f2b847
> > --- /dev/null
> > +++ b/tools/perf/scripts/python/bin/mem-phys-addr-report
> > @@ -0,0 +1,3 @@
> > +#!/bin/bash
> > +# description: resolve physical address samples
> > +perf script $@ -s "$PERF_EXEC_PATH"/scripts/python/mem-phys-addr.py
> > diff --git a/tools/perf/scripts/python/mem-phys-addr.py b/tools/perf/scripts/python/mem-phys-addr.py
> > new file mode 100644
> > index 0000000..73b3a63
> > --- /dev/null
> > +++ b/tools/perf/scripts/python/mem-phys-addr.py
> > @@ -0,0 +1,109 @@
> > +# mem-phys-addr.py: Resolve physical address samples
> > +# Copyright (c) 2017, Intel Corporation.
> > +#
> > +# This program is free software; you can redistribute it and/or modify it
> > +# under the terms and conditions of the GNU General Public License,
> > +# version 2, as published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope it will be useful, but WITHOUT
> > +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> > +# more details.
> > +
> > +from __future__ import division
> > +import os
> > +import sys
> > +import struct
> > +import re
> > +import bisect
> > +import collections
> > +
> > +sys.path.append(os.environ['PERF_EXEC_PATH'] + \
> > +   '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
> > +
> > +system_ram = []
> > +pmem = []
> > +f = None
> > +load_event = ('mem-loads', '0x1cd')
> > +store_event = ('mem-stores', 

RE: [PATCH 2/2] perf/x86/intel/uncore: support IIO freerunning counter for SKX

2017-10-16 Thread Liang, Kan

Ping.
Any comments for this patch?

Thanks,
Kan
> 
> From: Kan Liang 
> 
> As of Skylake Server, there are a number of free-running counters in
> each IIO Box that collect counts for per box IO clocks and per Port
> Input/Output x BW/Utilization.
> 
> The event code of a free-running event is shared with the fixed event,
> which is 0xff.
> The umask of a free-running event starts from 0x10. Umasks below 0x10
> are reserved for the fixed event.
> 
> The free-running counters could have different MSR locations and offsets.
> Accordingly, they are divided into different types. Each type is limited
> to at most 16 events.
> So the umask of the first free running events type starts from 0x10. The
> umask of the second starts from 0x20. The rest can be done in the same
> manner.
> 
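To make that umask layout concrete, here is a small decoding sketch (the
helper names are mine, not the patch's; it takes the umask already
extracted from the event config):

  /* Event code 0xff is shared with the fixed event; umask 0x00-0x0f is
   * reserved for it.  Each free-running "type" owns a 16-umask slice
   * starting at 0x10, and the low nibble indexes the event in the type.
   */
  #define FR_UMASK_BASE 0x10

  static inline unsigned int fr_type(unsigned int umask)
  {
      return (umask - FR_UMASK_BASE) >> 4;  /* 0x10-0x1f -> 0, 0x20-0x2f -> 1 */
  }

  static inline unsigned int fr_idx(unsigned int umask)
  {
      return umask & 0xf;                   /* at most 16 events per type */
  }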
> Freerunning counters cannot be written by SW. Counting will be suspended
> only when the IIO Box is powered down. They are specially handled in
> uncore_pmu_event_add/del/start/stop and not added in box->events list.
> 
> The bit width of freerunning counter is 36-bit.
> 
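The 36-bit width matters in the update path quoted below: computing the
delta in the counter's own width makes wraparound cancel out. A standalone
sketch of that shift trick (illustrative, not the patch code; it mirrors
the shift = 64 - bits computation added to uncore_perf_event_update()):

  #include <stdint.h>

  /* Shift both samples up so the counter's top bit sits at bit 63; a wrap
   * of the narrow counter then subtracts correctly in 64-bit arithmetic,
   * and shifting back down yields the true delta.
   */
  static uint64_t fr_delta(uint64_t prev, uint64_t now, int bits)
  {
      int shift = 64 - bits;               /* 28 for a 36-bit counter */

      return ((now << shift) - (prev << shift)) >> shift;
  }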
> Signed-off-by: Kan Liang 
> ---
>  arch/x86/events/intel/uncore.c   | 33 +-
>  arch/x86/events/intel/uncore.h   | 67 +++-
>  arch/x86/events/intel/uncore_snbep.c | 58 +++
>  3 files changed, 156 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index 1c5390f..8d3e46c 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -218,7 +218,9 @@ void uncore_perf_event_update(struct intel_uncore_box *box, struct perf_event *e
>   u64 prev_count, new_count, delta;
>   int shift;
> 
> - if (event->hw.idx >= UNCORE_PMC_IDX_FIXED)
> + if (event->hw.idx >= UNCORE_PMC_IDX_FREERUNNING)
> + shift = 64 - uncore_free_running_bits(box, event);
> + else if (event->hw.idx == UNCORE_PMC_IDX_FIXED)
>   shift = 64 - uncore_fixed_ctr_bits(box);
>   else
>   shift = 64 - uncore_perf_ctr_bits(box);
> @@ -362,6 +364,9 @@ uncore_collect_events(struct intel_uncore_box *box, struct perf_event *leader,
>   if (n >= max_count)
>   return -EINVAL;
> 
> + if (event->hw.idx == UNCORE_PMC_IDX_FREERUNNING)
> + continue;
> +
>   box->event_list[n] = event;
>   n++;
>   }
> @@ -454,6 +459,12 @@ static void uncore_pmu_event_start(struct perf_event *event, int flags)
>   struct intel_uncore_box *box = uncore_event_to_box(event);
>   int idx = event->hw.idx;
> 
> + if (event->hw.idx == UNCORE_PMC_IDX_FREERUNNING) {
> + local64_set(&event->hw.prev_count,
> + uncore_read_counter(box, event));
> + return;
> + }
> +
>   if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
>   return;
> 
> @@ -479,6 +490,11 @@ static void uncore_pmu_event_stop(struct perf_event *event, int flags)
>   struct intel_uncore_box *box = uncore_event_to_box(event);
>   struct hw_perf_event *hwc = &event->hw;
> 
> + if (hwc->idx == UNCORE_PMC_IDX_FREERUNNING) {
> + uncore_perf_event_update(box, event);
> + return;
> + }
> +
>   if (__test_and_clear_bit(hwc->idx, box->active_mask)) {
>   uncore_disable_event(box, event);
>   box->n_active--;
> @@ -512,6 +528,13 @@ static int uncore_pmu_event_add(struct perf_event *event, int flags)
>   if (!box)
>   return -ENODEV;
> 
> + if (hwc->idx == UNCORE_PMC_IDX_FREERUNNING) {
> + event->hw.event_base = uncore_free_running_msr(box, event);
> + if (flags & PERF_EF_START)
> + uncore_pmu_event_start(event, 0);
> + return 0;
> + }
> +
>   ret = n = uncore_collect_events(box, event, false);
>   if (ret < 0)
>   return ret;
> @@ -570,6 +593,9 @@ static void uncore_pmu_event_del(struct perf_event *event, int flags)
> 
>   uncore_pmu_event_stop(event, PERF_EF_UPDATE);
> 
> + if (event->hw.idx == UNCORE_PMC_IDX_FREERUNNING)
> + return;
> +
>   for (i = 0; i < box->n_events; i++) {
>   if (event == box->event_list[i]) {
>   uncore_put_event_constraint(box, event);
> @@ -690,6 +716,11 @@ static int uncore_pmu_event_init(struct perf_event *event)
> 
>   /* fixed counters have event field hardcoded to zero */
>   hwc->config = 0ULL;
> + } else if (is_free_running_event(event)) {
> + if (UNCORE_FREE_RUNNING_MSR_IDX(event->attr.config) >
> + uncore_num_free_running(box, event))
> + return -EINVAL;
> + event->hw.idx = UNCORE_PMC_IDX_FREERUNNING;
>   } else {
>   hwc->config = event->attr.config &
> (pmu->type->event_mask | ((u64)pmu->type-
> 

RE: [PATCH 3/4] perf record: event synthesization multithreading support

2017-10-13 Thread Liang, Kan
> Em Fri, Oct 13, 2017 at 07:09:26AM -0700, kan.li...@intel.com escreveu:
> > From: Kan Liang 
> >
> > The process function process_synthesized_event writes the process
> > result to perf.data, which is not multithreading friendly.
> >
> > Realloc buffer for each thread to temporarily keep the processing
> > result. Write them to the perf.data at the end of event synthesization.
> > The new method doesn't impact the final result, because
> >  - The order of the synthesized event is not important.
> >  - The number of synthesized events is limited. Usually, it only needs
> >hundreds of kilobytes to store all the synthesized events.
> >It's unlikely to fail because of lack of memory.
> 
> Why not write to a per cpu file and then at the end merge them? 

I just thought merging from memory should be faster than merging from files.

> Leave
> the memory management to the kernel, i.e. in most cases you may even not
> end up touching the disk, when memory is plentiful, just rewind the per
> event files and go on dumping to the main perf.data file.
>

Agree. I will do it.

> At some point we may just don't do this merging, and keep per cpu files
> all the way to perf report, etc. This would be a first foray into
> that...
>

Yes, but it's a little bit complex.
I think I will do it in a separate improvement patch series later.
For now, I will do the file merge.
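
To sketch what that file merge could look like (error handling trimmed,
helper name is mine; one unlinked tmpfile() per synthesizing thread, each
rewound and concatenated into perf.data at the end):

  #include <stdio.h>

  /* Append one thread's temp file to the main output.  With enough page
   * cache the temp files never touch the disk, as Arnaldo points out.
   */
  static int merge_tmp_file(FILE *from, FILE *to)
  {
      char buf[65536];
      size_t n;

      rewind(from);
      while ((n = fread(buf, 1, sizeof(buf), from)) > 0)
          if (fwrite(buf, 1, n, to) != n)
              return -1;
      return ferror(from) ? -1 : 0;
  }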

Thanks,
Kan
 
> - Arnaldo
> 
> > The threads number hard code to online CPU number. The following patch
> > will introduce an option to set it.
> >
> > The multithreading synthesize is only available for per cpu monitoring.
> >
> > Signed-off-by: Kan Liang 
> > ---
> >  tools/perf/builtin-record.c | 86 ++---
> >  1 file changed, 81 insertions(+), 5 deletions(-)
> >
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index 4ede9bf..1978021 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -62,6 +62,11 @@ struct switch_output {
> > bool set;
> >  };
> >
> > +struct synthesize_buf {
> > +   void *entry;
> > +   size_t  size;
> > +};
> > +
> >  struct record {
> > struct perf_tool tool;
> > struct record_opts  opts;
> > @@ -80,6 +85,7 @@ struct record {
> > bool timestamp_filename;
> > struct switch_output switch_output;
> > unsigned long long  samples;
> > +   struct synthesize_buf   *buf;
> >  };
> >
> >  static volatile int auxtrace_record__snapshot_started;
> > @@ -120,14 +126,35 @@ static int record__write(struct record *rec, void *bf, size_t size)
> > return 0;
> >  }
> >
> > +static int buf__write(struct record *rec, int idx, void *bf, size_t size)
> > +{
> > +   size_t target_size = rec->buf[idx].size;
> > +   void *target;
> > +
> > +   target = realloc(rec->buf[idx].entry, target_size + size);
> > +   if (target == NULL)
> > +   return -ENOMEM;
> > +
> > +   memcpy(target + target_size, bf, size);
> > +
> > +   rec->buf[idx].entry = target;
> > +   rec->buf[idx].size += size;
> > +
> > +   return 0;
> > +}
> > +
> >  static int process_synthesized_event(struct perf_tool *tool,
> >  union perf_event *event,
> >  struct perf_sample *sample __maybe_unused,
> >  struct machine *machine __maybe_unused,
> > -struct thread_info *thread __maybe_unused)
> > +struct thread_info *thread)
> >  {
> > struct record *rec = container_of(tool, struct record, tool);
> > -   return record__write(rec, event, event->header.size);
> > +
> > +   if (!perf_singlethreaded && thread)
> > +   return buf__write(rec, thread->idx, event, event->header.size);
> > +   else
> > +   return record__write(rec, event, event->header.size);
> >  }
> >
> >  static int record__pushfn(void *to, void *bf, size_t size)
> > @@ -690,6 +717,48 @@ static const struct perf_event_mmap_page *record__pick_pc(struct record *rec)
> > return NULL;
> >  }
> >
> > +static int record__multithread_synthesize(struct record *rec,
> > + struct machine *machine,
> > + struct perf_tool *tool,
> > + struct record_opts *opts)
> > +{
> > +   int i, err, nr_thread = sysconf(_SC_NPROCESSORS_ONLN);
> > +
> > +   if (nr_thread > 1) {
> > +   perf_set_multithreaded();
> > +
> > +   rec->buf = calloc(nr_thread, sizeof(struct synthesize_buf));
> > +   if (rec->buf == NULL) {
> > +   pr_err("Could not do multithread synthesize\n");
> > +   nr_thread = 1;
> > +   perf_set_singlethreaded();
> > +   }
> > +   }
> > +
> > +   err = __machine__synthesize_threads(machine, tool, &opts->target,
> > +

RE: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode

2017-10-13 Thread Liang, Kan
> Em Fri, Oct 13, 2017 at 12:55:34PM +0000, Liang, Kan escreveu:
> > > Em Tue, Oct 10, 2017 at 10:20:15AM -0700, kan.li...@intel.com escreveu:
> > > > From: Kan Liang <kan.li...@intel.com>
> > > >
> > > > Perf record can switch output. The new output should only store
> > > > the data after switching. However, in overwrite backward mode, the
> > > > new output still has the data from the old output.
> > > >
> > > > At the end of mmap_read, the position of the processed ring buffer
> > > > is saved in md->prev. The next mmap_read should end at md->prev.
> > > > However, md->prev is discarded. So the next mmap_read has to
> > > > process the whole valid ring buffer, which definitely includes the
> > > > old processed data.
> > > >
> > > > Set the prev as the end of the range in backward mode.
> > >
> > > Do you think this should go together with the rest of this patchkit?
> > > Probably it should be processed ASAP, i.e. perf/urgent, no?
> > >
> >
> > Hi Arnaldo,
> >
> > After some discussion, Wang Nan proposed a new patch to fix this issue.
> > https://lkml.org/lkml/2017/10/12/441
> > I did some tests. It fixed the issue.
> 
> Ok, I'll look at it and stick a "Tested-by: Kan", ok?

Sure.

Thanks,
Kan

> 
> - Arnaldo
> 
> > Could you please take a look?
> > If it's OK for you, I think it could be merged separately.
> >
> > Thanks,
> > Kan
> >
> > > - Arnaldo
> > >
> > > > Signed-off-by: Kan Liang <kan.li...@intel.com>
> > > > ---
> > > >  tools/perf/util/evlist.c | 14 +-
> > > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > > > index 33b8837..7d23cf5 100644
> > > > --- a/tools/perf/util/evlist.c
> > > > +++ b/tools/perf/util/evlist.c
> > > > @@ -742,13 +742,25 @@ static int
> > > >  rb_find_range(void *data, int mask, u64 head, u64 old,
> > > >   u64 *start, u64 *end, bool backward)
> > > >  {
> > > > +   int ret;
> > > > +
> > > > if (!backward) {
> > > > *start = old;
> > > > *end = head;
> > > > return 0;
> > > > }
> > > >
> > > > -   return backward_rb_find_range(data, mask, head, start, end);
> > > > +   ret = backward_rb_find_range(data, mask, head, start, end);
> > > > +
> > > > +   /*
> > > > +* The start and end from backward_rb_find_range is the range for all
> > > > +* valid data in ring buffer.
> > > > +* However, part of the data is processed previously.
> > > > +* Reset the end to drop the processed data
> > > > +*/
> > > > +   *end = old;
> > > > +
> > > > +   return ret;
> > > >  }
> > > >
> > > >  /*
> > > > --
> > > > 2.5.5


RE: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode

2017-10-13 Thread Liang, Kan
> Em Tue, Oct 10, 2017 at 10:20:15AM -0700, kan.li...@intel.com escreveu:
> > From: Kan Liang 
> >
> > Perf record can switch output. The new output should only store the
> > data after switching. However, in overwrite backward mode, the new
> > output still has the data from the old output.
> >
> > At the end of mmap_read, the position of the processed ring buffer is
> > saved in md->prev. The next mmap_read should end at md->prev.
> > However, md->prev is discarded. So the next mmap_read has to process
> > the whole valid ring buffer, which definitely includes the old
> > processed data.
> >
> > Set the prev as the end of the range in backward mode.
> 
> Do you think this should go together with the rest of this patchkit?
> Probably it should be processed ASAP, i.e. perf/urgent, no?
>

Hi Arnaldo,

After some discussion, Wang Nan proposed a new patch to fix this issue.
https://lkml.org/lkml/2017/10/12/441
I did some tests. It fixed the issue.

Could you please take a look?
If it's OK for you, I think it could be merged separately.

Thanks,
Kan
 
> - Arnaldo
> 
> > Signed-off-by: Kan Liang 
> > ---
> >  tools/perf/util/evlist.c | 14 +-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > index 33b8837..7d23cf5 100644
> > --- a/tools/perf/util/evlist.c
> > +++ b/tools/perf/util/evlist.c
> > @@ -742,13 +742,25 @@ static int
> >  rb_find_range(void *data, int mask, u64 head, u64 old,
> >   u64 *start, u64 *end, bool backward)
> >  {
> > +   int ret;
> > +
> > if (!backward) {
> > *start = old;
> > *end = head;
> > return 0;
> > }
> >
> > -   return backward_rb_find_range(data, mask, head, start, end);
> > +   ret = backward_rb_find_range(data, mask, head, start, end);
> > +
> > +   /*
> > +* The start and end from backward_rb_find_range is the range for all
> > +* valid data in ring buffer.
> > +* However, part of the data is processed previously.
> > +* Reset the end to drop the processed data
> > +*/
> > +   *end = old;
> > +
> > +   return ret;
> >  }
> >
> >  /*
> > --
> > 2.5.5


RE: [PATCH] perf tool: Don't discard prev in backward mode

2017-10-12 Thread Liang, Kan
> Perf record can switch output. The new output should only store the data
> after switching. However, in overwrite backward mode, the new output still
> has the data from the old output. That also brings extra overhead.
> 
> At the end of mmap_read, the position of the processed ring buffer is saved
> in md->prev. The next mmap_read should end at md->prev if it has not been
> overwritten. That avoids processing duplicate data.
> However, md->prev is discarded. So the next mmap_read has to process the
> whole valid ring buffer, which probably includes the old processed data.
> 
> Avoid calling backward_rb_find_range() when md->prev is still available.
> 
> Signed-off-by: Wang Nan <wangn...@huawei.com>
> Cc: Liang Kan <kan.li...@intel.com>
> ---

The patch looks good to me.

Tested-by: Kan Liang <kan.li...@intel.com>


>  tools/perf/util/mmap.c | 33 +++--
>  1 file changed, 15 insertions(+), 18 deletions(-)
> 
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index 9fe5f9c..df1de55 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -287,18 +287,6 @@ static int backward_rb_find_range(void *buf, int mask, u64 head, u64 *start, u64
>   return -1;
>  }
> 
> -static int rb_find_range(void *data, int mask, u64 head, u64 old,
> -  u64 *start, u64 *end, bool backward)
> -{
> - if (!backward) {
> - *start = old;
> - *end = head;
> - return 0;
> - }
> -
> - return backward_rb_find_range(data, mask, head, start, end);
> -}
> -
>  int perf_mmap__push(struct perf_mmap *md, bool overwrite, bool backward,
>   void *to, int push(void *to, void *buf, size_t size))
>  {
> @@ -310,19 +298,28 @@ int perf_mmap__push(struct perf_mmap *md, bool overwrite, bool backward,
>   void *buf;
>   int rc = 0;
> 
> - if (rb_find_range(data, md->mask, head, old, &start, &end, backward))
> - return -1;
> + start = backward ? head : old;
> + end = backward ? old : head;
> 
>   if (start == end)
>   return 0;
> 
>   size = end - start;
>   if (size > (unsigned long)(md->mask) + 1) {
> - WARN_ONCE(1, "failed to keep up with mmap data. (warn
> only once)\n");
> + if (!backward) {
> + WARN_ONCE(1, "failed to keep up with mmap data.
> (warn only
> +once)\n");
> 
> - md->prev = head;
> - perf_mmap__consume(md, overwrite || backward);
> - return 0;
> + md->prev = head;
> + perf_mmap__consume(md, overwrite || backward);
> + return 0;
> + }
> +
> + /*
> +  * Backward ring buffer is full. We still have a chance to read
> +  * most of data from it.
> +  */
> + if (backward_rb_find_range(data, md->mask, head, &start, &end))
> + return -1;
>   }
> 
>   if ((start & md->mask) + size != (end & md->mask)) {
> --
> 2.9.3
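
For readers following the thread: the simplification above works because a
forward ring buffer grows from old toward head, while a backward one is
written from head toward old, so the unread range is just the two endpoints
swapped. A minimal sketch of that selection (simplified, not the tools/perf
code itself):

  #include <stdint.h>

  /* Pick the byte range to consume for this read.  Keeping prev (the old
   * position) across reads is exactly what avoids re-processing data
   * after a perf record output switch.
   */
  static void pick_range(uint64_t head, uint64_t prev, int backward,
                         uint64_t *start, uint64_t *end)
  {
      *start = backward ? head : prev;
      *end   = backward ? prev : head;
  }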


