Re: [PATCH v2 1/2] perf tools: add 'perf irq' to measure the hardware interrupts

2021-01-14 Thread Alexei Budankov


On 14.01.2021 10:48, Bixuan Cui wrote:
> Add 'perf irq' to trace/measure the hardware interrupts.
> 
> Now three functions are provided:
>   1. 'perf irq record ' to record the irq handler events.
>   2. 'perf irq script' to see a detailed trace of the workload that
>was recorded.
>   3. 'perf irq report' to calculate the time consumed by each
>hardware interrupt processing function.
> 
> Signed-off-by: Bixuan Cui 
> ---
>  tools/perf/Build |   1 +
>  tools/perf/builtin-irq.c | 287 +++
>  tools/perf/builtin.h |   1 +
>  tools/perf/perf.c|   1 +
>  4 files changed, 290 insertions(+)
>  create mode 100644 tools/perf/builtin-irq.c



> +static int __cmd_record(int argc, const char **argv)
> +{
> + unsigned int rec_argc, i, j;
> + const char **rec_argv;
> + const char * const record_args[] = {
> + "record",
> + "-a",

Could you please make it configurable from the command line 
jointly with -p option?

> + "-R",
> + "-c", "1",
> + "-e", "irq:irq_handler_entry",
> + "-e", "irq:irq_handler_exit",
> + };

Thanks,
Alexei


Re: [PATCH v2 2/2] perf tools: Add documentation for 'perf irq' command

2021-01-14 Thread Alexei Budankov


On 14.01.2021 10:48, Bixuan Cui wrote:
> Add documentation for 'perf irq' command.
> 
> Signed-off-by: Bixuan Cui 
> ---
>  tools/perf/Documentation/perf-irq.txt | 58 +++
>  tools/perf/command-list.txt   |  1 +
>  2 files changed, 59 insertions(+)
>  create mode 100644 tools/perf/Documentation/perf-irq.txt



> +
> +OPTIONS for 'perf irq report'
> +
> +
> +--cpus::
> + Show just entries with activities for the given CPUs.

perf report mode already has -C,--cpu  option so 
it makes sense to reuse the option for perf irq report.

Regards,
Alexei


Re: [PATCH 1/2] perf tools: add 'perf irq' to measure the hardware interrupts

2021-01-12 Thread Alexei Budankov


Hi Bixuan,

On 12.01.2021 15:55, Bixuan Cui wrote:
> Add 'perf irq' to trace/measure the hardware interrupts.
> 
> Now three functions are provided:
>   1. 'perf irq record ' to record the irq handler events.
>   2. 'perf irq script' to see a detailed trace of the workload that
>was recorded.
>   3. 'perf irq timeconsume' to calculate the time consumed by each
>hardware interrupt processing function.
> 
> Signed-off-by: Bixuan Cui 

Thanks for the patches. There is still something that could be improved.

> ---
>  tools/perf/Build |   1 +
>  tools/perf/builtin-irq.c | 288 +++
>  tools/perf/builtin.h |   1 +
>  tools/perf/perf.c|   1 +
>  4 files changed, 291 insertions(+)
>  create mode 100644 tools/perf/builtin-irq.c
> 



> +
> +static int __cmd_record(int argc, const char **argv)
> +{
> + unsigned int rec_argc, i, j;
> + const char **rec_argv;
> + const char * const record_args[] = {
> + "record",
> + "-a",

I see it works also like this:

sudo perf record -p PID -c 1 -e irq:irq_handler_entry,irq:irq_handler_exit
sudo perf record -R -c 1 -e irq:irq_handler_entry,irq:irq_handler_exit -- find /

This -a option jointly with -p option could be made configurable from
the command line for perf irq mode.

> + "-R",
> + "-m", "1024",

Do you see data losses with default buffer size of 512KB
when capturing trace in your specific use case?

If not then this -m could be avoided or made configurable
if you still need it.

Regards,
Alexei


Re: [PATCH 2/2] perf tools: Add documentation for 'perf irq' command

2021-01-12 Thread Alexei Budankov


Hi,

On 12.01.2021 15:55, Bixuan Cui wrote:
> Add documentation for 'perf irq' command.
> 
> Signed-off-by: Bixuan Cui 
> ---
>  tools/perf/Documentation/perf-irq.txt | 58 +++
>  tools/perf/command-list.txt   |  1 +
>  2 files changed, 59 insertions(+)
>  create mode 100644 tools/perf/Documentation/perf-irq.txt
> 
> diff --git a/tools/perf/Documentation/perf-irq.txt 
> b/tools/perf/Documentation/perf-irq.txt
> new file mode 100644
> index ..8c0e388dad59
> --- /dev/null
> +++ b/tools/perf/Documentation/perf-irq.txt
> @@ -0,0 +1,58 @@
> +perf-irq(1)
> +=
> +
> +NAME
> +
> +perf-irq - Tool to trace/measure hardware interrupts
> +
> +SYNOPSIS
> +
> +[verse]
> +'perf irq' {record|timeconsume|script}
> +
> +DESCRIPTION
> +---
> +There are several variants of 'perf irq':
> +
> +  'perf irq record ' to record the irq handler events
> +  of an arbitrary workload.
> +
> +  'perf irq script' to see a detailed trace of the workload that
> +   was recorded (aliased to 'perf script' for now).
> +
> +  'perf irq timeconsume' to calculate the time consumed by each
> +   hardware interrupt processing function.
> +
> +Example usage:
> +perf irq record -- sleep 1
> +perf irq timeconsume

This timeconsume mode looks more like: perf irq report 

> +
> +   By default it shows the individual irq events, including the irq name,
> +   cpu(execute the hardware interrupt processing function), time consumed,
> +   entry time and exit time for the each hardware irq:
> +
> +   
> ---
> + Irq name |  CPU   | Time consume us | Handler entry time | 
> Handler exit time
> +   
> ---
> + enp2s0f2-tx-0| [0006] |  0.01 s |   6631263.313329 s |   
> 6631263.313330 s
> +
> +   
> ---
> + Irq name |  CPU   | Time consume us | Handler entry time | 
> Handler exit time
> +   
> ---
> + megasas  | [0013] |  0.03 s |   6631263.209564 s |   
> 6631263.209567 s
> +
> +   
> ---
> + Irq name |  CPU   | Time consume us | Handler entry time | 
> Handler exit time
> +   
> ---
> + acpi | [0016] |  0.18 s |   6631263.085787 s |   
> 6631263.085805 s
> +
> +
> +OPTIONS for 'perf irq'
> +
> +
> +--cpus::
> + Show just entries with activities for the given CPUs.

I am getting this:

tools/perf/perf irq --cpus=0 record -- find /
  Error: unknown option `cpus=0'

 Usage: perf irq [] {record|timeconsume|script}

Regards,
Alexei


Re: [PATCH 4/8] perf daemon: Add daemon command

2020-12-15 Thread Alexei Budankov


On 15.12.2020 22:43, Jiri Olsa wrote:
> On Tue, Dec 15, 2020 at 06:40:26PM +0300, Alexei Budankov wrote:
>> Hi,
>>
>> On 12.12.2020 13:43, Jiri Olsa wrote:
>>> Adding daemon command that allows to run record sessions
>>> on background. Each session represents one perf record
>>> process and is configured in config file.
>>>
>>> Example:
>>>
>>>   # cat config.daemon
>>>   [daemon]
>>>   base=/opt/perfdata
>>
>> It could probably make sense to consider using locations at /var/
>> directory, similar to other already existing daemon processes in
>> system so admin and user experience would be easily reusabe for
>> performance monitoring daemon (service).
> 
> hm, you can specify any /var path in there if you like,
> do you suggest to hardcode it?

This thing: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard
Since Perf is a part of OS it would better use some standardized locations.

> 
>>
>>>
>>>   [session-1]
>>>   run = -m 10M -e cycles -o /opt/perfdata/1/perf.data --overwrite 
>>> --switch-output -a
>>>
>>>   [session-2]
>>>   run = -m 20M -e sched:* -o /opt/perfdata/2/perf.data --overwrite 
>>> --switch-output -a
>>>
>>> Default perf config has the same daemon base:
>>>
>>>   # cat ~/.perfconfig
>>>   [daemon]
>>>   base=/opt/perfdata
>>>
>>> Starting the daemon:
>>>
>>>   # perf daemon --config config.daemon
>>
>> It could make sense to name daemon config file similar to .perfconfig
>> e.g. like .perfconfig.daemon. perf daemon command would then assume, by
>> default, usage of .perfconfig.daemon config or the one specified on the
>> command line via --config option. It also would be helpfull have loaded
>> config file path printed into console:
>> # perf daemon
>> Daemon process  started with config /path/to/.perfconfig.daemon
> 
> so the current way is, that following creates daemon:
> 
>   # perf daemon --config 
> 
> and any other 'non --config' option' is used to 'query/control' daemon:
> 
>   # perf daemon
>   # perf daemon --signal
>   # perf daemon --stop
>   ...
> 
> 
> I'd like to keep short way checking on daemon, without too many
> options, like:
> 
>   # perf daemon
>   [690174:daemon] base: /opt/perfdata
>   [690175:top] perf record -e cycles --switch-output=1m --switch-max-files=6 
> -a
> 
> 
> I think maybe we don't need any other .perfconfig, we could have
> all in standard .perfconfig, like:
> 
>   # cat .perfconfig:
>   [daemon]
>   base=/opt/perfdata
> 
>   [session-1]
>   run = -m 1M -e cycles --overwrite --switch-output -a
>   [session-2]
>   run = -m 1M -e sched:* --overwrite --switch-output -a
> 
> 
> and to run daemon on top of it:
> 
>   # perf daemon --start
> 
> 
> to run daemon with alternate config:
> 
>   # perf daemon --start=
> 
> or:
> 
>   # perf daemon --start --config=
> 
> 
> and checking on daemon with default .perfconfig setup:
> 
>   # perf daemon
> 
> 
> checking on daemon with different base or config:
> 
>   # perf daemon --base=
>   # perf daemon --config=
>   # perf daemon --base= --stop
>   # perf daemon --base= --signal
>   # perf daemon --config= --stop
>   # perf daemon --config= --signal
> 
> how about that?

Extending .perfconfig would look simpler for users, IHMO.
It looks like --base option actually implements --sandbox
or similar semantics.

> 
> SNIP
> 
>>> +static struct session*
>>> +daemon__find_session(struct daemon *daemon, char *name)
>>> +{
>>> +   struct session *session;
>>> +
>>> +   list_for_each_entry(session, &daemon->sessions, list) {
>>> +   if (!strcmp(session->name, name))
>>> +   return session;
>>> +   }
>>> +
>>> +   return NULL;
>>> +}
>>> +
>>> +static int session_name(const char *var, char *session, int len)
>>
>> should possibly name it get_session_name.
> 
> ok
> 
>>
>>> +{
>>> +   const char *p = var + sizeof("session-") - 1;
>>
>> should possibly check that p still points inside [var, var+len).
> 
> ok
> 
> SNIP
> 
>>> +static int session__wait(struct session *session, struct daemon *daemon,
>>> +int secs)
>>> +{
>>> +   time_t current, start = 0;
>>> +   int cnt;
>>> +
>>> +   start = current = time(N

Re: [PATCH 4/8] perf daemon: Add daemon command

2020-12-15 Thread Alexei Budankov
Hi,

On 12.12.2020 13:43, Jiri Olsa wrote:
> Adding daemon command that allows to run record sessions
> on background. Each session represents one perf record
> process and is configured in config file.
> 
> Example:
> 
>   # cat config.daemon
>   [daemon]
>   base=/opt/perfdata

It could probably make sense to consider using locations at /var/
directory, similar to other already existing daemon processes in
system so admin and user experience would be easily reusabe for
performance monitoring daemon (service).

> 
>   [session-1]
>   run = -m 10M -e cycles -o /opt/perfdata/1/perf.data --overwrite 
> --switch-output -a
> 
>   [session-2]
>   run = -m 20M -e sched:* -o /opt/perfdata/2/perf.data --overwrite 
> --switch-output -a
> 
> Default perf config has the same daemon base:
> 
>   # cat ~/.perfconfig
>   [daemon]
>   base=/opt/perfdata
> 
> Starting the daemon:
> 
>   # perf daemon --config config.daemon

It could make sense to name daemon config file similar to .perfconfig
e.g. like .perfconfig.daemon. perf daemon command would then assume, by
default, usage of .perfconfig.daemon config or the one specified on the
command line via --config option. It also would be helpfull have loaded
config file path printed into console:
# perf daemon
Daemon process  started with config /path/to/.perfconfig.daemon

> 
> Check sessions:
> 
>   # perf daemon
>   [1:92187] perf record -m 11M -e cycles -o /opt/perfdata/1/perf.data 
> --overwrite --switch-output -a
>   [2:92188] perf record -m 20M -e sched:* -o /opt/perfdata/2/perf.data 
> --overwrite --switch-output -a
> 
> Check sessions with more info:
> 
>   # perf daemon -v
>   [1:92187] perf record -m 11M -e cycles -o /opt/perfdata/1/perf.data 
> --overwrite --switch-output -a
> output:  /opt/perfdata/1/output
>   [2:92188] perf record -m 20M -e sched:* -o /opt/perfdata/2/perf.data 
> --overwrite --switch-output -a
> output:  /opt/perfdata/2/output
> 
> The 'output' file is perf record output for specific session.
> 
> Signed-off-by: Jiri Olsa 
> ---
>  tools/perf/Build |   3 +
>  tools/perf/Documentation/perf-daemon.txt |  97 +++
>  tools/perf/builtin-daemon.c  | 794 +++
>  tools/perf/builtin.h |   1 +
>  tools/perf/command-list.txt  |   1 +
>  tools/perf/perf.c|   1 +
>  6 files changed, 897 insertions(+)
>  create mode 100644 tools/perf/Documentation/perf-daemon.txt
>  create mode 100644 tools/perf/builtin-daemon.c
> 
> diff --git a/tools/perf/Build b/tools/perf/Build
> index 5f392dbb88fc..54aa38996fff 100644
> --- a/tools/perf/Build
> +++ b/tools/perf/Build
> @@ -24,6 +24,7 @@ perf-y += builtin-mem.o
>  perf-y += builtin-data.o
>  perf-y += builtin-version.o
>  perf-y += builtin-c2c.o
> +perf-y += builtin-daemon.o
>  
>  perf-$(CONFIG_TRACE) += builtin-trace.o
>  perf-$(CONFIG_LIBELF) += builtin-probe.o
> @@ -53,3 +54,5 @@ perf-y += scripts/
>  perf-$(CONFIG_TRACE) += trace/beauty/
>  
>  gtk-y += ui/gtk/
> +
> +CFLAGS_builtin-daemon.o += -DPERF="BUILD_STR($(bindir_SQ)/perf)"
> diff --git a/tools/perf/Documentation/perf-daemon.txt 
> b/tools/perf/Documentation/perf-daemon.txt
> new file mode 100644
> index ..dee39be110ba
> --- /dev/null
> +++ b/tools/perf/Documentation/perf-daemon.txt
> @@ -0,0 +1,97 @@
> +perf-daemon(1)
> +==
> +
> +NAME
> +
> +perf-daemon - Run record sessions on background
> +
> +SYNOPSIS
> +
> +[verse]
> +'perf daemon'
> +'perf daemon' []
> +
> +DESCRIPTION
> +---
> +This command allows to run simple daemon process that starts and
> +monitors configured record sessions.
> +
> +Each session represents one perf record process.
> +
> +These sessions are configured through config file, see CONFIG FILE
> +section with EXAMPLES.
> +
> +OPTIONS
> +---
> +--config=::
> + Config file path.
> +
> +-f::
> +--foreground::
> + Do not put the process in background.
> +
> +-v::
> +--verbose::
> + Be more verbose.
> +
> +CONFIG FILE
> +---
> +The daemon is configured within standard perf config file by
> +following new variables:
> +
> +daemon.base:
> + Base path for daemon data. All sessions data are
> + stored under this path.
> +
> +session-.run:
> + Defines new record session. The value is record's command
> + line without the 'record' keyword.
> +
> +EXAMPLES
> +
> +Example with 2 record sessions:
> +
> +  # cat config.daemon
> +  [daemon]
> +  base=/opt/perfdata
> +
> +  [session-1]
> +  run = -m 10M -e cycles -o /opt/perfdata/1/perf.data --overwrite 
> --switch-output -a
> +
> +  [session-2]
> +  run = -m 20M -e sched:* -o /opt/perfdata/2/perf.data --overwrite 
> --switch-output -a
> +
> +
> +Default perf config has the same daemon base:
> +
> +  # cat ~/.perfconfig
> +  [daemon]
> +  base=/opt/perfdata
> +
> +
> +Starting the daemon:
> +
> +  # perf daemon --config config.daemon
> +
> +
> +Check sessions:
> +
> +  # perf daemon
> +  [1:92187

Re: [PATCH v3 00/12] Introduce threaded trace streaming for basic perf record operation

2020-12-15 Thread Alexei Budankov
Hi,

On 20.11.2020 12:45, Namhyung Kim wrote:
> Hi,
> 
> Thanks for working on this!

Thanks for your review.
Just spotted comments for this version.
Sorry for delay.

> 
> On Mon, Nov 16, 2020 at 03:12:47PM +0300, Alexey Budankov wrote:
>>
>> Changes in v3:
>> - avoided skipped redundant patch 3/15
>> - applied "data file" and "data directory" terms allover the patch set
>> - captured Acked-by: tags by Namhyung Kim
>> - avoided braces where don't needed
>> - employed thread local variable for serial trace streaming 
>> - added specs for --thread option - core, socket, numa and user defined
>> - added parallel loading of data directory files similar to the prototype [1]
> 
> Can you please consider splitting tracing records (FORK/MMAP/...) into
> a separate file?  I think this change would put too much burden to the
> perf report side.  I'm saying this repeatedly because I'm afraid that
> it'd be harder to change later once we accept this approach/format..

Alexey Bayduraev in To/Cc is going to proceed with this work
so there are chances to have updated version soon.

Thanks,
Alexei


Re: [PATCH 2/3] perf tools: Allow to enable/disable events via control file

2020-12-10 Thread Alexei Budankov



On 10.12.2020 21:20, Alexei Budankov wrote:
> 
> On 10.12.2020 21:06, Jiri Olsa wrote:
>> On Thu, Dec 10, 2020 at 05:24:30PM +0100, Jiri Olsa wrote:
>>> On Mon, Dec 07, 2020 at 08:02:20PM +0300, Alexei Budankov wrote:
>>>> Hi,
>>>>
>>>> On 06.12.2020 20:05, Jiri Olsa wrote:
>>>>> Adding new control events to enable/disable specific event.
>>>>> The interface string for control file are:
>>>>>
>>>>>   'enable-'
>>>>>   'disable-'
>>>>
>>>> 
>>>>
>>>>>
>>>>> when received the command, perf will scan the current evlist
>>>>> for  and if found it's enabled/disabled.
>>>>
>>>> 
>>>>
>>>>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>>>>> index 70aff26612a9..05723227bebf 100644
>>>>> --- a/tools/perf/util/evlist.c
>>>>> +++ b/tools/perf/util/evlist.c
>>>>> @@ -1915,7 +1915,13 @@ static int evlist__ctlfd_recv(struct evlist 
>>>>> *evlist, enum evlist_ctl_cmd *cmd,
>>>>>bytes_read == data_size ? "" : c == '\n' ? "\\n" : "\\0");
>>>>>  
>>>>>   if (bytes_read > 0) {
>>>>> - if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_TAG,
>>>>> + if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_EVSEL_TAG,
>>>>> + 
>>>>> (sizeof(EVLIST_CTL_CMD_ENABLE_EVSEL_TAG)-1))) {
>>>>> + *cmd = EVLIST_CTL_CMD_ENABLE_EVSEL;
>>>>> + } else if (!strncmp(cmd_data, EVLIST_CTL_CMD_DISABLE_EVSEL_TAG,
>>>>> + 
>>>>> (sizeof(EVLIST_CTL_CMD_DISABLE_EVSEL_TAG)-1))) {
>>>>> + *cmd = EVLIST_CTL_CMD_DISABLE_EVSEL;
>>>>> + } else if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_TAG,
>>>>>(sizeof(EVLIST_CTL_CMD_ENABLE_TAG)-1))) {
>>>>>   *cmd = EVLIST_CTL_CMD_ENABLE;
>>>>>   } else if (!strncmp(cmd_data, EVLIST_CTL_CMD_DISABLE_TAG,
>>>>> @@ -1952,6 +1958,8 @@ int evlist__ctlfd_process(struct evlist *evlist, 
>>>>> enum evlist_ctl_cmd *cmd)
>>>>>   char cmd_data[EVLIST_CTL_CMD_MAX_LEN];
>>>>>   int ctlfd_pos = evlist->ctl_fd.pos;
>>>>>   struct pollfd *entries = evlist->core.pollfd.entries;
>>>>> + struct evsel *evsel;
>>>>> + char *evsel_name;
>>>>>  
>>>>>   if (!evlist__ctlfd_initialized(evlist) || !entries[ctlfd_pos].revents)
>>>>>   return 0;
>>>>> @@ -1967,6 +1975,26 @@ int evlist__ctlfd_process(struct evlist *evlist, 
>>>>> enum evlist_ctl_cmd *cmd)
>>>>>   case EVLIST_CTL_CMD_DISABLE:
>>>>>   evlist__disable(evlist);
>>>>>   break;
>>>>> + case EVLIST_CTL_CMD_ENABLE_EVSEL:
>>>>> + evsel_name = cmd_data + 
>>>>> sizeof(EVLIST_CTL_CMD_ENABLE_EVSEL_TAG) - 1;
>>>>
>>>> It makes sense to check that evsel_name still points
>>>> into cmd_data buffer after assigning to event name.
>>>
>>> right, will add that
>>
>> actualy it's already checked in evlist__ctlfd_recv, evsel_name at
>> worst will be empty string so evlist__find_evsel_by_str will fail
>>
>> I'll add '' around %s in the error output string:
>>
>>   failed: can't find '%s' event
>>
>> so it's obvious when it's empty

Acked-by: Alexei Budankov 

Thanks,
Alexei


Re: [PATCH 2/3] perf tools: Allow to enable/disable events via control file

2020-12-10 Thread Alexei Budankov


On 10.12.2020 21:06, Jiri Olsa wrote:
> On Thu, Dec 10, 2020 at 05:24:30PM +0100, Jiri Olsa wrote:
>> On Mon, Dec 07, 2020 at 08:02:20PM +0300, Alexei Budankov wrote:
>>> Hi,
>>>
>>> On 06.12.2020 20:05, Jiri Olsa wrote:
>>>> Adding new control events to enable/disable specific event.
>>>> The interface string for control file are:
>>>>
>>>>   'enable-'
>>>>   'disable-'
>>>
>>> 
>>>
>>>>
>>>> when received the command, perf will scan the current evlist
>>>> for  and if found it's enabled/disabled.
>>>
>>> 
>>>
>>>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>>>> index 70aff26612a9..05723227bebf 100644
>>>> --- a/tools/perf/util/evlist.c
>>>> +++ b/tools/perf/util/evlist.c
>>>> @@ -1915,7 +1915,13 @@ static int evlist__ctlfd_recv(struct evlist 
>>>> *evlist, enum evlist_ctl_cmd *cmd,
>>>> bytes_read == data_size ? "" : c == '\n' ? "\\n" : "\\0");
>>>>  
>>>>if (bytes_read > 0) {
>>>> -  if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_TAG,
>>>> +  if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_EVSEL_TAG,
>>>> +  
>>>> (sizeof(EVLIST_CTL_CMD_ENABLE_EVSEL_TAG)-1))) {
>>>> +  *cmd = EVLIST_CTL_CMD_ENABLE_EVSEL;
>>>> +  } else if (!strncmp(cmd_data, EVLIST_CTL_CMD_DISABLE_EVSEL_TAG,
>>>> +  
>>>> (sizeof(EVLIST_CTL_CMD_DISABLE_EVSEL_TAG)-1))) {
>>>> +  *cmd = EVLIST_CTL_CMD_DISABLE_EVSEL;
>>>> +  } else if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_TAG,
>>>> (sizeof(EVLIST_CTL_CMD_ENABLE_TAG)-1))) {
>>>>*cmd = EVLIST_CTL_CMD_ENABLE;
>>>>} else if (!strncmp(cmd_data, EVLIST_CTL_CMD_DISABLE_TAG,
>>>> @@ -1952,6 +1958,8 @@ int evlist__ctlfd_process(struct evlist *evlist, 
>>>> enum evlist_ctl_cmd *cmd)
>>>>char cmd_data[EVLIST_CTL_CMD_MAX_LEN];
>>>>int ctlfd_pos = evlist->ctl_fd.pos;
>>>>struct pollfd *entries = evlist->core.pollfd.entries;
>>>> +  struct evsel *evsel;
>>>> +  char *evsel_name;
>>>>  
>>>>if (!evlist__ctlfd_initialized(evlist) || !entries[ctlfd_pos].revents)
>>>>return 0;
>>>> @@ -1967,6 +1975,26 @@ int evlist__ctlfd_process(struct evlist *evlist, 
>>>> enum evlist_ctl_cmd *cmd)
>>>>case EVLIST_CTL_CMD_DISABLE:
>>>>evlist__disable(evlist);
>>>>break;
>>>> +  case EVLIST_CTL_CMD_ENABLE_EVSEL:
>>>> +  evsel_name = cmd_data + 
>>>> sizeof(EVLIST_CTL_CMD_ENABLE_EVSEL_TAG) - 1;
>>>
>>> It makes sense to check that evsel_name still points
>>> into cmd_data buffer after assigning to event name.
>>
>> right, will add that
> 
> actualy it's already checked in evlist__ctlfd_recv, evsel_name at
> worst will be empty string so evlist__find_evsel_by_str will fail
> 
> I'll add '' around %s in the error output string:
> 
>   failed: can't find '%s' event
> 
> so it's obvious when it's empty

Looks good to me. Thanks!

Alexei

> 
> jirka
> 
> .
> 


Re: [PATCH 1/3] perf tools: Add evlist__disable_evsel/evlist__enable_evsel

2020-12-07 Thread Alexei Budankov
Hi,

On 06.12.2020 20:05, Jiri Olsa wrote:
> Adding interface to enable/disable single event in the
> evlist based on its name. It will be used later in new
> control enable/disable interface.
> 
> Keeping the evlist::enabled true when one or more events
> are enabled so the toggle can work properly and toggle
> evlist to disabled state.
> 
> Signed-off-by: Jiri Olsa 
> ---
>  tools/perf/util/evlist.c | 69 ++--
>  tools/perf/util/evlist.h |  2 ++
>  2 files changed, 68 insertions(+), 3 deletions(-)

Acked-by: Alexei Budankov 

Regards,
Alexei


Re: [PATCH 3/3] perf tools: Allow to list events via control file

2020-12-07 Thread Alexei Budankov
Hi,

On 07.12.2020 19:28, Arnaldo Carvalho de Melo wrote:
> Em Sun, Dec 06, 2020 at 06:05:19PM +0100, Jiri Olsa escreveu:
>> Adding new control event to display all evlist events.
>>
>> The interface string for control file is 'list'. When
>> received, perf will scan and print current evlist into
>> perf record terminal.
>>
>> Example session:
>>
>>   terminal 1:
>> # mkfifo control ack perf.pipe
>> # perf record --control=fifo:control,ack -D -1 --no-buffering -e 
>> 'sched:*' -o - > perf.pipe
>> Events disabled
>>
>>   terminal 2:
>> # echo list > control
>>
>>   terminal 1:
>> # perf record --control=fifo:control,ack -D -1 --no-buffering -e 
>> 'sched:*' -o - > perf.pipe
>> ...
>> sched:sched_kthread_stop
>> sched:sched_kthread_stop_ret
>> sched:sched_waking
>> sched:sched_wakeup
>> sched:sched_wakeup_new
>> sched:sched_switch
>> sched:sched_migrate_task
>> sched:sched_process_free
>> sched:sched_process_exit
>> sched:sched_wait_task
>> sched:sched_process_wait
>> sched:sched_process_fork
>> sched:sched_process_exec
>> sched:sched_stat_wait
>> sched:sched_stat_sleep
>> sched:sched_stat_iowait
>> sched:sched_stat_blocked
>> sched:sched_stat_runtime
>> sched:sched_pi_setprio
>> sched:sched_move_numa
>> sched:sched_stick_numa
>> sched:sched_swap_numa
>> sched:sched_wake_idle_without_ipi
>> dummy:HG
>>
>> This new command is handy to get real event names when
>> wildcards are used.
> 
> Ok, would be nice to have a verbose mode like:
> 
> [acme@five ~]$ sudo ~acme/bin/perf record -e 'sched:*' sleep 0.001
> [ perf record: Woken up 14 times to write data ]
> [ perf record: Captured and wrote 0.023 MB perf.data (16 samples) ]
> [acme@five ~]$ sudo ~acme/bin/perf evlist
> sched:sched_kthread_stop
> sched:sched_kthread_stop_ret
> sched:sched_waking
> sched:sched_wakeup
> sched:sched_wakeup_new
> sched:sched_switch
> sched:sched_migrate_task
> sched:sched_process_free
> sched:sched_process_exit
> sched:sched_wait_task
> sched:sched_process_wait
> sched:sched_process_fork
> sched:sched_process_exec
> sched:sched_stat_wait
> sched:sched_stat_sleep
> sched:sched_stat_iowait
> sched:sched_stat_blocked
> sched:sched_stat_runtime
> sched:sched_pi_setprio
> sched:sched_move_numa
> sched:sched_stick_numa
> sched:sched_swap_numa
> sched:sched_wake_idle_without_ipi
> # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
> [acme@five ~]$ sudo ~acme/bin/perf evlist -v
> sched:sched_kthread_stop: type: 2, size: 120, config: 0x13f, { sample_period, 
> sample_freq }: 1, sample_type: IP|TID|TIME|ID|CPU|PERIOD|RAW, read_format: 
> ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, 
> sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, 
> bpf_event: 1
> sched:sched_kthread_stop_ret: type: 2, size: 120, config: 0x13e, { 
> sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ID|CPU|PERIOD|RAW, 
> read_format: ID, disabled: 1, inherit: 1, enable_on_exec: 1, sample_id_all: 
> 1, exclude_guest: 1
> sched:sched_waking: type: 2, size: 120, config: 0x13d, { sample_period, 
> sample_freq }: 1, sample_type: IP|TID|TIME|ID|CPU|PERIOD|RAW, read_format: 
> ID, disabled: 1, inherit: 1, enable_on_exec: 1, sample_id_all: 1, 
> exclude_guest: 1
> sched:sched_wakeup: type: 2, size: 120, config: 0x13c, { sample_period, 
> sample_freq }: 1, sample_type: IP|TID|TIME|ID|CPU|PERIOD|RAW, read_format: 
> ID, disabled: 1, inherit: 1, enable_on_exec: 1, sample_id_all: 1, 
> exclude_guest: 1
> sched:sched_wakeup_new: type: 2, size: 120, config: 0x13b, { sample_period, 
> sample_freq }: 1, sample_type: IP|TID|TIME|ID|CPU|PERIOD|RAW, read_format: 
> ID, disabled: 1, inherit: 1, enable_on_exec: 1, sample_id_all: 1, 
> exclude_guest: 1
> sched:sched_switch: type: 2, size: 120, config: 0x13a, { sample_period, 
> sample_freq }: 1, sample_type: IP|TID|TIME|ID|CPU|PERIOD|RAW, read_format: 
> ID, disabled: 1, inherit: 1, enable_on_exec: 1, sample_id_all: 1, 
> exclude_guest: 1
> sched:sched_migrate_task: type: 2, size: 120, config: 0x139, { sample_period, 
> sample_freq }: 1, sample_type: IP|TID|TIME|ID|CPU|PERIOD|RAW, read_format: 
> ID, disabled: 1, inherit: 1, enable_on_exec: 1, sample_id_all: 1, 
> exclude_guest: 1
> sched:sched_process_free: type: 2, size: 120, config: 0x138, { sample_period, 
> sample_freq }: 1, sample_type: IP|TID|TIME|ID|CPU|PERIOD|RAW, read_format: 
> ID, disabled: 1, inherit: 1, enable_on_exec: 1, sample_id_all: 1, 
> exclude_guest: 1
> sched:sched_process_exit: type: 2, size: 120, config: 0x137, { sample_period, 
> sample_freq }: 1, sample_type: IP|TID|TIME|ID|CPU|PERIOD|RAW, read_format: 
> ID, disabled: 1, inherit: 1, enable_on_exec: 1, sample_id_all: 1, 
> exclude_guest: 1
> sched:sched_wait_task: type: 2, size: 120, config: 0x136, { sample_period, 
> sample_freq }: 1, sample_type: IP|TID|TIME|ID|CPU|PERIOD|RAW, read_format: 
> ID, disabled: 1, inhe

Re: [PATCH 2/3] perf tools: Allow to enable/disable events via control file

2020-12-07 Thread Alexei Budankov
Hi,

On 06.12.2020 20:05, Jiri Olsa wrote:
> Adding new control events to enable/disable specific event.
> The interface string for control file are:
> 
>   'enable-'
>   'disable-'



> 
> when received the command, perf will scan the current evlist
> for  and if found it's enabled/disabled.



> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 70aff26612a9..05723227bebf 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1915,7 +1915,13 @@ static int evlist__ctlfd_recv(struct evlist *evlist, 
> enum evlist_ctl_cmd *cmd,
>bytes_read == data_size ? "" : c == '\n' ? "\\n" : "\\0");
>  
>   if (bytes_read > 0) {
> - if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_TAG,
> + if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_EVSEL_TAG,
> + 
> (sizeof(EVLIST_CTL_CMD_ENABLE_EVSEL_TAG)-1))) {
> + *cmd = EVLIST_CTL_CMD_ENABLE_EVSEL;
> + } else if (!strncmp(cmd_data, EVLIST_CTL_CMD_DISABLE_EVSEL_TAG,
> + 
> (sizeof(EVLIST_CTL_CMD_DISABLE_EVSEL_TAG)-1))) {
> + *cmd = EVLIST_CTL_CMD_DISABLE_EVSEL;
> + } else if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_TAG,
>(sizeof(EVLIST_CTL_CMD_ENABLE_TAG)-1))) {
>   *cmd = EVLIST_CTL_CMD_ENABLE;
>   } else if (!strncmp(cmd_data, EVLIST_CTL_CMD_DISABLE_TAG,
> @@ -1952,6 +1958,8 @@ int evlist__ctlfd_process(struct evlist *evlist, enum 
> evlist_ctl_cmd *cmd)
>   char cmd_data[EVLIST_CTL_CMD_MAX_LEN];
>   int ctlfd_pos = evlist->ctl_fd.pos;
>   struct pollfd *entries = evlist->core.pollfd.entries;
> + struct evsel *evsel;
> + char *evsel_name;
>  
>   if (!evlist__ctlfd_initialized(evlist) || !entries[ctlfd_pos].revents)
>   return 0;
> @@ -1967,6 +1975,26 @@ int evlist__ctlfd_process(struct evlist *evlist, enum 
> evlist_ctl_cmd *cmd)
>   case EVLIST_CTL_CMD_DISABLE:
>   evlist__disable(evlist);
>   break;
> + case EVLIST_CTL_CMD_ENABLE_EVSEL:
> + evsel_name = cmd_data + 
> sizeof(EVLIST_CTL_CMD_ENABLE_EVSEL_TAG) - 1;

It makes sense to check that evsel_name still points
into cmd_data buffer after assigning to event name.

Regards,
Alexei


Re: [PATCH v2 1/3] Revert "perf session: Fix decompression of PERF_RECORD_COMPRESSED records"

2020-12-02 Thread Alexei Budankov


Hi Alexey. Please see below.

On 02.12.2020 17:04, Bayduraev, Alexey V wrote:
> Hi,
> 
> I was able to reproduce "Couldn't allocate memory for decompression" on 
> 32-bit 
> perf with long perf.data.
> 
> On my side mmap() in perf_session__process_compressed_event() fails with 
> ENOMEM,
> due to exceeded memory limit for 32-bit applications.
> This happens with or without Petr's patch.
> 
> As I can see, these mappings are only released in perf_session__delete().
> We should think how to release them early.
> 
> On 02.12.2020 0:28, Alexei Budankov wrote:
>>
>> Eventually sending to the proper Alexey's address.
>>
>> On 02.12.2020 0:04, Alexei Budankov wrote:
>>>
>>> On 01.12.2020 22:09, Jiri Olsa wrote:
>>>> On Mon, Nov 30, 2020 at 12:40:20PM +0100, Petr Malat wrote:
>>>>> Hi Jiří,
>>>>> were you able to reproduce the issue? I may also upload perf-archive
>>>>> if that would help.
>>>>
>>>> oh yea ;-) seems like those 2 commits you reverted broke 32 bits
>>>> perf for data files > 32MB
>>>>
>>>> but the fix you did does not work for Alexey's test he mentioned
>>>> in the commit:
>>>>
>>>>   $ perf record -z -- some_long_running_workload
>>>>   $ perf report --stdio -vv
>>>>
>>>> it's failing for me with:
>>>>
>>>># ./perf report
>>>>Couldn't allocate memory for decompression
>>>>0xfe6f3a [0x60]: failed to process type: 81 [Operation not permitted]
>>>>Error:
>>>>failed to process sample
>>>># To display the perf.data header info, please use 
>>>> --header/--header-only options.
>>>>#
>>>>
>>>> I think that's why here's special handling for compressed
>>>> events, but I'll need to check on that in more detail,
>>>> I was hoping for Alexey to answer ;-)
>>>
>>> Sorry for delay. Alexey Bayduraev could possibly help with that sooner.
>>> Alexey, could you please follow up.

Next time please avoid top posting and reply in line.
For this specific case it is right here just below my
previous answer.

Thanks,
Alexei



Re: [PATCH v2 1/3] Revert "perf session: Fix decompression of PERF_RECORD_COMPRESSED records"

2020-12-01 Thread Alexei Budankov


Eventually sending to the proper Alexey's address.

On 02.12.2020 0:04, Alexei Budankov wrote:
> 
> On 01.12.2020 22:09, Jiri Olsa wrote:
>> On Mon, Nov 30, 2020 at 12:40:20PM +0100, Petr Malat wrote:
>>> Hi Jiří,
>>> were you able to reproduce the issue? I may also upload perf-archive
>>> if that would help.
>>
>> oh yea ;-) seems like those 2 commits you reverted broke 32 bits
>> perf for data files > 32MB
>>
>> but the fix you did does not work for Alexey's test he mentioned
>> in the commit:
>>
>>   $ perf record -z -- some_long_running_workload
>>   $ perf report --stdio -vv
>>
>> it's failing for me with:
>>
>>  # ./perf report
>>  Couldn't allocate memory for decompression
>>  0xfe6f3a [0x60]: failed to process type: 81 [Operation not permitted]
>>  Error:
>>  failed to process sample
>>  # To display the perf.data header info, please use 
>> --header/--header-only options.
>>  #
>>
>> I think that's why here's special handling for compressed
>> events, but I'll need to check on that in more detail,
>> I was hoping for Alexey to answer ;-)
> 
> Sorry for delay. Alexey Bayduraev could possibly help with that sooner.
> Alexey, could you please follow up.
> 
> Thanks,
> Alexei


Re: [PATCH v2 1/3] Revert "perf session: Fix decompression of PERF_RECORD_COMPRESSED records"

2020-12-01 Thread Alexei Budankov


On 01.12.2020 22:09, Jiri Olsa wrote:
> On Mon, Nov 30, 2020 at 12:40:20PM +0100, Petr Malat wrote:
>> Hi Jiří,
>> were you able to reproduce the issue? I may also upload perf-archive
>> if that would help.
> 
> oh yea ;-) seems like those 2 commits you reverted broke 32 bits
> perf for data files > 32MB
> 
> but the fix you did does not work for Alexey's test he mentioned
> in the commit:
> 
>   $ perf record -z -- some_long_running_workload
>   $ perf report --stdio -vv
> 
> it's failing for me with:
> 
>   # ./perf report
>   Couldn't allocate memory for decompression
>   0xfe6f3a [0x60]: failed to process type: 81 [Operation not permitted]
>   Error:
>   failed to process sample
>   # To display the perf.data header info, please use 
> --header/--header-only options.
>   #
> 
> I think that's why here's special handling for compressed
> events, but I'll need to check on that in more detail,
> I was hoping for Alexey to answer ;-)

Sorry for delay. Alexey Bayduraev could possibly help with that sooner.
Alexey, could you please follow up.

Thanks,
Alexei


Re: [PATCH v2 12/15] perf record: introduce thread local variable for trace streaming

2020-10-26 Thread Alexei Budankov


On 26.10.2020 13:34, Jiri Olsa wrote:
> On Mon, Oct 26, 2020 at 11:21:28AM +0300, Alexei Budankov wrote:
>>
>> On 24.10.2020 18:43, Jiri Olsa wrote:
>>> On Wed, Oct 21, 2020 at 07:07:00PM +0300, Alexey Budankov wrote:
>>>>
>>>> Introduce thread local variable and use it for threaded trace streaming.
>>>>
>>>> Signed-off-by: Alexey Budankov 
>>>> ---
>>>>  tools/perf/builtin-record.c | 71 -
>>>>  1 file changed, 62 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>>>> index 89cb8e913fb3..3b7e9026f25b 100644
>>>> --- a/tools/perf/builtin-record.c
>>>> +++ b/tools/perf/builtin-record.c
>>>> @@ -101,6 +101,8 @@ struct thread_data {
>>>>u64bytes_written;
>>>>  };
>>>>  
>>>> +static __thread struct thread_data *thread;
>>>> +
>>>>  struct record {
>>>>struct perf_tooltool;
>>>>struct record_opts  opts;
>>>> @@ -587,7 +589,11 @@ static int record__pushfn(struct mmap *map, void *to, 
>>>> void *bf, size_t size)
>>>>}
>>>>}
>>>>  
>>>> -  rec->samples++;
>>>> +  if (thread)
>>>> +  thread->samples++;
>>>> +  else
>>>> +  rec->samples++;
>>>
>>> this is really wrong, let's keep just single samples counter
>>> ditto for all the other places in this patch
>>
>> This does look like data parallelism [1] which is very true for
>> threaded trace streaming so your prototype design looks optimal.
>>
>> For this specific place incrementing global counter in memory is
>> less performant and faces scalability limitations as a number of
>> cores grow.
>>
>> Not sure why you have changed your mind.
> 
> I'm not sure I follow.. what I'm complaining about is to have
> 'samples' stat variable in separate locations for --threads
> and --no-threads mode

It is optimal to have samples variable as per thread one
and then sum up the total in the end of data collection.

Single global variable design has scalability and performance
drawbacks.

Why do you complain about per thread variable in this case?
It looks like ideally fits these specific needs.

Alexei

> 
> jirka
> 
>>
>> Alexei
>>
>> [1] 
>> https://en.wikipedia.org/wiki/Data_parallelism#:~:text=Data%20parallelism%20is%20parallelization%20across,on%20each%20element%20in%20parallel.
>>
> 


Re: [PATCH v2 08/15] perf record: write trace data into mmap trace files

2020-10-26 Thread Alexei Budankov


On 26.10.2020 13:32, Jiri Olsa wrote:
> On Mon, Oct 26, 2020 at 11:52:21AM +0300, Alexei Budankov wrote:
>>
>> On 24.10.2020 18:44, Jiri Olsa wrote:
>>> On Wed, Oct 21, 2020 at 07:02:56PM +0300, Alexey Budankov wrote:
>>>
>>> SNIP
>>>
>>>>  
>>>>record__synthesize(rec, true);
>>>> -  /* this will be recalculated during process_buildids() */
>>>> -  rec->samples = 0;
>>>>  
>>>>if (!err) {
>>>>if (!rec->timestamp_filename) {
>>>> @@ -2680,9 +2709,12 @@ int cmd_record(int argc, const char **argv)
>>>>  
>>>>}
>>>>  
>>>> -  if (rec->opts.kcore)
>>>> +  if (rec->opts.kcore || record__threads_enabled(rec))
>>>>rec->data.is_dir = true;
>>>>  
>>>> +  if (record__threads_enabled(rec))
>>>> +  rec->opts.affinity = PERF_AFFINITY_CPU;
>>>
>>> so all the threads will pin to cpu and back before reading?
>>
>> No, they will not back. Thread mask compares to mmap mask before
>> read and the thread migrates if masks don't match. This happens
>> once on the first mmap read. So explicit pinning can be avoided.
> 
> hum, is that right? the check in record__adjust_affinity
> is checking global 'rec->affinity_mask', at lest I assume
> it's still global ;-)

Yes, rec->affinity_mask should also be per-thread. Good catch. Thanks!

Alexei

> 
> if (rec->opts.affinity != PERF_AFFINITY_SYS &&
> !bitmap_equal(rec->affinity_mask.bits, map->affinity_mask.bits,
>   rec->affinity_mask.nbits)) {
> 
> I think this can never be equal if you have more than one map
> 
> when I check on sched_setaffinity syscalls:
> 
>   # perf trace -e syscalls:sys_enter_sched_setaffinity
> 
> while running record --threads, I see sched_setaffinity
> calls all the time
> 
> jirka
> 


Re: [PATCH v2 05/15] perf session: introduce decompressor into trace reader object

2020-10-26 Thread Alexei Budankov


On 24.10.2020 18:44, Jiri Olsa wrote:
> On Wed, Oct 21, 2020 at 07:00:30PM +0300, Alexey Budankov wrote:
>>
>> Introduce decompressor to trace reader object so that decompression
>> could be executed on per trace file basis separately for every
>> trace file located in trace directory.
>>
>> Signed-off-by: Alexey Budankov 
>> ---
>>  tools/perf/util/session.c | 4 +++-
>>  tools/perf/util/session.h | 1 +
>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>> index 911b2dbcd0ac..6afc670fdf0c 100644
>> --- a/tools/perf/util/session.c
>> +++ b/tools/perf/util/session.c
>> @@ -44,6 +44,8 @@ static int perf_session__process_compressed_event(struct 
>> perf_session *session,
>>  u64 decomp_last_rem = 0;
>>  size_t mmap_len, decomp_len = session->header.env.comp_mmap_len;
>>  struct decomp *decomp, *decomp_last = session->decomp_last;
>> +struct zstd_data *zstd_data = session->reader ?
>> +&(session->reader->zstd_data) : &(session->zstd_data);
> 
> I don't think we're using braces in these cases, they are not necessary

Corrected in v3.

Thanks,
Alexei

> 
> jirka
> 
>>  
>>  if (decomp_last) {
>>  decomp_last_rem = decomp_last->size - decomp_last->head;
>> @@ -71,7 +73,7 @@ static int perf_session__process_compressed_event(struct 
>> perf_session *session,
>>  src = (void *)event + sizeof(struct perf_record_compressed);
>>  src_size = event->pack.header.size - sizeof(struct 
>> perf_record_compressed);
>>  
>> -decomp_size = zstd_decompress_stream(&(session->zstd_data), src, 
>> src_size,
>> +decomp_size = zstd_decompress_stream(zstd_data, src, src_size,
>>  &(decomp->data[decomp_last_rem]), decomp_len - 
>> decomp_last_rem);
>>  if (!decomp_size) {
>>  munmap(decomp, mmap_len);
>> diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
>> index abdb8518a81f..4fc9ccdf7970 100644
>> --- a/tools/perf/util/session.h
>> +++ b/tools/perf/util/session.h
>> @@ -42,6 +42,7 @@ struct reader {
>>  u64  data_size;
>>  u64  data_offset;
>>  reader_cb_t  process;
>> +struct zstd_data zstd_data;
>>  };
>>  
>>  struct perf_session {
>> -- 
>> 2.24.1
>>
>>
> 


Re: [PATCH v2 09/15] perf record: introduce thread specific objects for trace streaming

2020-10-26 Thread Alexei Budankov



On 24.10.2020 18:44, Jiri Olsa wrote:
> On Wed, Oct 21, 2020 at 07:03:48PM +0300, Alexey Budankov wrote:
>>
>> Introduce thread local data object and its array to be used for
>> threaded trace streaming.
>>
>> Signed-off-by: Alexey Budankov 
>> ---
>>  tools/perf/builtin-record.c | 18 ++
>>  1 file changed, 18 insertions(+)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index ba26d75c51d6..8e512096a060 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -85,11 +85,29 @@ struct switch_output {
>>  int  cur_file;
>>  };
>>  
>> +struct thread_data {
>> +pid_t  tid;
>> +struct {
>> +intmsg[2];
>> +intack[2];
>> +} comm;
>> +struct fdarray pollfd;
>> +intctlfd_pos;
>> +struct mmap*maps;
>> +intnr_mmaps;
>> +struct record  *rec;
>> +unsigned long long samples;
>> +unsigned long  waking;
>> +u64bytes_written;
>> +};
> 
> please merge the struct with the code that's using it

Corrected in v3.

Thanks,
Alexei

> 
> jirka
> 
>> +
>>  struct record {
>>  struct perf_tooltool;
>>  struct record_opts  opts;
>>  u64 bytes_written;
>>  struct perf_datadata;
>> +struct thread_data  *thread_data;
>> +int nr_thread_data;
>>  struct auxtrace_record  *itr;
>>  struct evlist   *evlist;
>>  struct perf_session *session;
>> -- 
>> 2.24.1
>>
> 


Re: [PATCH v2 08/15] perf record: write trace data into mmap trace files

2020-10-26 Thread Alexei Budankov


On 24.10.2020 18:44, Jiri Olsa wrote:
> On Wed, Oct 21, 2020 at 07:02:56PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>>  
>>  record__synthesize(rec, true);
>> -/* this will be recalculated during process_buildids() */
>> -rec->samples = 0;
>>  
>>  if (!err) {
>>  if (!rec->timestamp_filename) {
>> @@ -2680,9 +2709,12 @@ int cmd_record(int argc, const char **argv)
>>  
>>  }
>>  
>> -if (rec->opts.kcore)
>> +if (rec->opts.kcore || record__threads_enabled(rec))
>>  rec->data.is_dir = true;
>>  
>> +if (record__threads_enabled(rec))
>> +rec->opts.affinity = PERF_AFFINITY_CPU;
> 
> so all the threads will pin to cpu and back before reading?

No, they will not back. Thread mask compares to mmap mask before
read and the thread migrates if masks don't match. This happens
once on the first mmap read. So explicit pinning can be avoided.

Alexei


Re: [PATCH v2 14/15] perf record: start threads in the beginning of trace streaming

2020-10-26 Thread Alexei Budankov


On 24.10.2020 18:44, Jiri Olsa wrote:
> On Wed, Oct 21, 2020 at 07:10:09PM +0300, Alexey Budankov wrote:
>>
>> Start threads in detached state because its management is possible
>> via messaging. Block signals prior the threads start so only main
>> tool thread would be notified on external async signals during data
>> collection. Streaming threads connect one-to-one to mapped data
>> buffers and write into per-CPU trace files located at data directory.
>> Data buffers and threads are affined to local NUMA nodes and monitored
>> CPUs according to system topology. --cpu option can be used to specify
>> CPUs to be monitored.
>>
>> Signed-off-by: Alexey Budankov 
>> ---
>>  tools/perf/builtin-record.c | 128 +---
>>  1 file changed, 120 insertions(+), 8 deletions(-)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index a15642656066..1d41e996a994 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -56,6 +56,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #ifdef HAVE_EVENTFD_SUPPORT
>> @@ -1377,6 +1378,62 @@ static void record__thread_munmap_filtered(struct 
>> fdarray *fda, int fd,
>>  perf_mmap__put(map);
>>  }
>>  
>> +static void *record__thread(void *arg)
>> +{
>> +enum thread_msg msg = THREAD_MSG__READY;
>> +bool terminate = false;
>> +struct fdarray *pollfd;
>> +int err, ctlfd_pos;
>> +
>> +thread = arg;
>> +thread->tid = syscall(SYS_gettid);
>> +
>> +err = write(thread->comm.ack[1], &msg, sizeof(msg));
>> +if (err == -1)
>> +pr_err("threads: %d failed to notify on start. Error %m", 
>> thread->tid);
>> +
>> +pollfd = &(thread->pollfd);
> 
> I don't think braces are necessary in here

Corrected in v3.

Alexei


Re: [PATCH v2 10/15] perf record: manage thread specific data array

2020-10-26 Thread Alexei Budankov


On 24.10.2020 18:44, Jiri Olsa wrote:
> On Wed, Oct 21, 2020 at 07:04:26PM +0300, Alexey Budankov wrote:
>>
>> Provide allocation, initialization, finalization and releasing of
>> thread specific objects at thread specific data array. Allocate
>> thread specific object for every data buffer making one-to-one
>> relation between data buffer and a thread processing the buffer.
>> Deliver event fd related signals to thread's pollfd object.
>> Deliver thread control commands to ctlfd_pos fd of thread 1+.
>> Deliver tool external control commands via ctlfd_pos fd of thread 0.
>>
>> Signed-off-by: Alexey Budankov 
>> ---
>>  tools/perf/builtin-record.c | 101 ++--
>>  1 file changed, 98 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index 8e512096a060..89cb8e913fb3 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -884,6 +884,94 @@ static int record__kcore_copy(struct machine *machine, 
>> struct perf_data *data)
>>  return kcore_copy(from_dir, kcore_dir);
>>  }
>>  
>> +static int record__alloc_thread_data(struct record *rec, struct mmap 
>> *mmaps, int nr_mmaps,
>> + struct fdarray *evlist_pollfd, int 
>> ctlfd_pos)
>> +{
>> +int i, j, k, nr_thread_data;
>> +struct thread_data *thread_data;
>> +
>> +rec->nr_thread_data = nr_thread_data = nr_mmaps;
>> +rec->thread_data = thread_data = zalloc(rec->nr_thread_data * 
>> sizeof(*(rec->thread_data)));
>> +if (!thread_data) {
>> +pr_err("Failed to allocate thread data\n");
>> +return -ENOMEM;
>> +}
>> +
>> +for (i = 0; i < nr_thread_data; i++) {
>> +short revents;
>> +int pos, fd;
>> +
>> +thread_data[i].tid = -1;
>> +
>> +if (pipe(thread_data[i].comm.msg) ||
>> +pipe(thread_data[i].comm.ack)) {
>> +pr_err("Failed to create thread comm pipes, errno 
>> %d\n", errno);
>> +return -ENOMEM;
>> +}
> 
> the original code was using state flag and pthread_cond,
> which I think is more readable
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/commit/?h=perf/record_threads&id=a7da527ff8be69572c6d17525c03c6fe394503c8
> https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/commit/?h=perf/record_threads&id=eb85ce4da64a885fdb6c77cfc5bd71312fe02e2a

Yes, right, but messaging scales better and that critical section
to just increment global counter in memory looked as over complication.

> 
>> +
>> +thread_data[i].maps = &mmaps[i];
>> +thread_data[i].nr_mmaps = 1;
>> +
>> +thread_data[i].rec = rec;
>> +
>> +fdarray__init(&(thread_data[i].pollfd), 64);
>> +
>> +for (j = 0; j < thread_data[i].nr_mmaps; j++) {
>> +struct mmap *map = &(thread_data[i].maps[j]);
>> +
>> +for (k = 0; k < evlist_pollfd->nr; k++) {
>> +if (evlist_pollfd->priv[k].ptr != map)
>> +continue;
>> +
>> +fd = evlist_pollfd->entries[k].fd;
>> +revents = evlist_pollfd->entries[k].events;
>> +pos = fdarray__add(&(thread_data[i].pollfd),
>> +fd, revents | POLLERR | POLLHUP,
>> +fdarray_flag__default);
>> +if (pos >= 0)
>> +thread_data[i].pollfd.priv[pos].ptr = 
>> map;
>> +else
>> +return -ENOMEM;
> 
> I added function for that:
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/commit/?h=perf/record_threads&id=8aa6e68a7471b9d25af1a9eebfa9321433366a17

Ok, but it is not clone operation it is more like ordinary adding of
fd into another pollfd object. It could be wrapped into a function
e.g. fdarray__dup(pollfd_dst, fd_src, fd_revents)

Alexei


Re: [PATCH v2 12/15] perf record: introduce thread local variable for trace streaming

2020-10-26 Thread Alexei Budankov


On 24.10.2020 18:43, Jiri Olsa wrote:
> On Wed, Oct 21, 2020 at 07:07:00PM +0300, Alexey Budankov wrote:
>>
>> Introduce thread local variable and use it for threaded trace streaming.
>>
>> Signed-off-by: Alexey Budankov 
>> ---
>>  tools/perf/builtin-record.c | 71 -
>>  1 file changed, 62 insertions(+), 9 deletions(-)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index 89cb8e913fb3..3b7e9026f25b 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -101,6 +101,8 @@ struct thread_data {
>>  u64bytes_written;
>>  };
>>  
>> +static __thread struct thread_data *thread;
>> +
>>  struct record {
>>  struct perf_tooltool;
>>  struct record_opts  opts;
>> @@ -587,7 +589,11 @@ static int record__pushfn(struct mmap *map, void *to, 
>> void *bf, size_t size)
>>  }
>>  }
>>  
>> -rec->samples++;
>> +if (thread)
>> +thread->samples++;
>> +else
>> +rec->samples++;
> 
> this is really wrong, let's keep just single samples counter
> ditto for all the other places in this patch

This does look like data parallelism [1] which is very true for
threaded trace streaming so your prototype design looks optimal.

For this specific place incrementing global counter in memory is
less performant and faces scalability limitations as a number of
cores grow.

Not sure why you have changed your mind.

Alexei

[1] 
https://en.wikipedia.org/wiki/Data_parallelism#:~:text=Data%20parallelism%20is%20parallelization%20across,on%20each%20element%20in%20parallel.