Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'

2017-03-01 Thread Ravi Bangoria
Thank you Brendan for reviewing,

On Wednesday 01 March 2017 10:34 AM, Brendan Gregg wrote:
> On Tue, Feb 28, 2017 at 2:31 PM, Brendan Gregg
>  wrote:
>> G'Day Ravi,
>>
> [...]
>> Now retrying perf:
>>
>> # ./perf record -e sdt_node:http__server__request -a
>> ^C[ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.446 MB perf.data (3 samples) ]
>> # ./perf script
>> node  7646 [002]   361.012364:
>> sdt_node:http__server__request: (dc2e69)
>> node  7646 [002]   361.204718:
>> sdt_node:http__server__request: (dc2e69)
>> node  7646 [002]   361.363043:
>> sdt_node:http__server__request: (dc2e69)
>>
>> Now perf works.
>>
>> If I restart the node process, it goes back to the broken state.
>>
> Oh sorry, I forgot about that these Node.js probes are behind an
> is-enabled semaphore.

Yes. Perf does not support "is-enabled" markers yet.

> $ readelf -n `which node`
> [...]
>   stapsdt  0x0089NT_STAPSDT (SystemTap probe descriptors)
> Provider: node
> Name: http__server__request
> Location: 0x00dc2e69, Base: 0x0112e064, Semaphore:
> 0x01470954
> Arguments: 8@%r14 8@%rax 8@-4344(%rbp) -4@-4348(%rbp)
> 8@-4304(%rbp) 8@-4312(%rbp) -4@-4352(%rbp)
> # dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 ))
> 2>/dev/null | xxd
> : 00   .
>  # printf "\x1" | dd of=/proc/31695/mem bs=1 count=1 seek=$((
> 0x01470954 )) 2>/dev/null
> # dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 ))
> 2>/dev/null | xxd
> : 01   .
> # ./perf record -e sdt_node:http__server__request -a
> Matching event(s) from uprobe_events:
>sdt_node:http__server__request  0x9c2e69@/usr/local/bin/node
> Use 'perf probe -d ' to delete event(s).
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.280 MB perf.data (3 samples) ]
> # ./perf script
> node 31695 [003] 24947.168761:
> sdt_node:http__server__request: (dc2e69)
> node 31695 [003] 24947.476143:
> sdt_node:http__server__request: (dc2e69)
> node 31695 [003] 24947.679090:
> sdt_node:http__server__request: (dc2e69)
>
> So setting that to 1 made the probe work from perf. I guess this is
> not a problem with this patch set, but rather a feature request for
> the next one: is-enabled SDT support.

Yes. Actually I'm thinking about how this can be accomplished. Perf is a 
userspace
tool and, unlike systemtap, it cannot change semaphore value easily. This is 
what
I was thinking:

'perf record', at the start of session will increments semaphore in 
/proc//mem.
and at the end of session, perf will decrement it (same as bcc). This does not 
require
any support from kernel infrastructure. But there are challenges with this 
approach:

1. What if user starts workload after starting 'perf record'. How perf will be 
able
to increment semaphore value.

2. Systemwide record. We have to loop over all pids and check if any process is
   using SDT with semaphore that is being recorded.

3. Dynamic library loading. How to handle SDT probes in library that is not 
loaded
   at the time of 'perf record'?

Please let me know your thoughts.

> Were probe arguments supposed to work? I don't notice them in the perf
> script output.

Not yet. Alexis[1] (and followed by me[2]) had sent patches for that. Please
have a look at them.

[1] https://lkml.org/lkml/2016/12/13/784
[2] https://lkml.org/lkml/2017/2/2/145


So, why perf is able to record data after recording them with bcc?

Ideally, bcc should increment semaphore value at the start of session and
it should decrement at the end of the session. So after bcc process exits,
semaphore value should be zero. But actually it's not happening.

I've seen this when I was experimenting bcc with is-enabled markers.
See this example,

  $ readelf -n /usr/bin/node | grep -A2 Provider
  Provider: node
  Name: http__server__request
  Location: 0x00e5f484, Base: 0x011a0bc4, Semaphore: 
0x01558cf2

  $ sudo cat /proc/1426/maps
0040-01306000 r-xp  08:02 1083365/usr/bin/node
01506000-01551000 r--p 00f06000 08:02 1083365/usr/bin/node
01551000-01559000 rw-p 00f51000 08:02 1083365/usr/bin/node
...

  [TERMINAL-1]$ gdb 1426
(gdb) x/1 0x1558cf2
0x1558cf2:0

  [TERMINAL-2]$ sudo ./trace.py -p 1426 'u:node:http__server__request'
PIDTIDCOMM FUNC
/* Do not exit yet. */

  [TERMINAL-1]
(gdb) x/1 0x1558cf2
0x1558cf2:2

  [TERMINAL-2]
 ^C /* Exit bcc trace.py */

  [TERMINAL-1]
(gdb) x/1 0x1558cf2
0x1558cf2:2

Here it's maintaining value 2 as it is. it should be 0. I suspect this is a bug 
in
bcc. Please let me know if I'm understanding it wrong.

>
> PS, if it's helpful, here's the commands to build node 

Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'

2017-03-01 Thread Ravi Bangoria
Thank you Brendan for reviewing,

On Wednesday 01 March 2017 10:34 AM, Brendan Gregg wrote:
> On Tue, Feb 28, 2017 at 2:31 PM, Brendan Gregg
>  wrote:
>> G'Day Ravi,
>>
> [...]
>> Now retrying perf:
>>
>> # ./perf record -e sdt_node:http__server__request -a
>> ^C[ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.446 MB perf.data (3 samples) ]
>> # ./perf script
>> node  7646 [002]   361.012364:
>> sdt_node:http__server__request: (dc2e69)
>> node  7646 [002]   361.204718:
>> sdt_node:http__server__request: (dc2e69)
>> node  7646 [002]   361.363043:
>> sdt_node:http__server__request: (dc2e69)
>>
>> Now perf works.
>>
>> If I restart the node process, it goes back to the broken state.
>>
> Oh sorry, I forgot about that these Node.js probes are behind an
> is-enabled semaphore.

Yes. Perf does not support "is-enabled" markers yet.

> $ readelf -n `which node`
> [...]
>   stapsdt  0x0089NT_STAPSDT (SystemTap probe descriptors)
> Provider: node
> Name: http__server__request
> Location: 0x00dc2e69, Base: 0x0112e064, Semaphore:
> 0x01470954
> Arguments: 8@%r14 8@%rax 8@-4344(%rbp) -4@-4348(%rbp)
> 8@-4304(%rbp) 8@-4312(%rbp) -4@-4352(%rbp)
> # dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 ))
> 2>/dev/null | xxd
> : 00   .
>  # printf "\x1" | dd of=/proc/31695/mem bs=1 count=1 seek=$((
> 0x01470954 )) 2>/dev/null
> # dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 ))
> 2>/dev/null | xxd
> : 01   .
> # ./perf record -e sdt_node:http__server__request -a
> Matching event(s) from uprobe_events:
>sdt_node:http__server__request  0x9c2e69@/usr/local/bin/node
> Use 'perf probe -d ' to delete event(s).
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.280 MB perf.data (3 samples) ]
> # ./perf script
> node 31695 [003] 24947.168761:
> sdt_node:http__server__request: (dc2e69)
> node 31695 [003] 24947.476143:
> sdt_node:http__server__request: (dc2e69)
> node 31695 [003] 24947.679090:
> sdt_node:http__server__request: (dc2e69)
>
> So setting that to 1 made the probe work from perf. I guess this is
> not a problem with this patch set, but rather a feature request for
> the next one: is-enabled SDT support.

Yes. Actually I'm thinking about how this can be accomplished. Perf is a 
userspace
tool and, unlike systemtap, it cannot change semaphore value easily. This is 
what
I was thinking:

'perf record', at the start of session will increments semaphore in 
/proc//mem.
and at the end of session, perf will decrement it (same as bcc). This does not 
require
any support from kernel infrastructure. But there are challenges with this 
approach:

1. What if user starts workload after starting 'perf record'. How perf will be 
able
to increment semaphore value.

2. Systemwide record. We have to loop over all pids and check if any process is
   using SDT with semaphore that is being recorded.

3. Dynamic library loading. How to handle SDT probes in library that is not 
loaded
   at the time of 'perf record'?

Please let me know your thoughts.

> Were probe arguments supposed to work? I don't notice them in the perf
> script output.

Not yet. Alexis[1] (and followed by me[2]) had sent patches for that. Please
have a look at them.

[1] https://lkml.org/lkml/2016/12/13/784
[2] https://lkml.org/lkml/2017/2/2/145


So, why perf is able to record data after recording them with bcc?

Ideally, bcc should increment semaphore value at the start of session and
it should decrement at the end of the session. So after bcc process exits,
semaphore value should be zero. But actually it's not happening.

I've seen this when I was experimenting bcc with is-enabled markers.
See this example,

  $ readelf -n /usr/bin/node | grep -A2 Provider
  Provider: node
  Name: http__server__request
  Location: 0x00e5f484, Base: 0x011a0bc4, Semaphore: 
0x01558cf2

  $ sudo cat /proc/1426/maps
0040-01306000 r-xp  08:02 1083365/usr/bin/node
01506000-01551000 r--p 00f06000 08:02 1083365/usr/bin/node
01551000-01559000 rw-p 00f51000 08:02 1083365/usr/bin/node
...

  [TERMINAL-1]$ gdb 1426
(gdb) x/1 0x1558cf2
0x1558cf2:0

  [TERMINAL-2]$ sudo ./trace.py -p 1426 'u:node:http__server__request'
PIDTIDCOMM FUNC
/* Do not exit yet. */

  [TERMINAL-1]
(gdb) x/1 0x1558cf2
0x1558cf2:2

  [TERMINAL-2]
 ^C /* Exit bcc trace.py */

  [TERMINAL-1]
(gdb) x/1 0x1558cf2
0x1558cf2:2

Here it's maintaining value 2 as it is. it should be 0. I suspect this is a bug 
in
bcc. Please let me know if I'm understanding it wrong.

>
> PS, if it's helpful, here's the commands to build node with these SDT probes:
>
> $ 

Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'

2017-02-28 Thread Brendan Gregg
On Tue, Feb 28, 2017 at 2:31 PM, Brendan Gregg
 wrote:
> G'Day Ravi,
>
[...]
> Now retrying perf:
>
> # ./perf record -e sdt_node:http__server__request -a
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.446 MB perf.data (3 samples) ]
> # ./perf script
> node  7646 [002]   361.012364:
> sdt_node:http__server__request: (dc2e69)
> node  7646 [002]   361.204718:
> sdt_node:http__server__request: (dc2e69)
> node  7646 [002]   361.363043:
> sdt_node:http__server__request: (dc2e69)
>
> Now perf works.
>
> If I restart the node process, it goes back to the broken state.
>

Oh sorry, I forgot about that these Node.js probes are behind an
is-enabled semaphore.

$ readelf -n `which node`
[...]
  stapsdt  0x0089NT_STAPSDT (SystemTap probe descriptors)
Provider: node
Name: http__server__request
Location: 0x00dc2e69, Base: 0x0112e064, Semaphore:
0x01470954
Arguments: 8@%r14 8@%rax 8@-4344(%rbp) -4@-4348(%rbp)
8@-4304(%rbp) 8@-4312(%rbp) -4@-4352(%rbp)
# dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 ))
2>/dev/null | xxd
: 00   .
 # printf "\x1" | dd of=/proc/31695/mem bs=1 count=1 seek=$((
0x01470954 )) 2>/dev/null
# dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 ))
2>/dev/null | xxd
: 01   .
# ./perf record -e sdt_node:http__server__request -a
Matching event(s) from uprobe_events:
   sdt_node:http__server__request  0x9c2e69@/usr/local/bin/node
Use 'perf probe -d ' to delete event(s).
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.280 MB perf.data (3 samples) ]
# ./perf script
node 31695 [003] 24947.168761:
sdt_node:http__server__request: (dc2e69)
node 31695 [003] 24947.476143:
sdt_node:http__server__request: (dc2e69)
node 31695 [003] 24947.679090:
sdt_node:http__server__request: (dc2e69)

So setting that to 1 made the probe work from perf. I guess this is
not a problem with this patch set, but rather a feature request for
the next one: is-enabled SDT support.

Were probe arguments supposed to work? I don't notice them in the perf
script output.

PS, if it's helpful, here's the commands to build node with these SDT probes:

$ sudo apt-get install systemtap-sdt-dev   # adds "dtrace", used
by node build
$ wget https://nodejs.org/dist/v4.4.1/node-v4.4.1.tar.gz
$ tar xvf node-v4.4.1.tar.gz
$ cd node-v4.4.1
$ ./configure --with-dtrace
$ make -j 8

Brendan


Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'

2017-02-28 Thread Brendan Gregg
On Tue, Feb 28, 2017 at 2:31 PM, Brendan Gregg
 wrote:
> G'Day Ravi,
>
[...]
> Now retrying perf:
>
> # ./perf record -e sdt_node:http__server__request -a
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.446 MB perf.data (3 samples) ]
> # ./perf script
> node  7646 [002]   361.012364:
> sdt_node:http__server__request: (dc2e69)
> node  7646 [002]   361.204718:
> sdt_node:http__server__request: (dc2e69)
> node  7646 [002]   361.363043:
> sdt_node:http__server__request: (dc2e69)
>
> Now perf works.
>
> If I restart the node process, it goes back to the broken state.
>

Oh sorry, I forgot about that these Node.js probes are behind an
is-enabled semaphore.

$ readelf -n `which node`
[...]
  stapsdt  0x0089NT_STAPSDT (SystemTap probe descriptors)
Provider: node
Name: http__server__request
Location: 0x00dc2e69, Base: 0x0112e064, Semaphore:
0x01470954
Arguments: 8@%r14 8@%rax 8@-4344(%rbp) -4@-4348(%rbp)
8@-4304(%rbp) 8@-4312(%rbp) -4@-4352(%rbp)
# dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 ))
2>/dev/null | xxd
: 00   .
 # printf "\x1" | dd of=/proc/31695/mem bs=1 count=1 seek=$((
0x01470954 )) 2>/dev/null
# dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 ))
2>/dev/null | xxd
: 01   .
# ./perf record -e sdt_node:http__server__request -a
Matching event(s) from uprobe_events:
   sdt_node:http__server__request  0x9c2e69@/usr/local/bin/node
Use 'perf probe -d ' to delete event(s).
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.280 MB perf.data (3 samples) ]
# ./perf script
node 31695 [003] 24947.168761:
sdt_node:http__server__request: (dc2e69)
node 31695 [003] 24947.476143:
sdt_node:http__server__request: (dc2e69)
node 31695 [003] 24947.679090:
sdt_node:http__server__request: (dc2e69)

So setting that to 1 made the probe work from perf. I guess this is
not a problem with this patch set, but rather a feature request for
the next one: is-enabled SDT support.

Were probe arguments supposed to work? I don't notice them in the perf
script output.

PS, if it's helpful, here's the commands to build node with these SDT probes:

$ sudo apt-get install systemtap-sdt-dev   # adds "dtrace", used
by node build
$ wget https://nodejs.org/dist/v4.4.1/node-v4.4.1.tar.gz
$ tar xvf node-v4.4.1.tar.gz
$ cd node-v4.4.1
$ ./configure --with-dtrace
$ make -j 8

Brendan


Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'

2017-02-28 Thread Masami Hiramatsu
On Tue, 28 Feb 2017 16:26:19 +0530
Ravi Bangoria  wrote:

> Thanks Masami for the review.
> Please find my replies to your comments.
> 
> On Tuesday 28 February 2017 11:15 AM, Masami Hiramatsu wrote:
> > On Fri, 24 Feb 2017 13:13:25 +0530
> > Ravi Bangoria  wrote:
> >
> >> After invoking 'perf record', behind the scenes, it checks whether
> >> the event specified is an SDT event by using the flag '%' or string
> >> 'sdt_'. After that, it first checks whether event already exists
> >> with that *name* in uprobe_events file. If found, it records that
> >> particular event. Otherwise, it does a lookup of the probe cache to
> >> find out the SDT event. If its not present, it throws an error.
> >> If found, it again tries to find existing events from uprobe_events
> >> file, but this time it uses *address* and *filename* for comparison.
> >> Finally it writes new events into the uprobe_events file and starts
> >> recording. It also maintains a list of the event names that were
> >> written to uprobe_events file for this session. After finishing the
> >> record session, it removes the events from the uprobe_events file
> >> using the maintained name list.
> > OK, the behavior looks good. However, the implementation seems
> > too complex, could you make it simpler?
> > I have some comments on the code, see below.
> 
> Yes, it's complex code. It's all because of multiple SDT events with same name
> exists and we wants to allow both -- 'perf probe' as well as 'perf record' -- 
> which
> creates complex situations that has to be taken care. See my replies below for
> such examples.

OK, but in that case, could you split those support for review?
E.g. the first one just combine perf-probe and perf-record and
second one support multiple SDT events.


> 
> >> +/*
> >> + * Remove ith tev from pev->tevs list and shift remaining
> >> + * tevs(i+1 to pev->ntevs) one step.
> >> + */
> >> +static void shift_pev(struct perf_probe_event *pev, int i)
> >> +{
> >> +  int j;
> >> +
> >> +  free(pev->tevs[i].event);
> >> +  free(pev->tevs[i].group);
> >> +  free(pev->tevs[i].args);
> >> +  free(pev->tevs[i].point.realname);
> >> +  free(pev->tevs[i].point.symbol);
> >> +  free(pev->tevs[i].point.module);
> >> +
> >> +  /*
> >> +   * If ith element is last element, no need to shift,
> >> +   * just decrement pev->ntevs.
> >> +   */
> >> +  if (i == pev->ntevs - 1)
> >> +  goto ret;
> >> +
> >> +  for (j = i; j < pev->ntevs - 1; j++) {
> >> +  pev->tevs[j].event  = pev->tevs[j + 1].event;
> >> +  pev->tevs[j].group  = pev->tevs[j + 1].group;
> >> +  pev->tevs[j].nargs  = pev->tevs[j + 1].nargs;
> >> +  pev->tevs[j].uprobes= pev->tevs[j + 1].uprobes;
> >> +  pev->tevs[j].args   = pev->tevs[j + 1].args;
> >> +  pev->tevs[j].point.realname = pev->tevs[j + 1].point.realname;
> >> +  pev->tevs[j].point.symbol   = pev->tevs[j + 1].point.symbol;
> >> +  pev->tevs[j].point.module   = pev->tevs[j + 1].point.module;
> >> +  pev->tevs[j].point.offset   = pev->tevs[j + 1].point.offset;
> >> +  pev->tevs[j].point.address  = pev->tevs[j + 1].point.address;
> >> +  pev->tevs[j].point.retprobe = pev->tevs[j + 1].point.retprobe;
> >> +  }
> >> +
> >> +ret:
> >> +  pev->ntevs--;
> >> +}
> > This code smells no good...
> > Why don't you make a list of newly added probes ?
> >
> 
> Yes. This is little ugly. Consider a following scenario.
> 
>   $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
> Provider: libpthread
> Name: mutex_entry
> Location: 0x9ddb, Base: 0x000139cc, ...
>   --
> Provider: libpthread
> Name: mutex_entry
> Location: 0xbcbb, Base: 0x000139cc, ...
> 
>   $ perf probe sdt_libpthread:mutex_entry
> 
>   $ cat /sys/kernel/debug/tracing/uprobe_events
> p:sdt_libpthread/mutex_entry 
> /usr/lib64/libpthread-2.24.so:0x9ddb
> p:sdt_libpthread/mutex_entry_1 
> /usr/lib64/libpthread-2.24.so:0xbcbb
> 
>   $ perf probe -d sdt_libpthread:mutex_entry
> 
>   $ perf record -a -e sdt_libpthread:mutex_entry
>  /* This should record:
>sdt_libpthread/mutex_entry  <== New entry (delete at the 
> end of session)
>sdt_libpthread/mutex_entry_1  <== Reuse (do not delete at the 
> end of session)
>   */

Hmm, OK. but this also not becomes the reason why you need 2 different
lists. After calling apply_perf_probe_events(), pev->tevs have been
updated, which have applied names. So you can pull it from the trace_event.

> 
> First of all, perf looks into uprobe_events for entry with 
> sdt_libpthread:mutex_entry
> *name*. As it's not present, perf does a lookup in probe-cache and gets two 
> entries.
> Perf stores them in 'pev->tevs' array. Now it again compares uprobe_events 
> 

Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'

2017-02-28 Thread Masami Hiramatsu
On Tue, 28 Feb 2017 16:26:19 +0530
Ravi Bangoria  wrote:

> Thanks Masami for the review.
> Please find my replies to your comments.
> 
> On Tuesday 28 February 2017 11:15 AM, Masami Hiramatsu wrote:
> > On Fri, 24 Feb 2017 13:13:25 +0530
> > Ravi Bangoria  wrote:
> >
> >> After invoking 'perf record', behind the scenes, it checks whether
> >> the event specified is an SDT event by using the flag '%' or string
> >> 'sdt_'. After that, it first checks whether event already exists
> >> with that *name* in uprobe_events file. If found, it records that
> >> particular event. Otherwise, it does a lookup of the probe cache to
> >> find out the SDT event. If its not present, it throws an error.
> >> If found, it again tries to find existing events from uprobe_events
> >> file, but this time it uses *address* and *filename* for comparison.
> >> Finally it writes new events into the uprobe_events file and starts
> >> recording. It also maintains a list of the event names that were
> >> written to uprobe_events file for this session. After finishing the
> >> record session, it removes the events from the uprobe_events file
> >> using the maintained name list.
> > OK, the behavior looks good. However, the implementation seems
> > too complex, could you make it simpler?
> > I have some comments on the code, see below.
> 
> Yes, it's complex code. It's all because of multiple SDT events with same name
> exists and we wants to allow both -- 'perf probe' as well as 'perf record' -- 
> which
> creates complex situations that has to be taken care. See my replies below for
> such examples.

OK, but in that case, could you split those support for review?
E.g. the first one just combine perf-probe and perf-record and
second one support multiple SDT events.


> 
> >> +/*
> >> + * Remove ith tev from pev->tevs list and shift remaining
> >> + * tevs(i+1 to pev->ntevs) one step.
> >> + */
> >> +static void shift_pev(struct perf_probe_event *pev, int i)
> >> +{
> >> +  int j;
> >> +
> >> +  free(pev->tevs[i].event);
> >> +  free(pev->tevs[i].group);
> >> +  free(pev->tevs[i].args);
> >> +  free(pev->tevs[i].point.realname);
> >> +  free(pev->tevs[i].point.symbol);
> >> +  free(pev->tevs[i].point.module);
> >> +
> >> +  /*
> >> +   * If ith element is last element, no need to shift,
> >> +   * just decrement pev->ntevs.
> >> +   */
> >> +  if (i == pev->ntevs - 1)
> >> +  goto ret;
> >> +
> >> +  for (j = i; j < pev->ntevs - 1; j++) {
> >> +  pev->tevs[j].event  = pev->tevs[j + 1].event;
> >> +  pev->tevs[j].group  = pev->tevs[j + 1].group;
> >> +  pev->tevs[j].nargs  = pev->tevs[j + 1].nargs;
> >> +  pev->tevs[j].uprobes= pev->tevs[j + 1].uprobes;
> >> +  pev->tevs[j].args   = pev->tevs[j + 1].args;
> >> +  pev->tevs[j].point.realname = pev->tevs[j + 1].point.realname;
> >> +  pev->tevs[j].point.symbol   = pev->tevs[j + 1].point.symbol;
> >> +  pev->tevs[j].point.module   = pev->tevs[j + 1].point.module;
> >> +  pev->tevs[j].point.offset   = pev->tevs[j + 1].point.offset;
> >> +  pev->tevs[j].point.address  = pev->tevs[j + 1].point.address;
> >> +  pev->tevs[j].point.retprobe = pev->tevs[j + 1].point.retprobe;
> >> +  }
> >> +
> >> +ret:
> >> +  pev->ntevs--;
> >> +}
> > This code smells no good...
> > Why don't you make a list of newly added probes ?
> >
> 
> Yes. This is little ugly. Consider a following scenario.
> 
>   $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
> Provider: libpthread
> Name: mutex_entry
> Location: 0x9ddb, Base: 0x000139cc, ...
>   --
> Provider: libpthread
> Name: mutex_entry
> Location: 0xbcbb, Base: 0x000139cc, ...
> 
>   $ perf probe sdt_libpthread:mutex_entry
> 
>   $ cat /sys/kernel/debug/tracing/uprobe_events
> p:sdt_libpthread/mutex_entry 
> /usr/lib64/libpthread-2.24.so:0x9ddb
> p:sdt_libpthread/mutex_entry_1 
> /usr/lib64/libpthread-2.24.so:0xbcbb
> 
>   $ perf probe -d sdt_libpthread:mutex_entry
> 
>   $ perf record -a -e sdt_libpthread:mutex_entry
>  /* This should record:
>sdt_libpthread/mutex_entry  <== New entry (delete at the 
> end of session)
>sdt_libpthread/mutex_entry_1  <== Reuse (do not delete at the 
> end of session)
>   */

Hmm, OK. but this also not becomes the reason why you need 2 different
lists. After calling apply_perf_probe_events(), pev->tevs have been
updated, which have applied names. So you can pull it from the trace_event.

> 
> First of all, perf looks into uprobe_events for entry with 
> sdt_libpthread:mutex_entry
> *name*. As it's not present, perf does a lookup in probe-cache and gets two 
> entries.
> Perf stores them in 'pev->tevs' array. Now it again compares uprobe_events 
> entries
> with entries found from probe-cache with *filename* and 

Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'

2017-02-28 Thread Brendan Gregg
G'Day Ravi,

On Thu, Feb 23, 2017 at 11:43 PM, Ravi Bangoria
 wrote:
>
> From: Hemant Kumar 
>
> Add support for directly recording SDT events which are present in
> the probe cache. Without this patch, we could probe into SDT events
> using 'perf probe' and 'perf record'. With this patch, we can probe
> the SDT events directly using 'perf record'.
>
> For example :
>
>   $ perf list sdt
> sdt_libpthread:mutex_entry [SDT event]
> sdt_libc:setjmp[SDT event]
> ...
>
>   $ perf record -a -e sdt_libc:setjmp
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ]
>
>   $ perf script
>  bash   803 [002]  6492.190311: sdt_libc:setjmp: (7f1d503b56a1)
> login   488 [001]  6496.791583: sdt_libc:setjmp: (7ff3013d56a1)
>   fprintd 11038 [003]  6496.808032: sdt_libc:setjmp: (7fdedf5936a1)
> [...]


Thanks, I like the usage. I ran into trouble testing on Node.js:

# ./perf buildid-cache --add `which node`
# ./perf list | grep sdt_node
  sdt_node:gc__done  [SDT event]
  sdt_node:gc__start [SDT event]
  sdt_node:http__client__request [SDT event]
  sdt_node:http__client__response[SDT event]
  sdt_node:http__server__request [SDT event]
  sdt_node:http__server__response[SDT event]
  sdt_node:net__server__connection   [SDT event]
  sdt_node:net__stream__end  [SDT event]
# ./perf record -e sdt_node:http__server__request -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.308 MB perf.data ]
# ./perf script
#

No events. I can see it had set it up:

# cat /sys/kernel/debug/tracing/uprobe_events
p:sdt_node/http__server__request /usr/local/bin/node:0x009c2e69

Ok. Am I sure my workload is working? Trying from bcc/eBPF:

# /mnt/src/bcc/tools/trace.py 'u:node:http__server__request'
failed to enable probe 'http__server__request'; a possible cause can
be that the probe requires a pid to enable
# /mnt/src/bcc/tools/trace.py -p `pgrep node` 'u:node:http__server__request'
In file included from /virtual/main.c:41:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/blkdev.h:9:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/genhd.h:64:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/device.h:24:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/pinctrl/devinfo.h:21:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/pinctrl/consumer.h:17:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/seq_file.h:10:
/lib/modules/4.10.0-rc8-virtual/build/include/linux/fs.h:2648:9:
warning: comparison of unsigned enum expression < 0 is always false
[-Wtautological-compare]
if (id < 0 || id >= READING_MAX_ID)
~~ ^ ~
1 warning generated.
PIDTIDCOMM FUNC
7646   7646   node http__server__request
7646   7646   node http__server__request
7646   7646   node http__server__request
^C

(ignore the warning; I just asked lkml about it). So that works. It
instrumented:

# cat /sys/kernel/debug/tracing/uprobe_events
p:uprobes/p__usr_local_bin_node_0x9c2e69_bcc_25410
/usr/local/bin/node:0x009c2e69

Now retrying perf:

# ./perf record -e sdt_node:http__server__request -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.446 MB perf.data (3 samples) ]
# ./perf script
node  7646 [002]   361.012364:
sdt_node:http__server__request: (dc2e69)
node  7646 [002]   361.204718:
sdt_node:http__server__request: (dc2e69)
node  7646 [002]   361.363043:
sdt_node:http__server__request: (dc2e69)

Now perf works.

If I restart the node process, it goes back to the broken state.

This doesn't fix it either:

# ./perf probe sdt_node:http__server__request
Added new event:
  sdt_node:http__server__request (on %http__server__request in
/usr/local/bin/node)

You can now use it in all perf tools, such as:

perf record -e sdt_node:http__server__request -aR sleep 1

Hint: SDT event can be directly recorded with 'perf record'. No need
to create probe manually.

Brendan


Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'

2017-02-28 Thread Brendan Gregg
G'Day Ravi,

On Thu, Feb 23, 2017 at 11:43 PM, Ravi Bangoria
 wrote:
>
> From: Hemant Kumar 
>
> Add support for directly recording SDT events which are present in
> the probe cache. Without this patch, we could probe into SDT events
> using 'perf probe' and 'perf record'. With this patch, we can probe
> the SDT events directly using 'perf record'.
>
> For example :
>
>   $ perf list sdt
> sdt_libpthread:mutex_entry [SDT event]
> sdt_libc:setjmp[SDT event]
> ...
>
>   $ perf record -a -e sdt_libc:setjmp
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ]
>
>   $ perf script
>  bash   803 [002]  6492.190311: sdt_libc:setjmp: (7f1d503b56a1)
> login   488 [001]  6496.791583: sdt_libc:setjmp: (7ff3013d56a1)
>   fprintd 11038 [003]  6496.808032: sdt_libc:setjmp: (7fdedf5936a1)
> [...]


Thanks, I like the usage. I ran into trouble testing on Node.js:

# ./perf buildid-cache --add `which node`
# ./perf list | grep sdt_node
  sdt_node:gc__done  [SDT event]
  sdt_node:gc__start [SDT event]
  sdt_node:http__client__request [SDT event]
  sdt_node:http__client__response[SDT event]
  sdt_node:http__server__request [SDT event]
  sdt_node:http__server__response[SDT event]
  sdt_node:net__server__connection   [SDT event]
  sdt_node:net__stream__end  [SDT event]
# ./perf record -e sdt_node:http__server__request -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.308 MB perf.data ]
# ./perf script
#

No events. I can see it had set it up:

# cat /sys/kernel/debug/tracing/uprobe_events
p:sdt_node/http__server__request /usr/local/bin/node:0x009c2e69

Ok. Am I sure my workload is working? Trying from bcc/eBPF:

# /mnt/src/bcc/tools/trace.py 'u:node:http__server__request'
failed to enable probe 'http__server__request'; a possible cause can
be that the probe requires a pid to enable
# /mnt/src/bcc/tools/trace.py -p `pgrep node` 'u:node:http__server__request'
In file included from /virtual/main.c:41:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/blkdev.h:9:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/genhd.h:64:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/device.h:24:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/pinctrl/devinfo.h:21:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/pinctrl/consumer.h:17:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/seq_file.h:10:
/lib/modules/4.10.0-rc8-virtual/build/include/linux/fs.h:2648:9:
warning: comparison of unsigned enum expression < 0 is always false
[-Wtautological-compare]
if (id < 0 || id >= READING_MAX_ID)
~~ ^ ~
1 warning generated.
PIDTIDCOMM FUNC
7646   7646   node http__server__request
7646   7646   node http__server__request
7646   7646   node http__server__request
^C

(ignore the warning; I just asked lkml about it). So that works. It
instrumented:

# cat /sys/kernel/debug/tracing/uprobe_events
p:uprobes/p__usr_local_bin_node_0x9c2e69_bcc_25410
/usr/local/bin/node:0x009c2e69

Now retrying perf:

# ./perf record -e sdt_node:http__server__request -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.446 MB perf.data (3 samples) ]
# ./perf script
node  7646 [002]   361.012364:
sdt_node:http__server__request: (dc2e69)
node  7646 [002]   361.204718:
sdt_node:http__server__request: (dc2e69)
node  7646 [002]   361.363043:
sdt_node:http__server__request: (dc2e69)

Now perf works.

If I restart the node process, it goes back to the broken state.

This doesn't fix it either:

# ./perf probe sdt_node:http__server__request
Added new event:
  sdt_node:http__server__request (on %http__server__request in
/usr/local/bin/node)

You can now use it in all perf tools, such as:

perf record -e sdt_node:http__server__request -aR sleep 1

Hint: SDT event can be directly recorded with 'perf record'. No need
to create probe manually.

Brendan


Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'

2017-02-28 Thread Ravi Bangoria
Thanks Masami for the review.
Please find my replies to your comments.

On Tuesday 28 February 2017 11:15 AM, Masami Hiramatsu wrote:
> On Fri, 24 Feb 2017 13:13:25 +0530
> Ravi Bangoria  wrote:
>
>> After invoking 'perf record', behind the scenes, it checks whether
>> the event specified is an SDT event by using the flag '%' or string
>> 'sdt_'. After that, it first checks whether event already exists
>> with that *name* in uprobe_events file. If found, it records that
>> particular event. Otherwise, it does a lookup of the probe cache to
>> find out the SDT event. If its not present, it throws an error.
>> If found, it again tries to find existing events from uprobe_events
>> file, but this time it uses *address* and *filename* for comparison.
>> Finally it writes new events into the uprobe_events file and starts
>> recording. It also maintains a list of the event names that were
>> written to uprobe_events file for this session. After finishing the
>> record session, it removes the events from the uprobe_events file
>> using the maintained name list.
> OK, the behavior looks good. However, the implementation seems
> too complex, could you make it simpler?
> I have some comments on the code, see below.

Yes, it's complex code. It's all because of multiple SDT events with same name
exists and we wants to allow both -- 'perf probe' as well as 'perf record' -- 
which
creates complex situations that has to be taken care. See my replies below for
such examples.

>> +/*
>> + * Remove ith tev from pev->tevs list and shift remaining
>> + * tevs(i+1 to pev->ntevs) one step.
>> + */
>> +static void shift_pev(struct perf_probe_event *pev, int i)
>> +{
>> +int j;
>> +
>> +free(pev->tevs[i].event);
>> +free(pev->tevs[i].group);
>> +free(pev->tevs[i].args);
>> +free(pev->tevs[i].point.realname);
>> +free(pev->tevs[i].point.symbol);
>> +free(pev->tevs[i].point.module);
>> +
>> +/*
>> + * If ith element is last element, no need to shift,
>> + * just decrement pev->ntevs.
>> + */
>> +if (i == pev->ntevs - 1)
>> +goto ret;
>> +
>> +for (j = i; j < pev->ntevs - 1; j++) {
>> +pev->tevs[j].event  = pev->tevs[j + 1].event;
>> +pev->tevs[j].group  = pev->tevs[j + 1].group;
>> +pev->tevs[j].nargs  = pev->tevs[j + 1].nargs;
>> +pev->tevs[j].uprobes= pev->tevs[j + 1].uprobes;
>> +pev->tevs[j].args   = pev->tevs[j + 1].args;
>> +pev->tevs[j].point.realname = pev->tevs[j + 1].point.realname;
>> +pev->tevs[j].point.symbol   = pev->tevs[j + 1].point.symbol;
>> +pev->tevs[j].point.module   = pev->tevs[j + 1].point.module;
>> +pev->tevs[j].point.offset   = pev->tevs[j + 1].point.offset;
>> +pev->tevs[j].point.address  = pev->tevs[j + 1].point.address;
>> +pev->tevs[j].point.retprobe = pev->tevs[j + 1].point.retprobe;
>> +}
>> +
>> +ret:
>> +pev->ntevs--;
>> +}
> This code smells no good...
> Why don't you make a list of newly added probes ?
>

Yes. This is little ugly. Consider a following scenario.

  $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
Provider: libpthread
Name: mutex_entry
Location: 0x9ddb, Base: 0x000139cc, ...
  --
Provider: libpthread
Name: mutex_entry
Location: 0xbcbb, Base: 0x000139cc, ...

  $ perf probe sdt_libpthread:mutex_entry

  $ cat /sys/kernel/debug/tracing/uprobe_events
p:sdt_libpthread/mutex_entry 
/usr/lib64/libpthread-2.24.so:0x9ddb
p:sdt_libpthread/mutex_entry_1 
/usr/lib64/libpthread-2.24.so:0xbcbb

  $ perf probe -d sdt_libpthread:mutex_entry

  $ perf record -a -e sdt_libpthread:mutex_entry
 /* This should record:
   sdt_libpthread/mutex_entry  <== New entry (delete at the end 
of session)
   sdt_libpthread/mutex_entry_1  <== Reuse (do not delete at the 
end of session)
  */

First of all, perf looks into uprobe_events for entry with 
sdt_libpthread:mutex_entry
*name*. As it's not present, perf does a lookup in probe-cache and gets two 
entries.
Perf stores them in 'pev->tevs' array. Now it again compares uprobe_events 
entries
with entries found from probe-cache with *filename* and *address* as comparison
parameter (because this combo makes unique identifier for any SDT event in a
particular system - please correct me if that's wrong). And perf finds one entry
sdt_libpthread/mutex_entry_1.

Keeping this entry in 'pevs->tevs' and calling apply_perf_probe_events() will 
add
new entry for that event even if it exists. So this code removes that particular
element of an array and shift all remaining elements one step.

Please suggest if you have a better approach.

>> +
>> +/* Compare address and filename */
>> +static bool is_sdt_match(struct 

Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'

2017-02-28 Thread Ravi Bangoria
Thanks Masami for the review.
Please find my replies to your comments.

On Tuesday 28 February 2017 11:15 AM, Masami Hiramatsu wrote:
> On Fri, 24 Feb 2017 13:13:25 +0530
> Ravi Bangoria  wrote:
>
>> After invoking 'perf record', behind the scenes, it checks whether
>> the event specified is an SDT event by using the flag '%' or string
>> 'sdt_'. After that, it first checks whether event already exists
>> with that *name* in uprobe_events file. If found, it records that
>> particular event. Otherwise, it does a lookup of the probe cache to
>> find out the SDT event. If its not present, it throws an error.
>> If found, it again tries to find existing events from uprobe_events
>> file, but this time it uses *address* and *filename* for comparison.
>> Finally it writes new events into the uprobe_events file and starts
>> recording. It also maintains a list of the event names that were
>> written to uprobe_events file for this session. After finishing the
>> record session, it removes the events from the uprobe_events file
>> using the maintained name list.
> OK, the behavior looks good. However, the implementation seems
> too complex, could you make it simpler?
> I have some comments on the code, see below.

Yes, it's complex code. It's all because of multiple SDT events with same name
exists and we wants to allow both -- 'perf probe' as well as 'perf record' -- 
which
creates complex situations that has to be taken care. See my replies below for
such examples.

>> +/*
>> + * Remove ith tev from pev->tevs list and shift remaining
>> + * tevs(i+1 to pev->ntevs) one step.
>> + */
>> +static void shift_pev(struct perf_probe_event *pev, int i)
>> +{
>> +int j;
>> +
>> +free(pev->tevs[i].event);
>> +free(pev->tevs[i].group);
>> +free(pev->tevs[i].args);
>> +free(pev->tevs[i].point.realname);
>> +free(pev->tevs[i].point.symbol);
>> +free(pev->tevs[i].point.module);
>> +
>> +/*
>> + * If ith element is last element, no need to shift,
>> + * just decrement pev->ntevs.
>> + */
>> +if (i == pev->ntevs - 1)
>> +goto ret;
>> +
>> +for (j = i; j < pev->ntevs - 1; j++) {
>> +pev->tevs[j].event  = pev->tevs[j + 1].event;
>> +pev->tevs[j].group  = pev->tevs[j + 1].group;
>> +pev->tevs[j].nargs  = pev->tevs[j + 1].nargs;
>> +pev->tevs[j].uprobes= pev->tevs[j + 1].uprobes;
>> +pev->tevs[j].args   = pev->tevs[j + 1].args;
>> +pev->tevs[j].point.realname = pev->tevs[j + 1].point.realname;
>> +pev->tevs[j].point.symbol   = pev->tevs[j + 1].point.symbol;
>> +pev->tevs[j].point.module   = pev->tevs[j + 1].point.module;
>> +pev->tevs[j].point.offset   = pev->tevs[j + 1].point.offset;
>> +pev->tevs[j].point.address  = pev->tevs[j + 1].point.address;
>> +pev->tevs[j].point.retprobe = pev->tevs[j + 1].point.retprobe;
>> +}
>> +
>> +ret:
>> +pev->ntevs--;
>> +}
> This code smells no good...
> Why don't you make a list of newly added probes ?
>

Yes. This is little ugly. Consider a following scenario.

  $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
Provider: libpthread
Name: mutex_entry
Location: 0x9ddb, Base: 0x000139cc, ...
  --
Provider: libpthread
Name: mutex_entry
Location: 0xbcbb, Base: 0x000139cc, ...

  $ perf probe sdt_libpthread:mutex_entry

  $ cat /sys/kernel/debug/tracing/uprobe_events
p:sdt_libpthread/mutex_entry 
/usr/lib64/libpthread-2.24.so:0x9ddb
p:sdt_libpthread/mutex_entry_1 
/usr/lib64/libpthread-2.24.so:0xbcbb

  $ perf probe -d sdt_libpthread:mutex_entry

  $ perf record -a -e sdt_libpthread:mutex_entry
 /* This should record:
   sdt_libpthread/mutex_entry  <== New entry (delete at the end 
of session)
   sdt_libpthread/mutex_entry_1  <== Reuse (do not delete at the 
end of session)
  */

First of all, perf looks into uprobe_events for entry with 
sdt_libpthread:mutex_entry
*name*. As it's not present, perf does a lookup in probe-cache and gets two 
entries.
Perf stores them in 'pev->tevs' array. Now it again compares uprobe_events 
entries
with entries found from probe-cache with *filename* and *address* as comparison
parameter (because this combo makes unique identifier for any SDT event in a
particular system - please correct me if that's wrong). And perf finds one entry
sdt_libpthread/mutex_entry_1.

Keeping this entry in 'pevs->tevs' and calling apply_perf_probe_events() will 
add
new entry for that event even if it exists. So this code removes that particular
element of an array and shift all remaining elements one step.

Please suggest if you have a better approach.

>> +
>> +/* Compare address and filename */
>> +static bool is_sdt_match(struct probe_trace_event *tev1,
>> +

Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'

2017-02-27 Thread Masami Hiramatsu
On Fri, 24 Feb 2017 13:13:25 +0530
Ravi Bangoria  wrote:

> From: Hemant Kumar 
> 
> Add support for directly recording SDT events which are present in
> the probe cache. Without this patch, we could probe into SDT events
> using 'perf probe' and 'perf record'. With this patch, we can probe
> the SDT events directly using 'perf record'.
> 
> For example :
> 
>   $ perf list sdt
> sdt_libpthread:mutex_entry [SDT event]
> sdt_libc:setjmp[SDT event]
> ...
> 
>   $ perf record -a -e sdt_libc:setjmp
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ]
> 
>   $ perf script
>  bash   803 [002]  6492.190311: sdt_libc:setjmp: (7f1d503b56a1)
> login   488 [001]  6496.791583: sdt_libc:setjmp: (7ff3013d56a1)
>   fprintd 11038 [003]  6496.808032: sdt_libc:setjmp: (7fdedf5936a1)
> 
> Recording on SDT events with same provider and marker names is also
> supported:
> 
>   $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
>   Provider: libpthread
>   Name: mutex_entry
>   Location: 0x9ddb, Base: 0x000139cc, ...
> --
>   Provider: libpthread
>   Name: mutex_entry
>   Location: 0xbcbb, Base: 0x000139cc, ...
> 
>   $ perf record -a -e sdt_libpthread:mutex_entry
> Info: Recording on 2 occurrences of sdt_libpthread:mutex_entry
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.197 MB perf.data (31 samples) ]
> 
>   $ perf script
>   in:imjournal 442 [000]  6625.701833:   sdt_libpthread:mutex_entry: 
> (7fb1a1940ddb)
>  rs:main Q:Reg 443 [001]  6625.701889: sdt_libpthread:mutex_entry_1: 
> (7fb1a1942cbb)
> 
> 
> After invoking 'perf record', behind the scenes, it checks whether
> the event specified is an SDT event by using the flag '%' or string
> 'sdt_'. After that, it first checks whether event already exists
> with that *name* in uprobe_events file. If found, it records that
> particular event. Otherwise, it does a lookup of the probe cache to
> find out the SDT event. If its not present, it throws an error.
> If found, it again tries to find existing events from uprobe_events
> file, but this time it uses *address* and *filename* for comparison.
> Finally it writes new events into the uprobe_events file and starts
> recording. It also maintains a list of the event names that were
> written to uprobe_events file for this session. After finishing the
> record session, it removes the events from the uprobe_events file
> using the maintained name list.

OK, the behavior looks good. However, the implementation seems
too complex, could you make it simpler?
I have some comments on the code, see below.

[...]
> diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
> index 1fcebc3..d255a6d 100644
> --- a/tools/perf/builtin-probe.c
> +++ b/tools/perf/builtin-probe.c
> @@ -45,6 +45,7 @@
>  #define DEFAULT_VAR_FILTER "!__k???tab_* & !__crc_*"
>  #define DEFAULT_FUNC_FILTER "!_*"
>  #define DEFAULT_LIST_FILTER "*"
> +#define MAX_EVENT_LENGTH 512
>  
>  /* Session management structure */
>  static struct {
> @@ -312,6 +313,19 @@ static void pr_err_with_code(const char *msg, int err)
>   pr_err("\n");
>  }
>  
> +/* Show how to use the event. */
> +static void record_syntax_hint(const char *group, const char *event)
> +{
> + char ge[MAX_EVENT_LENGTH];
> +
> + pr_info("\nYou can now use it in all perf tools, such as:\n\n");
> + pr_info("\tperf record -e %s:%s -aR sleep 1\n\n", group, event);
> +
> + snprintf(ge, strlen(group) + strlen(event) + 2, "%s:%s", group, event);
> + if (is_sdt_event(ge))
> + pr_info("Hint: SDT event can be directly recorded with 'perf 
> record'. No need to create probe manually.\n");
> +}

Here, we don't need this message because anyway perf_add_probe_events already
created this probe...

> +
>  static int perf_add_probe_events(struct perf_probe_event *pevs, int npevs)
>  {
>   int ret;
> @@ -356,11 +370,8 @@ static int perf_add_probe_events(struct perf_probe_event 
> *pevs, int npevs)
>   }
>  
>   /* Note that it is possible to skip all events because of blacklist */
> - if (event) {
> - /* Show how to use the event. */
> - pr_info("\nYou can now use it in all perf tools, such as:\n\n");
> - pr_info("\tperf record -e %s:%s -aR sleep 1\n\n", group, event);
> - }
> + if (event)
> + record_syntax_hint(group, event);
>  
>  out_cleanup:
>   cleanup_perf_probe_events(pevs, npevs);
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index bc84a37..fa9b345 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -39,6 +39,7 @@
>  #include "util/trigger.h"
>  #include 

Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'

2017-02-27 Thread Masami Hiramatsu
On Fri, 24 Feb 2017 13:13:25 +0530
Ravi Bangoria  wrote:

> From: Hemant Kumar 
> 
> Add support for directly recording SDT events which are present in
> the probe cache. Without this patch, we could probe into SDT events
> using 'perf probe' and 'perf record'. With this patch, we can probe
> the SDT events directly using 'perf record'.
> 
> For example :
> 
>   $ perf list sdt
> sdt_libpthread:mutex_entry [SDT event]
> sdt_libc:setjmp[SDT event]
> ...
> 
>   $ perf record -a -e sdt_libc:setjmp
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ]
> 
>   $ perf script
>  bash   803 [002]  6492.190311: sdt_libc:setjmp: (7f1d503b56a1)
> login   488 [001]  6496.791583: sdt_libc:setjmp: (7ff3013d56a1)
>   fprintd 11038 [003]  6496.808032: sdt_libc:setjmp: (7fdedf5936a1)
> 
> Recording on SDT events with same provider and marker names is also
> supported:
> 
>   $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
>   Provider: libpthread
>   Name: mutex_entry
>   Location: 0x9ddb, Base: 0x000139cc, ...
> --
>   Provider: libpthread
>   Name: mutex_entry
>   Location: 0xbcbb, Base: 0x000139cc, ...
> 
>   $ perf record -a -e sdt_libpthread:mutex_entry
> Info: Recording on 2 occurrences of sdt_libpthread:mutex_entry
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.197 MB perf.data (31 samples) ]
> 
>   $ perf script
>   in:imjournal 442 [000]  6625.701833:   sdt_libpthread:mutex_entry: 
> (7fb1a1940ddb)
>  rs:main Q:Reg 443 [001]  6625.701889: sdt_libpthread:mutex_entry_1: 
> (7fb1a1942cbb)
> 
> 
> After invoking 'perf record', behind the scenes, it checks whether
> the event specified is an SDT event by using the flag '%' or string
> 'sdt_'. After that, it first checks whether event already exists
> with that *name* in uprobe_events file. If found, it records that
> particular event. Otherwise, it does a lookup of the probe cache to
> find out the SDT event. If its not present, it throws an error.
> If found, it again tries to find existing events from uprobe_events
> file, but this time it uses *address* and *filename* for comparison.
> Finally it writes new events into the uprobe_events file and starts
> recording. It also maintains a list of the event names that were
> written to uprobe_events file for this session. After finishing the
> record session, it removes the events from the uprobe_events file
> using the maintained name list.

OK, the behavior looks good. However, the implementation seems
too complex, could you make it simpler?
I have some comments on the code, see below.

[...]
> diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
> index 1fcebc3..d255a6d 100644
> --- a/tools/perf/builtin-probe.c
> +++ b/tools/perf/builtin-probe.c
> @@ -45,6 +45,7 @@
>  #define DEFAULT_VAR_FILTER "!__k???tab_* & !__crc_*"
>  #define DEFAULT_FUNC_FILTER "!_*"
>  #define DEFAULT_LIST_FILTER "*"
> +#define MAX_EVENT_LENGTH 512
>  
>  /* Session management structure */
>  static struct {
> @@ -312,6 +313,19 @@ static void pr_err_with_code(const char *msg, int err)
>   pr_err("\n");
>  }
>  
> +/* Show how to use the event. */
> +static void record_syntax_hint(const char *group, const char *event)
> +{
> + char ge[MAX_EVENT_LENGTH];
> +
> + pr_info("\nYou can now use it in all perf tools, such as:\n\n");
> + pr_info("\tperf record -e %s:%s -aR sleep 1\n\n", group, event);
> +
> + snprintf(ge, strlen(group) + strlen(event) + 2, "%s:%s", group, event);
> + if (is_sdt_event(ge))
> + pr_info("Hint: SDT event can be directly recorded with 'perf 
> record'. No need to create probe manually.\n");
> +}

Here, we don't need this message because anyway perf_add_probe_events already
created this probe...

> +
>  static int perf_add_probe_events(struct perf_probe_event *pevs, int npevs)
>  {
>   int ret;
> @@ -356,11 +370,8 @@ static int perf_add_probe_events(struct perf_probe_event 
> *pevs, int npevs)
>   }
>  
>   /* Note that it is possible to skip all events because of blacklist */
> - if (event) {
> - /* Show how to use the event. */
> - pr_info("\nYou can now use it in all perf tools, such as:\n\n");
> - pr_info("\tperf record -e %s:%s -aR sleep 1\n\n", group, event);
> - }
> + if (event)
> + record_syntax_hint(group, event);
>  
>  out_cleanup:
>   cleanup_perf_probe_events(pevs, npevs);
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index bc84a37..fa9b345 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -39,6 +39,7 @@
>  #include "util/trigger.h"
>  #include "util/perf-hooks.h"
>  #include "asm/bug.h"
> +#include 

[PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'

2017-02-23 Thread Ravi Bangoria
From: Hemant Kumar 

Add support for directly recording SDT events which are present in
the probe cache. Without this patch, we could probe into SDT events
using 'perf probe' and 'perf record'. With this patch, we can probe
the SDT events directly using 'perf record'.

For example :

  $ perf list sdt
sdt_libpthread:mutex_entry [SDT event]
sdt_libc:setjmp[SDT event]
...

  $ perf record -a -e sdt_libc:setjmp
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ]

  $ perf script
 bash   803 [002]  6492.190311: sdt_libc:setjmp: (7f1d503b56a1)
login   488 [001]  6496.791583: sdt_libc:setjmp: (7ff3013d56a1)
  fprintd 11038 [003]  6496.808032: sdt_libc:setjmp: (7fdedf5936a1)

Recording on SDT events with same provider and marker names is also
supported:

  $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
  Provider: libpthread
  Name: mutex_entry
  Location: 0x9ddb, Base: 0x000139cc, ...
--
  Provider: libpthread
  Name: mutex_entry
  Location: 0xbcbb, Base: 0x000139cc, ...

  $ perf record -a -e sdt_libpthread:mutex_entry
Info: Recording on 2 occurrences of sdt_libpthread:mutex_entry
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.197 MB perf.data (31 samples) ]

  $ perf script
  in:imjournal 442 [000]  6625.701833:   sdt_libpthread:mutex_entry: 
(7fb1a1940ddb)
 rs:main Q:Reg 443 [001]  6625.701889: sdt_libpthread:mutex_entry_1: 
(7fb1a1942cbb)


After invoking 'perf record', behind the scenes, it checks whether
the event specified is an SDT event by using the flag '%' or string
'sdt_'. After that, it first checks whether event already exists
with that *name* in uprobe_events file. If found, it records that
particular event. Otherwise, it does a lookup of the probe cache to
find out the SDT event. If its not present, it throws an error.
If found, it again tries to find existing events from uprobe_events
file, but this time it uses *address* and *filename* for comparison.
Finally it writes new events into the uprobe_events file and starts
recording. It also maintains a list of the event names that were
written to uprobe_events file for this session. After finishing the
record session, it removes the events from the uprobe_events file
using the maintained name list.

Signed-off-by: Hemant Kumar 
Signed-off-by: Ravi Bangoria 
---
 tools/lib/api/fs/tracing_path.c |  17 +-
 tools/perf/builtin-probe.c  |  21 ++-
 tools/perf/builtin-record.c |  23 +++
 tools/perf/perf.h   |   2 +
 tools/perf/util/parse-events.c  |  56 +-
 tools/perf/util/parse-events.h  |   2 +
 tools/perf/util/probe-event.c   |  44 -
 tools/perf/util/probe-event.h   |   4 +
 tools/perf/util/probe-file.c| 376 
 tools/perf/util/probe-file.h|  27 +++
 10 files changed, 552 insertions(+), 20 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 3e606b9..fa52e67 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -103,19 +103,10 @@ int tracing_path__strerror_open_tp(int err, char *buf, 
size_t size,
 * - jirka
 */
if (debugfs__configured() || tracefs__configured()) {
-   /* sdt markers */
-   if (!strncmp(filename, "sdt_", 4)) {
-   snprintf(buf, size,
-   "Error:\tFile %s/%s not found.\n"
-   "Hint:\tSDT event cannot be directly 
recorded on.\n"
-   "\tPlease first use 'perf probe %s:%s' 
before recording it.\n",
-   tracing_events_path, filename, sys, 
name);
-   } else {
-   snprintf(buf, size,
-"Error:\tFile %s/%s not found.\n"
-"Hint:\tPerhaps this kernel misses 
some CONFIG_ setting to enable this feature?.\n",
-tracing_events_path, filename);
-   }
+   snprintf(buf, size,
+"Error:\tFile %s/%s not found.\n"
+"Hint:\tPerhaps this kernel misses some 
CONFIG_ setting to enable this feature?.\n",
+tracing_events_path, filename);
break;
}
snprintf(buf, size, "%s",
diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index 1fcebc3..d255a6d 100644
--- a/tools/perf/builtin-probe.c
+++ 

[PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'

2017-02-23 Thread Ravi Bangoria
From: Hemant Kumar 

Add support for directly recording SDT events which are present in
the probe cache. Without this patch, we could probe into SDT events
using 'perf probe' and 'perf record'. With this patch, we can probe
the SDT events directly using 'perf record'.

For example :

  $ perf list sdt
sdt_libpthread:mutex_entry [SDT event]
sdt_libc:setjmp[SDT event]
...

  $ perf record -a -e sdt_libc:setjmp
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ]

  $ perf script
 bash   803 [002]  6492.190311: sdt_libc:setjmp: (7f1d503b56a1)
login   488 [001]  6496.791583: sdt_libc:setjmp: (7ff3013d56a1)
  fprintd 11038 [003]  6496.808032: sdt_libc:setjmp: (7fdedf5936a1)

Recording on SDT events with same provider and marker names is also
supported:

  $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
  Provider: libpthread
  Name: mutex_entry
  Location: 0x9ddb, Base: 0x000139cc, ...
--
  Provider: libpthread
  Name: mutex_entry
  Location: 0xbcbb, Base: 0x000139cc, ...

  $ perf record -a -e sdt_libpthread:mutex_entry
Info: Recording on 2 occurrences of sdt_libpthread:mutex_entry
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.197 MB perf.data (31 samples) ]

  $ perf script
  in:imjournal 442 [000]  6625.701833:   sdt_libpthread:mutex_entry: 
(7fb1a1940ddb)
 rs:main Q:Reg 443 [001]  6625.701889: sdt_libpthread:mutex_entry_1: 
(7fb1a1942cbb)


After invoking 'perf record', behind the scenes, it checks whether
the event specified is an SDT event by using the flag '%' or string
'sdt_'. After that, it first checks whether event already exists
with that *name* in uprobe_events file. If found, it records that
particular event. Otherwise, it does a lookup of the probe cache to
find out the SDT event. If its not present, it throws an error.
If found, it again tries to find existing events from uprobe_events
file, but this time it uses *address* and *filename* for comparison.
Finally it writes new events into the uprobe_events file and starts
recording. It also maintains a list of the event names that were
written to uprobe_events file for this session. After finishing the
record session, it removes the events from the uprobe_events file
using the maintained name list.

Signed-off-by: Hemant Kumar 
Signed-off-by: Ravi Bangoria 
---
 tools/lib/api/fs/tracing_path.c |  17 +-
 tools/perf/builtin-probe.c  |  21 ++-
 tools/perf/builtin-record.c |  23 +++
 tools/perf/perf.h   |   2 +
 tools/perf/util/parse-events.c  |  56 +-
 tools/perf/util/parse-events.h  |   2 +
 tools/perf/util/probe-event.c   |  44 -
 tools/perf/util/probe-event.h   |   4 +
 tools/perf/util/probe-file.c| 376 
 tools/perf/util/probe-file.h|  27 +++
 10 files changed, 552 insertions(+), 20 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 3e606b9..fa52e67 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -103,19 +103,10 @@ int tracing_path__strerror_open_tp(int err, char *buf, 
size_t size,
 * - jirka
 */
if (debugfs__configured() || tracefs__configured()) {
-   /* sdt markers */
-   if (!strncmp(filename, "sdt_", 4)) {
-   snprintf(buf, size,
-   "Error:\tFile %s/%s not found.\n"
-   "Hint:\tSDT event cannot be directly 
recorded on.\n"
-   "\tPlease first use 'perf probe %s:%s' 
before recording it.\n",
-   tracing_events_path, filename, sys, 
name);
-   } else {
-   snprintf(buf, size,
-"Error:\tFile %s/%s not found.\n"
-"Hint:\tPerhaps this kernel misses 
some CONFIG_ setting to enable this feature?.\n",
-tracing_events_path, filename);
-   }
+   snprintf(buf, size,
+"Error:\tFile %s/%s not found.\n"
+"Hint:\tPerhaps this kernel misses some 
CONFIG_ setting to enable this feature?.\n",
+tracing_events_path, filename);
break;
}
snprintf(buf, size, "%s",
diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index 1fcebc3..d255a6d 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -45,6 +45,7 @@
 #define DEFAULT_VAR_FILTER "!__k???tab_* &