Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
Thank you Brendan for reviewing, On Wednesday 01 March 2017 10:34 AM, Brendan Gregg wrote: > On Tue, Feb 28, 2017 at 2:31 PM, Brendan Gregg >wrote: >> G'Day Ravi, >> > [...] >> Now retrying perf: >> >> # ./perf record -e sdt_node:http__server__request -a >> ^C[ perf record: Woken up 1 times to write data ] >> [ perf record: Captured and wrote 0.446 MB perf.data (3 samples) ] >> # ./perf script >> node 7646 [002] 361.012364: >> sdt_node:http__server__request: (dc2e69) >> node 7646 [002] 361.204718: >> sdt_node:http__server__request: (dc2e69) >> node 7646 [002] 361.363043: >> sdt_node:http__server__request: (dc2e69) >> >> Now perf works. >> >> If I restart the node process, it goes back to the broken state. >> > Oh sorry, I forgot about that these Node.js probes are behind an > is-enabled semaphore. Yes. Perf does not support "is-enabled" markers yet. > $ readelf -n `which node` > [...] > stapsdt 0x0089NT_STAPSDT (SystemTap probe descriptors) > Provider: node > Name: http__server__request > Location: 0x00dc2e69, Base: 0x0112e064, Semaphore: > 0x01470954 > Arguments: 8@%r14 8@%rax 8@-4344(%rbp) -4@-4348(%rbp) > 8@-4304(%rbp) 8@-4312(%rbp) -4@-4352(%rbp) > # dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 )) > 2>/dev/null | xxd > : 00 . > # printf "\x1" | dd of=/proc/31695/mem bs=1 count=1 seek=$(( > 0x01470954 )) 2>/dev/null > # dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 )) > 2>/dev/null | xxd > : 01 . > # ./perf record -e sdt_node:http__server__request -a > Matching event(s) from uprobe_events: >sdt_node:http__server__request 0x9c2e69@/usr/local/bin/node > Use 'perf probe -d ' to delete event(s). > ^C[ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.280 MB perf.data (3 samples) ] > # ./perf script > node 31695 [003] 24947.168761: > sdt_node:http__server__request: (dc2e69) > node 31695 [003] 24947.476143: > sdt_node:http__server__request: (dc2e69) > node 31695 [003] 24947.679090: > sdt_node:http__server__request: (dc2e69) > > So setting that to 1 made the probe work from perf. I guess this is > not a problem with this patch set, but rather a feature request for > the next one: is-enabled SDT support. Yes. Actually I'm thinking about how this can be accomplished. Perf is a userspace tool and, unlike systemtap, it cannot change semaphore value easily. This is what I was thinking: 'perf record', at the start of session will increments semaphore in /proc//mem. and at the end of session, perf will decrement it (same as bcc). This does not require any support from kernel infrastructure. But there are challenges with this approach: 1. What if user starts workload after starting 'perf record'. How perf will be able to increment semaphore value. 2. Systemwide record. We have to loop over all pids and check if any process is using SDT with semaphore that is being recorded. 3. Dynamic library loading. How to handle SDT probes in library that is not loaded at the time of 'perf record'? Please let me know your thoughts. > Were probe arguments supposed to work? I don't notice them in the perf > script output. Not yet. Alexis[1] (and followed by me[2]) had sent patches for that. Please have a look at them. [1] https://lkml.org/lkml/2016/12/13/784 [2] https://lkml.org/lkml/2017/2/2/145 So, why perf is able to record data after recording them with bcc? Ideally, bcc should increment semaphore value at the start of session and it should decrement at the end of the session. So after bcc process exits, semaphore value should be zero. But actually it's not happening. I've seen this when I was experimenting bcc with is-enabled markers. See this example, $ readelf -n /usr/bin/node | grep -A2 Provider Provider: node Name: http__server__request Location: 0x00e5f484, Base: 0x011a0bc4, Semaphore: 0x01558cf2 $ sudo cat /proc/1426/maps 0040-01306000 r-xp 08:02 1083365/usr/bin/node 01506000-01551000 r--p 00f06000 08:02 1083365/usr/bin/node 01551000-01559000 rw-p 00f51000 08:02 1083365/usr/bin/node ... [TERMINAL-1]$ gdb 1426 (gdb) x/1 0x1558cf2 0x1558cf2:0 [TERMINAL-2]$ sudo ./trace.py -p 1426 'u:node:http__server__request' PIDTIDCOMM FUNC /* Do not exit yet. */ [TERMINAL-1] (gdb) x/1 0x1558cf2 0x1558cf2:2 [TERMINAL-2] ^C /* Exit bcc trace.py */ [TERMINAL-1] (gdb) x/1 0x1558cf2 0x1558cf2:2 Here it's maintaining value 2 as it is. it should be 0. I suspect this is a bug in bcc. Please let me know if I'm understanding it wrong. > > PS, if it's helpful, here's the commands to build node
Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
Thank you Brendan for reviewing, On Wednesday 01 March 2017 10:34 AM, Brendan Gregg wrote: > On Tue, Feb 28, 2017 at 2:31 PM, Brendan Gregg > wrote: >> G'Day Ravi, >> > [...] >> Now retrying perf: >> >> # ./perf record -e sdt_node:http__server__request -a >> ^C[ perf record: Woken up 1 times to write data ] >> [ perf record: Captured and wrote 0.446 MB perf.data (3 samples) ] >> # ./perf script >> node 7646 [002] 361.012364: >> sdt_node:http__server__request: (dc2e69) >> node 7646 [002] 361.204718: >> sdt_node:http__server__request: (dc2e69) >> node 7646 [002] 361.363043: >> sdt_node:http__server__request: (dc2e69) >> >> Now perf works. >> >> If I restart the node process, it goes back to the broken state. >> > Oh sorry, I forgot about that these Node.js probes are behind an > is-enabled semaphore. Yes. Perf does not support "is-enabled" markers yet. > $ readelf -n `which node` > [...] > stapsdt 0x0089NT_STAPSDT (SystemTap probe descriptors) > Provider: node > Name: http__server__request > Location: 0x00dc2e69, Base: 0x0112e064, Semaphore: > 0x01470954 > Arguments: 8@%r14 8@%rax 8@-4344(%rbp) -4@-4348(%rbp) > 8@-4304(%rbp) 8@-4312(%rbp) -4@-4352(%rbp) > # dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 )) > 2>/dev/null | xxd > : 00 . > # printf "\x1" | dd of=/proc/31695/mem bs=1 count=1 seek=$(( > 0x01470954 )) 2>/dev/null > # dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 )) > 2>/dev/null | xxd > : 01 . > # ./perf record -e sdt_node:http__server__request -a > Matching event(s) from uprobe_events: >sdt_node:http__server__request 0x9c2e69@/usr/local/bin/node > Use 'perf probe -d ' to delete event(s). > ^C[ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.280 MB perf.data (3 samples) ] > # ./perf script > node 31695 [003] 24947.168761: > sdt_node:http__server__request: (dc2e69) > node 31695 [003] 24947.476143: > sdt_node:http__server__request: (dc2e69) > node 31695 [003] 24947.679090: > sdt_node:http__server__request: (dc2e69) > > So setting that to 1 made the probe work from perf. I guess this is > not a problem with this patch set, but rather a feature request for > the next one: is-enabled SDT support. Yes. Actually I'm thinking about how this can be accomplished. Perf is a userspace tool and, unlike systemtap, it cannot change semaphore value easily. This is what I was thinking: 'perf record', at the start of session will increments semaphore in /proc//mem. and at the end of session, perf will decrement it (same as bcc). This does not require any support from kernel infrastructure. But there are challenges with this approach: 1. What if user starts workload after starting 'perf record'. How perf will be able to increment semaphore value. 2. Systemwide record. We have to loop over all pids and check if any process is using SDT with semaphore that is being recorded. 3. Dynamic library loading. How to handle SDT probes in library that is not loaded at the time of 'perf record'? Please let me know your thoughts. > Were probe arguments supposed to work? I don't notice them in the perf > script output. Not yet. Alexis[1] (and followed by me[2]) had sent patches for that. Please have a look at them. [1] https://lkml.org/lkml/2016/12/13/784 [2] https://lkml.org/lkml/2017/2/2/145 So, why perf is able to record data after recording them with bcc? Ideally, bcc should increment semaphore value at the start of session and it should decrement at the end of the session. So after bcc process exits, semaphore value should be zero. But actually it's not happening. I've seen this when I was experimenting bcc with is-enabled markers. See this example, $ readelf -n /usr/bin/node | grep -A2 Provider Provider: node Name: http__server__request Location: 0x00e5f484, Base: 0x011a0bc4, Semaphore: 0x01558cf2 $ sudo cat /proc/1426/maps 0040-01306000 r-xp 08:02 1083365/usr/bin/node 01506000-01551000 r--p 00f06000 08:02 1083365/usr/bin/node 01551000-01559000 rw-p 00f51000 08:02 1083365/usr/bin/node ... [TERMINAL-1]$ gdb 1426 (gdb) x/1 0x1558cf2 0x1558cf2:0 [TERMINAL-2]$ sudo ./trace.py -p 1426 'u:node:http__server__request' PIDTIDCOMM FUNC /* Do not exit yet. */ [TERMINAL-1] (gdb) x/1 0x1558cf2 0x1558cf2:2 [TERMINAL-2] ^C /* Exit bcc trace.py */ [TERMINAL-1] (gdb) x/1 0x1558cf2 0x1558cf2:2 Here it's maintaining value 2 as it is. it should be 0. I suspect this is a bug in bcc. Please let me know if I'm understanding it wrong. > > PS, if it's helpful, here's the commands to build node with these SDT probes: > > $
Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
On Tue, Feb 28, 2017 at 2:31 PM, Brendan Greggwrote: > G'Day Ravi, > [...] > Now retrying perf: > > # ./perf record -e sdt_node:http__server__request -a > ^C[ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.446 MB perf.data (3 samples) ] > # ./perf script > node 7646 [002] 361.012364: > sdt_node:http__server__request: (dc2e69) > node 7646 [002] 361.204718: > sdt_node:http__server__request: (dc2e69) > node 7646 [002] 361.363043: > sdt_node:http__server__request: (dc2e69) > > Now perf works. > > If I restart the node process, it goes back to the broken state. > Oh sorry, I forgot about that these Node.js probes are behind an is-enabled semaphore. $ readelf -n `which node` [...] stapsdt 0x0089NT_STAPSDT (SystemTap probe descriptors) Provider: node Name: http__server__request Location: 0x00dc2e69, Base: 0x0112e064, Semaphore: 0x01470954 Arguments: 8@%r14 8@%rax 8@-4344(%rbp) -4@-4348(%rbp) 8@-4304(%rbp) 8@-4312(%rbp) -4@-4352(%rbp) # dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 )) 2>/dev/null | xxd : 00 . # printf "\x1" | dd of=/proc/31695/mem bs=1 count=1 seek=$(( 0x01470954 )) 2>/dev/null # dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 )) 2>/dev/null | xxd : 01 . # ./perf record -e sdt_node:http__server__request -a Matching event(s) from uprobe_events: sdt_node:http__server__request 0x9c2e69@/usr/local/bin/node Use 'perf probe -d ' to delete event(s). ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.280 MB perf.data (3 samples) ] # ./perf script node 31695 [003] 24947.168761: sdt_node:http__server__request: (dc2e69) node 31695 [003] 24947.476143: sdt_node:http__server__request: (dc2e69) node 31695 [003] 24947.679090: sdt_node:http__server__request: (dc2e69) So setting that to 1 made the probe work from perf. I guess this is not a problem with this patch set, but rather a feature request for the next one: is-enabled SDT support. Were probe arguments supposed to work? I don't notice them in the perf script output. PS, if it's helpful, here's the commands to build node with these SDT probes: $ sudo apt-get install systemtap-sdt-dev # adds "dtrace", used by node build $ wget https://nodejs.org/dist/v4.4.1/node-v4.4.1.tar.gz $ tar xvf node-v4.4.1.tar.gz $ cd node-v4.4.1 $ ./configure --with-dtrace $ make -j 8 Brendan
Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
On Tue, Feb 28, 2017 at 2:31 PM, Brendan Gregg wrote: > G'Day Ravi, > [...] > Now retrying perf: > > # ./perf record -e sdt_node:http__server__request -a > ^C[ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.446 MB perf.data (3 samples) ] > # ./perf script > node 7646 [002] 361.012364: > sdt_node:http__server__request: (dc2e69) > node 7646 [002] 361.204718: > sdt_node:http__server__request: (dc2e69) > node 7646 [002] 361.363043: > sdt_node:http__server__request: (dc2e69) > > Now perf works. > > If I restart the node process, it goes back to the broken state. > Oh sorry, I forgot about that these Node.js probes are behind an is-enabled semaphore. $ readelf -n `which node` [...] stapsdt 0x0089NT_STAPSDT (SystemTap probe descriptors) Provider: node Name: http__server__request Location: 0x00dc2e69, Base: 0x0112e064, Semaphore: 0x01470954 Arguments: 8@%r14 8@%rax 8@-4344(%rbp) -4@-4348(%rbp) 8@-4304(%rbp) 8@-4312(%rbp) -4@-4352(%rbp) # dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 )) 2>/dev/null | xxd : 00 . # printf "\x1" | dd of=/proc/31695/mem bs=1 count=1 seek=$(( 0x01470954 )) 2>/dev/null # dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x01470954 )) 2>/dev/null | xxd : 01 . # ./perf record -e sdt_node:http__server__request -a Matching event(s) from uprobe_events: sdt_node:http__server__request 0x9c2e69@/usr/local/bin/node Use 'perf probe -d ' to delete event(s). ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.280 MB perf.data (3 samples) ] # ./perf script node 31695 [003] 24947.168761: sdt_node:http__server__request: (dc2e69) node 31695 [003] 24947.476143: sdt_node:http__server__request: (dc2e69) node 31695 [003] 24947.679090: sdt_node:http__server__request: (dc2e69) So setting that to 1 made the probe work from perf. I guess this is not a problem with this patch set, but rather a feature request for the next one: is-enabled SDT support. Were probe arguments supposed to work? I don't notice them in the perf script output. PS, if it's helpful, here's the commands to build node with these SDT probes: $ sudo apt-get install systemtap-sdt-dev # adds "dtrace", used by node build $ wget https://nodejs.org/dist/v4.4.1/node-v4.4.1.tar.gz $ tar xvf node-v4.4.1.tar.gz $ cd node-v4.4.1 $ ./configure --with-dtrace $ make -j 8 Brendan
Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
On Tue, 28 Feb 2017 16:26:19 +0530 Ravi Bangoriawrote: > Thanks Masami for the review. > Please find my replies to your comments. > > On Tuesday 28 February 2017 11:15 AM, Masami Hiramatsu wrote: > > On Fri, 24 Feb 2017 13:13:25 +0530 > > Ravi Bangoria wrote: > > > >> After invoking 'perf record', behind the scenes, it checks whether > >> the event specified is an SDT event by using the flag '%' or string > >> 'sdt_'. After that, it first checks whether event already exists > >> with that *name* in uprobe_events file. If found, it records that > >> particular event. Otherwise, it does a lookup of the probe cache to > >> find out the SDT event. If its not present, it throws an error. > >> If found, it again tries to find existing events from uprobe_events > >> file, but this time it uses *address* and *filename* for comparison. > >> Finally it writes new events into the uprobe_events file and starts > >> recording. It also maintains a list of the event names that were > >> written to uprobe_events file for this session. After finishing the > >> record session, it removes the events from the uprobe_events file > >> using the maintained name list. > > OK, the behavior looks good. However, the implementation seems > > too complex, could you make it simpler? > > I have some comments on the code, see below. > > Yes, it's complex code. It's all because of multiple SDT events with same name > exists and we wants to allow both -- 'perf probe' as well as 'perf record' -- > which > creates complex situations that has to be taken care. See my replies below for > such examples. OK, but in that case, could you split those support for review? E.g. the first one just combine perf-probe and perf-record and second one support multiple SDT events. > > >> +/* > >> + * Remove ith tev from pev->tevs list and shift remaining > >> + * tevs(i+1 to pev->ntevs) one step. > >> + */ > >> +static void shift_pev(struct perf_probe_event *pev, int i) > >> +{ > >> + int j; > >> + > >> + free(pev->tevs[i].event); > >> + free(pev->tevs[i].group); > >> + free(pev->tevs[i].args); > >> + free(pev->tevs[i].point.realname); > >> + free(pev->tevs[i].point.symbol); > >> + free(pev->tevs[i].point.module); > >> + > >> + /* > >> + * If ith element is last element, no need to shift, > >> + * just decrement pev->ntevs. > >> + */ > >> + if (i == pev->ntevs - 1) > >> + goto ret; > >> + > >> + for (j = i; j < pev->ntevs - 1; j++) { > >> + pev->tevs[j].event = pev->tevs[j + 1].event; > >> + pev->tevs[j].group = pev->tevs[j + 1].group; > >> + pev->tevs[j].nargs = pev->tevs[j + 1].nargs; > >> + pev->tevs[j].uprobes= pev->tevs[j + 1].uprobes; > >> + pev->tevs[j].args = pev->tevs[j + 1].args; > >> + pev->tevs[j].point.realname = pev->tevs[j + 1].point.realname; > >> + pev->tevs[j].point.symbol = pev->tevs[j + 1].point.symbol; > >> + pev->tevs[j].point.module = pev->tevs[j + 1].point.module; > >> + pev->tevs[j].point.offset = pev->tevs[j + 1].point.offset; > >> + pev->tevs[j].point.address = pev->tevs[j + 1].point.address; > >> + pev->tevs[j].point.retprobe = pev->tevs[j + 1].point.retprobe; > >> + } > >> + > >> +ret: > >> + pev->ntevs--; > >> +} > > This code smells no good... > > Why don't you make a list of newly added probes ? > > > > Yes. This is little ugly. Consider a following scenario. > > $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider > Provider: libpthread > Name: mutex_entry > Location: 0x9ddb, Base: 0x000139cc, ... > -- > Provider: libpthread > Name: mutex_entry > Location: 0xbcbb, Base: 0x000139cc, ... > > $ perf probe sdt_libpthread:mutex_entry > > $ cat /sys/kernel/debug/tracing/uprobe_events > p:sdt_libpthread/mutex_entry > /usr/lib64/libpthread-2.24.so:0x9ddb > p:sdt_libpthread/mutex_entry_1 > /usr/lib64/libpthread-2.24.so:0xbcbb > > $ perf probe -d sdt_libpthread:mutex_entry > > $ perf record -a -e sdt_libpthread:mutex_entry > /* This should record: >sdt_libpthread/mutex_entry <== New entry (delete at the > end of session) >sdt_libpthread/mutex_entry_1 <== Reuse (do not delete at the > end of session) > */ Hmm, OK. but this also not becomes the reason why you need 2 different lists. After calling apply_perf_probe_events(), pev->tevs have been updated, which have applied names. So you can pull it from the trace_event. > > First of all, perf looks into uprobe_events for entry with > sdt_libpthread:mutex_entry > *name*. As it's not present, perf does a lookup in probe-cache and gets two > entries. > Perf stores them in 'pev->tevs' array. Now it again compares uprobe_events >
Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
On Tue, 28 Feb 2017 16:26:19 +0530 Ravi Bangoria wrote: > Thanks Masami for the review. > Please find my replies to your comments. > > On Tuesday 28 February 2017 11:15 AM, Masami Hiramatsu wrote: > > On Fri, 24 Feb 2017 13:13:25 +0530 > > Ravi Bangoria wrote: > > > >> After invoking 'perf record', behind the scenes, it checks whether > >> the event specified is an SDT event by using the flag '%' or string > >> 'sdt_'. After that, it first checks whether event already exists > >> with that *name* in uprobe_events file. If found, it records that > >> particular event. Otherwise, it does a lookup of the probe cache to > >> find out the SDT event. If its not present, it throws an error. > >> If found, it again tries to find existing events from uprobe_events > >> file, but this time it uses *address* and *filename* for comparison. > >> Finally it writes new events into the uprobe_events file and starts > >> recording. It also maintains a list of the event names that were > >> written to uprobe_events file for this session. After finishing the > >> record session, it removes the events from the uprobe_events file > >> using the maintained name list. > > OK, the behavior looks good. However, the implementation seems > > too complex, could you make it simpler? > > I have some comments on the code, see below. > > Yes, it's complex code. It's all because of multiple SDT events with same name > exists and we wants to allow both -- 'perf probe' as well as 'perf record' -- > which > creates complex situations that has to be taken care. See my replies below for > such examples. OK, but in that case, could you split those support for review? E.g. the first one just combine perf-probe and perf-record and second one support multiple SDT events. > > >> +/* > >> + * Remove ith tev from pev->tevs list and shift remaining > >> + * tevs(i+1 to pev->ntevs) one step. > >> + */ > >> +static void shift_pev(struct perf_probe_event *pev, int i) > >> +{ > >> + int j; > >> + > >> + free(pev->tevs[i].event); > >> + free(pev->tevs[i].group); > >> + free(pev->tevs[i].args); > >> + free(pev->tevs[i].point.realname); > >> + free(pev->tevs[i].point.symbol); > >> + free(pev->tevs[i].point.module); > >> + > >> + /* > >> + * If ith element is last element, no need to shift, > >> + * just decrement pev->ntevs. > >> + */ > >> + if (i == pev->ntevs - 1) > >> + goto ret; > >> + > >> + for (j = i; j < pev->ntevs - 1; j++) { > >> + pev->tevs[j].event = pev->tevs[j + 1].event; > >> + pev->tevs[j].group = pev->tevs[j + 1].group; > >> + pev->tevs[j].nargs = pev->tevs[j + 1].nargs; > >> + pev->tevs[j].uprobes= pev->tevs[j + 1].uprobes; > >> + pev->tevs[j].args = pev->tevs[j + 1].args; > >> + pev->tevs[j].point.realname = pev->tevs[j + 1].point.realname; > >> + pev->tevs[j].point.symbol = pev->tevs[j + 1].point.symbol; > >> + pev->tevs[j].point.module = pev->tevs[j + 1].point.module; > >> + pev->tevs[j].point.offset = pev->tevs[j + 1].point.offset; > >> + pev->tevs[j].point.address = pev->tevs[j + 1].point.address; > >> + pev->tevs[j].point.retprobe = pev->tevs[j + 1].point.retprobe; > >> + } > >> + > >> +ret: > >> + pev->ntevs--; > >> +} > > This code smells no good... > > Why don't you make a list of newly added probes ? > > > > Yes. This is little ugly. Consider a following scenario. > > $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider > Provider: libpthread > Name: mutex_entry > Location: 0x9ddb, Base: 0x000139cc, ... > -- > Provider: libpthread > Name: mutex_entry > Location: 0xbcbb, Base: 0x000139cc, ... > > $ perf probe sdt_libpthread:mutex_entry > > $ cat /sys/kernel/debug/tracing/uprobe_events > p:sdt_libpthread/mutex_entry > /usr/lib64/libpthread-2.24.so:0x9ddb > p:sdt_libpthread/mutex_entry_1 > /usr/lib64/libpthread-2.24.so:0xbcbb > > $ perf probe -d sdt_libpthread:mutex_entry > > $ perf record -a -e sdt_libpthread:mutex_entry > /* This should record: >sdt_libpthread/mutex_entry <== New entry (delete at the > end of session) >sdt_libpthread/mutex_entry_1 <== Reuse (do not delete at the > end of session) > */ Hmm, OK. but this also not becomes the reason why you need 2 different lists. After calling apply_perf_probe_events(), pev->tevs have been updated, which have applied names. So you can pull it from the trace_event. > > First of all, perf looks into uprobe_events for entry with > sdt_libpthread:mutex_entry > *name*. As it's not present, perf does a lookup in probe-cache and gets two > entries. > Perf stores them in 'pev->tevs' array. Now it again compares uprobe_events > entries > with entries found from probe-cache with *filename* and
Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
G'Day Ravi, On Thu, Feb 23, 2017 at 11:43 PM, Ravi Bangoriawrote: > > From: Hemant Kumar > > Add support for directly recording SDT events which are present in > the probe cache. Without this patch, we could probe into SDT events > using 'perf probe' and 'perf record'. With this patch, we can probe > the SDT events directly using 'perf record'. > > For example : > > $ perf list sdt > sdt_libpthread:mutex_entry [SDT event] > sdt_libc:setjmp[SDT event] > ... > > $ perf record -a -e sdt_libc:setjmp > ^C[ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ] > > $ perf script > bash 803 [002] 6492.190311: sdt_libc:setjmp: (7f1d503b56a1) > login 488 [001] 6496.791583: sdt_libc:setjmp: (7ff3013d56a1) > fprintd 11038 [003] 6496.808032: sdt_libc:setjmp: (7fdedf5936a1) > [...] Thanks, I like the usage. I ran into trouble testing on Node.js: # ./perf buildid-cache --add `which node` # ./perf list | grep sdt_node sdt_node:gc__done [SDT event] sdt_node:gc__start [SDT event] sdt_node:http__client__request [SDT event] sdt_node:http__client__response[SDT event] sdt_node:http__server__request [SDT event] sdt_node:http__server__response[SDT event] sdt_node:net__server__connection [SDT event] sdt_node:net__stream__end [SDT event] # ./perf record -e sdt_node:http__server__request -a ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.308 MB perf.data ] # ./perf script # No events. I can see it had set it up: # cat /sys/kernel/debug/tracing/uprobe_events p:sdt_node/http__server__request /usr/local/bin/node:0x009c2e69 Ok. Am I sure my workload is working? Trying from bcc/eBPF: # /mnt/src/bcc/tools/trace.py 'u:node:http__server__request' failed to enable probe 'http__server__request'; a possible cause can be that the probe requires a pid to enable # /mnt/src/bcc/tools/trace.py -p `pgrep node` 'u:node:http__server__request' In file included from /virtual/main.c:41: In file included from /lib/modules/4.10.0-rc8-virtual/build/include/linux/blkdev.h:9: In file included from /lib/modules/4.10.0-rc8-virtual/build/include/linux/genhd.h:64: In file included from /lib/modules/4.10.0-rc8-virtual/build/include/linux/device.h:24: In file included from /lib/modules/4.10.0-rc8-virtual/build/include/linux/pinctrl/devinfo.h:21: In file included from /lib/modules/4.10.0-rc8-virtual/build/include/linux/pinctrl/consumer.h:17: In file included from /lib/modules/4.10.0-rc8-virtual/build/include/linux/seq_file.h:10: /lib/modules/4.10.0-rc8-virtual/build/include/linux/fs.h:2648:9: warning: comparison of unsigned enum expression < 0 is always false [-Wtautological-compare] if (id < 0 || id >= READING_MAX_ID) ~~ ^ ~ 1 warning generated. PIDTIDCOMM FUNC 7646 7646 node http__server__request 7646 7646 node http__server__request 7646 7646 node http__server__request ^C (ignore the warning; I just asked lkml about it). So that works. It instrumented: # cat /sys/kernel/debug/tracing/uprobe_events p:uprobes/p__usr_local_bin_node_0x9c2e69_bcc_25410 /usr/local/bin/node:0x009c2e69 Now retrying perf: # ./perf record -e sdt_node:http__server__request -a ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.446 MB perf.data (3 samples) ] # ./perf script node 7646 [002] 361.012364: sdt_node:http__server__request: (dc2e69) node 7646 [002] 361.204718: sdt_node:http__server__request: (dc2e69) node 7646 [002] 361.363043: sdt_node:http__server__request: (dc2e69) Now perf works. If I restart the node process, it goes back to the broken state. This doesn't fix it either: # ./perf probe sdt_node:http__server__request Added new event: sdt_node:http__server__request (on %http__server__request in /usr/local/bin/node) You can now use it in all perf tools, such as: perf record -e sdt_node:http__server__request -aR sleep 1 Hint: SDT event can be directly recorded with 'perf record'. No need to create probe manually. Brendan
Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
G'Day Ravi, On Thu, Feb 23, 2017 at 11:43 PM, Ravi Bangoria wrote: > > From: Hemant Kumar > > Add support for directly recording SDT events which are present in > the probe cache. Without this patch, we could probe into SDT events > using 'perf probe' and 'perf record'. With this patch, we can probe > the SDT events directly using 'perf record'. > > For example : > > $ perf list sdt > sdt_libpthread:mutex_entry [SDT event] > sdt_libc:setjmp[SDT event] > ... > > $ perf record -a -e sdt_libc:setjmp > ^C[ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ] > > $ perf script > bash 803 [002] 6492.190311: sdt_libc:setjmp: (7f1d503b56a1) > login 488 [001] 6496.791583: sdt_libc:setjmp: (7ff3013d56a1) > fprintd 11038 [003] 6496.808032: sdt_libc:setjmp: (7fdedf5936a1) > [...] Thanks, I like the usage. I ran into trouble testing on Node.js: # ./perf buildid-cache --add `which node` # ./perf list | grep sdt_node sdt_node:gc__done [SDT event] sdt_node:gc__start [SDT event] sdt_node:http__client__request [SDT event] sdt_node:http__client__response[SDT event] sdt_node:http__server__request [SDT event] sdt_node:http__server__response[SDT event] sdt_node:net__server__connection [SDT event] sdt_node:net__stream__end [SDT event] # ./perf record -e sdt_node:http__server__request -a ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.308 MB perf.data ] # ./perf script # No events. I can see it had set it up: # cat /sys/kernel/debug/tracing/uprobe_events p:sdt_node/http__server__request /usr/local/bin/node:0x009c2e69 Ok. Am I sure my workload is working? Trying from bcc/eBPF: # /mnt/src/bcc/tools/trace.py 'u:node:http__server__request' failed to enable probe 'http__server__request'; a possible cause can be that the probe requires a pid to enable # /mnt/src/bcc/tools/trace.py -p `pgrep node` 'u:node:http__server__request' In file included from /virtual/main.c:41: In file included from /lib/modules/4.10.0-rc8-virtual/build/include/linux/blkdev.h:9: In file included from /lib/modules/4.10.0-rc8-virtual/build/include/linux/genhd.h:64: In file included from /lib/modules/4.10.0-rc8-virtual/build/include/linux/device.h:24: In file included from /lib/modules/4.10.0-rc8-virtual/build/include/linux/pinctrl/devinfo.h:21: In file included from /lib/modules/4.10.0-rc8-virtual/build/include/linux/pinctrl/consumer.h:17: In file included from /lib/modules/4.10.0-rc8-virtual/build/include/linux/seq_file.h:10: /lib/modules/4.10.0-rc8-virtual/build/include/linux/fs.h:2648:9: warning: comparison of unsigned enum expression < 0 is always false [-Wtautological-compare] if (id < 0 || id >= READING_MAX_ID) ~~ ^ ~ 1 warning generated. PIDTIDCOMM FUNC 7646 7646 node http__server__request 7646 7646 node http__server__request 7646 7646 node http__server__request ^C (ignore the warning; I just asked lkml about it). So that works. It instrumented: # cat /sys/kernel/debug/tracing/uprobe_events p:uprobes/p__usr_local_bin_node_0x9c2e69_bcc_25410 /usr/local/bin/node:0x009c2e69 Now retrying perf: # ./perf record -e sdt_node:http__server__request -a ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.446 MB perf.data (3 samples) ] # ./perf script node 7646 [002] 361.012364: sdt_node:http__server__request: (dc2e69) node 7646 [002] 361.204718: sdt_node:http__server__request: (dc2e69) node 7646 [002] 361.363043: sdt_node:http__server__request: (dc2e69) Now perf works. If I restart the node process, it goes back to the broken state. This doesn't fix it either: # ./perf probe sdt_node:http__server__request Added new event: sdt_node:http__server__request (on %http__server__request in /usr/local/bin/node) You can now use it in all perf tools, such as: perf record -e sdt_node:http__server__request -aR sleep 1 Hint: SDT event can be directly recorded with 'perf record'. No need to create probe manually. Brendan
Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
Thanks Masami for the review. Please find my replies to your comments. On Tuesday 28 February 2017 11:15 AM, Masami Hiramatsu wrote: > On Fri, 24 Feb 2017 13:13:25 +0530 > Ravi Bangoriawrote: > >> After invoking 'perf record', behind the scenes, it checks whether >> the event specified is an SDT event by using the flag '%' or string >> 'sdt_'. After that, it first checks whether event already exists >> with that *name* in uprobe_events file. If found, it records that >> particular event. Otherwise, it does a lookup of the probe cache to >> find out the SDT event. If its not present, it throws an error. >> If found, it again tries to find existing events from uprobe_events >> file, but this time it uses *address* and *filename* for comparison. >> Finally it writes new events into the uprobe_events file and starts >> recording. It also maintains a list of the event names that were >> written to uprobe_events file for this session. After finishing the >> record session, it removes the events from the uprobe_events file >> using the maintained name list. > OK, the behavior looks good. However, the implementation seems > too complex, could you make it simpler? > I have some comments on the code, see below. Yes, it's complex code. It's all because of multiple SDT events with same name exists and we wants to allow both -- 'perf probe' as well as 'perf record' -- which creates complex situations that has to be taken care. See my replies below for such examples. >> +/* >> + * Remove ith tev from pev->tevs list and shift remaining >> + * tevs(i+1 to pev->ntevs) one step. >> + */ >> +static void shift_pev(struct perf_probe_event *pev, int i) >> +{ >> +int j; >> + >> +free(pev->tevs[i].event); >> +free(pev->tevs[i].group); >> +free(pev->tevs[i].args); >> +free(pev->tevs[i].point.realname); >> +free(pev->tevs[i].point.symbol); >> +free(pev->tevs[i].point.module); >> + >> +/* >> + * If ith element is last element, no need to shift, >> + * just decrement pev->ntevs. >> + */ >> +if (i == pev->ntevs - 1) >> +goto ret; >> + >> +for (j = i; j < pev->ntevs - 1; j++) { >> +pev->tevs[j].event = pev->tevs[j + 1].event; >> +pev->tevs[j].group = pev->tevs[j + 1].group; >> +pev->tevs[j].nargs = pev->tevs[j + 1].nargs; >> +pev->tevs[j].uprobes= pev->tevs[j + 1].uprobes; >> +pev->tevs[j].args = pev->tevs[j + 1].args; >> +pev->tevs[j].point.realname = pev->tevs[j + 1].point.realname; >> +pev->tevs[j].point.symbol = pev->tevs[j + 1].point.symbol; >> +pev->tevs[j].point.module = pev->tevs[j + 1].point.module; >> +pev->tevs[j].point.offset = pev->tevs[j + 1].point.offset; >> +pev->tevs[j].point.address = pev->tevs[j + 1].point.address; >> +pev->tevs[j].point.retprobe = pev->tevs[j + 1].point.retprobe; >> +} >> + >> +ret: >> +pev->ntevs--; >> +} > This code smells no good... > Why don't you make a list of newly added probes ? > Yes. This is little ugly. Consider a following scenario. $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider Provider: libpthread Name: mutex_entry Location: 0x9ddb, Base: 0x000139cc, ... -- Provider: libpthread Name: mutex_entry Location: 0xbcbb, Base: 0x000139cc, ... $ perf probe sdt_libpthread:mutex_entry $ cat /sys/kernel/debug/tracing/uprobe_events p:sdt_libpthread/mutex_entry /usr/lib64/libpthread-2.24.so:0x9ddb p:sdt_libpthread/mutex_entry_1 /usr/lib64/libpthread-2.24.so:0xbcbb $ perf probe -d sdt_libpthread:mutex_entry $ perf record -a -e sdt_libpthread:mutex_entry /* This should record: sdt_libpthread/mutex_entry <== New entry (delete at the end of session) sdt_libpthread/mutex_entry_1 <== Reuse (do not delete at the end of session) */ First of all, perf looks into uprobe_events for entry with sdt_libpthread:mutex_entry *name*. As it's not present, perf does a lookup in probe-cache and gets two entries. Perf stores them in 'pev->tevs' array. Now it again compares uprobe_events entries with entries found from probe-cache with *filename* and *address* as comparison parameter (because this combo makes unique identifier for any SDT event in a particular system - please correct me if that's wrong). And perf finds one entry sdt_libpthread/mutex_entry_1. Keeping this entry in 'pevs->tevs' and calling apply_perf_probe_events() will add new entry for that event even if it exists. So this code removes that particular element of an array and shift all remaining elements one step. Please suggest if you have a better approach. >> + >> +/* Compare address and filename */ >> +static bool is_sdt_match(struct
Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
Thanks Masami for the review. Please find my replies to your comments. On Tuesday 28 February 2017 11:15 AM, Masami Hiramatsu wrote: > On Fri, 24 Feb 2017 13:13:25 +0530 > Ravi Bangoria wrote: > >> After invoking 'perf record', behind the scenes, it checks whether >> the event specified is an SDT event by using the flag '%' or string >> 'sdt_'. After that, it first checks whether event already exists >> with that *name* in uprobe_events file. If found, it records that >> particular event. Otherwise, it does a lookup of the probe cache to >> find out the SDT event. If its not present, it throws an error. >> If found, it again tries to find existing events from uprobe_events >> file, but this time it uses *address* and *filename* for comparison. >> Finally it writes new events into the uprobe_events file and starts >> recording. It also maintains a list of the event names that were >> written to uprobe_events file for this session. After finishing the >> record session, it removes the events from the uprobe_events file >> using the maintained name list. > OK, the behavior looks good. However, the implementation seems > too complex, could you make it simpler? > I have some comments on the code, see below. Yes, it's complex code. It's all because of multiple SDT events with same name exists and we wants to allow both -- 'perf probe' as well as 'perf record' -- which creates complex situations that has to be taken care. See my replies below for such examples. >> +/* >> + * Remove ith tev from pev->tevs list and shift remaining >> + * tevs(i+1 to pev->ntevs) one step. >> + */ >> +static void shift_pev(struct perf_probe_event *pev, int i) >> +{ >> +int j; >> + >> +free(pev->tevs[i].event); >> +free(pev->tevs[i].group); >> +free(pev->tevs[i].args); >> +free(pev->tevs[i].point.realname); >> +free(pev->tevs[i].point.symbol); >> +free(pev->tevs[i].point.module); >> + >> +/* >> + * If ith element is last element, no need to shift, >> + * just decrement pev->ntevs. >> + */ >> +if (i == pev->ntevs - 1) >> +goto ret; >> + >> +for (j = i; j < pev->ntevs - 1; j++) { >> +pev->tevs[j].event = pev->tevs[j + 1].event; >> +pev->tevs[j].group = pev->tevs[j + 1].group; >> +pev->tevs[j].nargs = pev->tevs[j + 1].nargs; >> +pev->tevs[j].uprobes= pev->tevs[j + 1].uprobes; >> +pev->tevs[j].args = pev->tevs[j + 1].args; >> +pev->tevs[j].point.realname = pev->tevs[j + 1].point.realname; >> +pev->tevs[j].point.symbol = pev->tevs[j + 1].point.symbol; >> +pev->tevs[j].point.module = pev->tevs[j + 1].point.module; >> +pev->tevs[j].point.offset = pev->tevs[j + 1].point.offset; >> +pev->tevs[j].point.address = pev->tevs[j + 1].point.address; >> +pev->tevs[j].point.retprobe = pev->tevs[j + 1].point.retprobe; >> +} >> + >> +ret: >> +pev->ntevs--; >> +} > This code smells no good... > Why don't you make a list of newly added probes ? > Yes. This is little ugly. Consider a following scenario. $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider Provider: libpthread Name: mutex_entry Location: 0x9ddb, Base: 0x000139cc, ... -- Provider: libpthread Name: mutex_entry Location: 0xbcbb, Base: 0x000139cc, ... $ perf probe sdt_libpthread:mutex_entry $ cat /sys/kernel/debug/tracing/uprobe_events p:sdt_libpthread/mutex_entry /usr/lib64/libpthread-2.24.so:0x9ddb p:sdt_libpthread/mutex_entry_1 /usr/lib64/libpthread-2.24.so:0xbcbb $ perf probe -d sdt_libpthread:mutex_entry $ perf record -a -e sdt_libpthread:mutex_entry /* This should record: sdt_libpthread/mutex_entry <== New entry (delete at the end of session) sdt_libpthread/mutex_entry_1 <== Reuse (do not delete at the end of session) */ First of all, perf looks into uprobe_events for entry with sdt_libpthread:mutex_entry *name*. As it's not present, perf does a lookup in probe-cache and gets two entries. Perf stores them in 'pev->tevs' array. Now it again compares uprobe_events entries with entries found from probe-cache with *filename* and *address* as comparison parameter (because this combo makes unique identifier for any SDT event in a particular system - please correct me if that's wrong). And perf finds one entry sdt_libpthread/mutex_entry_1. Keeping this entry in 'pevs->tevs' and calling apply_perf_probe_events() will add new entry for that event even if it exists. So this code removes that particular element of an array and shift all remaining elements one step. Please suggest if you have a better approach. >> + >> +/* Compare address and filename */ >> +static bool is_sdt_match(struct probe_trace_event *tev1, >> +
Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
On Fri, 24 Feb 2017 13:13:25 +0530 Ravi Bangoriawrote: > From: Hemant Kumar > > Add support for directly recording SDT events which are present in > the probe cache. Without this patch, we could probe into SDT events > using 'perf probe' and 'perf record'. With this patch, we can probe > the SDT events directly using 'perf record'. > > For example : > > $ perf list sdt > sdt_libpthread:mutex_entry [SDT event] > sdt_libc:setjmp[SDT event] > ... > > $ perf record -a -e sdt_libc:setjmp > ^C[ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ] > > $ perf script > bash 803 [002] 6492.190311: sdt_libc:setjmp: (7f1d503b56a1) > login 488 [001] 6496.791583: sdt_libc:setjmp: (7ff3013d56a1) > fprintd 11038 [003] 6496.808032: sdt_libc:setjmp: (7fdedf5936a1) > > Recording on SDT events with same provider and marker names is also > supported: > > $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider > Provider: libpthread > Name: mutex_entry > Location: 0x9ddb, Base: 0x000139cc, ... > -- > Provider: libpthread > Name: mutex_entry > Location: 0xbcbb, Base: 0x000139cc, ... > > $ perf record -a -e sdt_libpthread:mutex_entry > Info: Recording on 2 occurrences of sdt_libpthread:mutex_entry > ^C[ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.197 MB perf.data (31 samples) ] > > $ perf script > in:imjournal 442 [000] 6625.701833: sdt_libpthread:mutex_entry: > (7fb1a1940ddb) > rs:main Q:Reg 443 [001] 6625.701889: sdt_libpthread:mutex_entry_1: > (7fb1a1942cbb) > > > After invoking 'perf record', behind the scenes, it checks whether > the event specified is an SDT event by using the flag '%' or string > 'sdt_'. After that, it first checks whether event already exists > with that *name* in uprobe_events file. If found, it records that > particular event. Otherwise, it does a lookup of the probe cache to > find out the SDT event. If its not present, it throws an error. > If found, it again tries to find existing events from uprobe_events > file, but this time it uses *address* and *filename* for comparison. > Finally it writes new events into the uprobe_events file and starts > recording. It also maintains a list of the event names that were > written to uprobe_events file for this session. After finishing the > record session, it removes the events from the uprobe_events file > using the maintained name list. OK, the behavior looks good. However, the implementation seems too complex, could you make it simpler? I have some comments on the code, see below. [...] > diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c > index 1fcebc3..d255a6d 100644 > --- a/tools/perf/builtin-probe.c > +++ b/tools/perf/builtin-probe.c > @@ -45,6 +45,7 @@ > #define DEFAULT_VAR_FILTER "!__k???tab_* & !__crc_*" > #define DEFAULT_FUNC_FILTER "!_*" > #define DEFAULT_LIST_FILTER "*" > +#define MAX_EVENT_LENGTH 512 > > /* Session management structure */ > static struct { > @@ -312,6 +313,19 @@ static void pr_err_with_code(const char *msg, int err) > pr_err("\n"); > } > > +/* Show how to use the event. */ > +static void record_syntax_hint(const char *group, const char *event) > +{ > + char ge[MAX_EVENT_LENGTH]; > + > + pr_info("\nYou can now use it in all perf tools, such as:\n\n"); > + pr_info("\tperf record -e %s:%s -aR sleep 1\n\n", group, event); > + > + snprintf(ge, strlen(group) + strlen(event) + 2, "%s:%s", group, event); > + if (is_sdt_event(ge)) > + pr_info("Hint: SDT event can be directly recorded with 'perf > record'. No need to create probe manually.\n"); > +} Here, we don't need this message because anyway perf_add_probe_events already created this probe... > + > static int perf_add_probe_events(struct perf_probe_event *pevs, int npevs) > { > int ret; > @@ -356,11 +370,8 @@ static int perf_add_probe_events(struct perf_probe_event > *pevs, int npevs) > } > > /* Note that it is possible to skip all events because of blacklist */ > - if (event) { > - /* Show how to use the event. */ > - pr_info("\nYou can now use it in all perf tools, such as:\n\n"); > - pr_info("\tperf record -e %s:%s -aR sleep 1\n\n", group, event); > - } > + if (event) > + record_syntax_hint(group, event); > > out_cleanup: > cleanup_perf_probe_events(pevs, npevs); > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c > index bc84a37..fa9b345 100644 > --- a/tools/perf/builtin-record.c > +++ b/tools/perf/builtin-record.c > @@ -39,6 +39,7 @@ > #include "util/trigger.h" > #include
Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
On Fri, 24 Feb 2017 13:13:25 +0530 Ravi Bangoria wrote: > From: Hemant Kumar > > Add support for directly recording SDT events which are present in > the probe cache. Without this patch, we could probe into SDT events > using 'perf probe' and 'perf record'. With this patch, we can probe > the SDT events directly using 'perf record'. > > For example : > > $ perf list sdt > sdt_libpthread:mutex_entry [SDT event] > sdt_libc:setjmp[SDT event] > ... > > $ perf record -a -e sdt_libc:setjmp > ^C[ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ] > > $ perf script > bash 803 [002] 6492.190311: sdt_libc:setjmp: (7f1d503b56a1) > login 488 [001] 6496.791583: sdt_libc:setjmp: (7ff3013d56a1) > fprintd 11038 [003] 6496.808032: sdt_libc:setjmp: (7fdedf5936a1) > > Recording on SDT events with same provider and marker names is also > supported: > > $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider > Provider: libpthread > Name: mutex_entry > Location: 0x9ddb, Base: 0x000139cc, ... > -- > Provider: libpthread > Name: mutex_entry > Location: 0xbcbb, Base: 0x000139cc, ... > > $ perf record -a -e sdt_libpthread:mutex_entry > Info: Recording on 2 occurrences of sdt_libpthread:mutex_entry > ^C[ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.197 MB perf.data (31 samples) ] > > $ perf script > in:imjournal 442 [000] 6625.701833: sdt_libpthread:mutex_entry: > (7fb1a1940ddb) > rs:main Q:Reg 443 [001] 6625.701889: sdt_libpthread:mutex_entry_1: > (7fb1a1942cbb) > > > After invoking 'perf record', behind the scenes, it checks whether > the event specified is an SDT event by using the flag '%' or string > 'sdt_'. After that, it first checks whether event already exists > with that *name* in uprobe_events file. If found, it records that > particular event. Otherwise, it does a lookup of the probe cache to > find out the SDT event. If its not present, it throws an error. > If found, it again tries to find existing events from uprobe_events > file, but this time it uses *address* and *filename* for comparison. > Finally it writes new events into the uprobe_events file and starts > recording. It also maintains a list of the event names that were > written to uprobe_events file for this session. After finishing the > record session, it removes the events from the uprobe_events file > using the maintained name list. OK, the behavior looks good. However, the implementation seems too complex, could you make it simpler? I have some comments on the code, see below. [...] > diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c > index 1fcebc3..d255a6d 100644 > --- a/tools/perf/builtin-probe.c > +++ b/tools/perf/builtin-probe.c > @@ -45,6 +45,7 @@ > #define DEFAULT_VAR_FILTER "!__k???tab_* & !__crc_*" > #define DEFAULT_FUNC_FILTER "!_*" > #define DEFAULT_LIST_FILTER "*" > +#define MAX_EVENT_LENGTH 512 > > /* Session management structure */ > static struct { > @@ -312,6 +313,19 @@ static void pr_err_with_code(const char *msg, int err) > pr_err("\n"); > } > > +/* Show how to use the event. */ > +static void record_syntax_hint(const char *group, const char *event) > +{ > + char ge[MAX_EVENT_LENGTH]; > + > + pr_info("\nYou can now use it in all perf tools, such as:\n\n"); > + pr_info("\tperf record -e %s:%s -aR sleep 1\n\n", group, event); > + > + snprintf(ge, strlen(group) + strlen(event) + 2, "%s:%s", group, event); > + if (is_sdt_event(ge)) > + pr_info("Hint: SDT event can be directly recorded with 'perf > record'. No need to create probe manually.\n"); > +} Here, we don't need this message because anyway perf_add_probe_events already created this probe... > + > static int perf_add_probe_events(struct perf_probe_event *pevs, int npevs) > { > int ret; > @@ -356,11 +370,8 @@ static int perf_add_probe_events(struct perf_probe_event > *pevs, int npevs) > } > > /* Note that it is possible to skip all events because of blacklist */ > - if (event) { > - /* Show how to use the event. */ > - pr_info("\nYou can now use it in all perf tools, such as:\n\n"); > - pr_info("\tperf record -e %s:%s -aR sleep 1\n\n", group, event); > - } > + if (event) > + record_syntax_hint(group, event); > > out_cleanup: > cleanup_perf_probe_events(pevs, npevs); > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c > index bc84a37..fa9b345 100644 > --- a/tools/perf/builtin-record.c > +++ b/tools/perf/builtin-record.c > @@ -39,6 +39,7 @@ > #include "util/trigger.h" > #include "util/perf-hooks.h" > #include "asm/bug.h" > +#include
[PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
From: Hemant KumarAdd support for directly recording SDT events which are present in the probe cache. Without this patch, we could probe into SDT events using 'perf probe' and 'perf record'. With this patch, we can probe the SDT events directly using 'perf record'. For example : $ perf list sdt sdt_libpthread:mutex_entry [SDT event] sdt_libc:setjmp[SDT event] ... $ perf record -a -e sdt_libc:setjmp ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ] $ perf script bash 803 [002] 6492.190311: sdt_libc:setjmp: (7f1d503b56a1) login 488 [001] 6496.791583: sdt_libc:setjmp: (7ff3013d56a1) fprintd 11038 [003] 6496.808032: sdt_libc:setjmp: (7fdedf5936a1) Recording on SDT events with same provider and marker names is also supported: $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider Provider: libpthread Name: mutex_entry Location: 0x9ddb, Base: 0x000139cc, ... -- Provider: libpthread Name: mutex_entry Location: 0xbcbb, Base: 0x000139cc, ... $ perf record -a -e sdt_libpthread:mutex_entry Info: Recording on 2 occurrences of sdt_libpthread:mutex_entry ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.197 MB perf.data (31 samples) ] $ perf script in:imjournal 442 [000] 6625.701833: sdt_libpthread:mutex_entry: (7fb1a1940ddb) rs:main Q:Reg 443 [001] 6625.701889: sdt_libpthread:mutex_entry_1: (7fb1a1942cbb) After invoking 'perf record', behind the scenes, it checks whether the event specified is an SDT event by using the flag '%' or string 'sdt_'. After that, it first checks whether event already exists with that *name* in uprobe_events file. If found, it records that particular event. Otherwise, it does a lookup of the probe cache to find out the SDT event. If its not present, it throws an error. If found, it again tries to find existing events from uprobe_events file, but this time it uses *address* and *filename* for comparison. Finally it writes new events into the uprobe_events file and starts recording. It also maintains a list of the event names that were written to uprobe_events file for this session. After finishing the record session, it removes the events from the uprobe_events file using the maintained name list. Signed-off-by: Hemant Kumar Signed-off-by: Ravi Bangoria --- tools/lib/api/fs/tracing_path.c | 17 +- tools/perf/builtin-probe.c | 21 ++- tools/perf/builtin-record.c | 23 +++ tools/perf/perf.h | 2 + tools/perf/util/parse-events.c | 56 +- tools/perf/util/parse-events.h | 2 + tools/perf/util/probe-event.c | 44 - tools/perf/util/probe-event.h | 4 + tools/perf/util/probe-file.c| 376 tools/perf/util/probe-file.h| 27 +++ 10 files changed, 552 insertions(+), 20 deletions(-) diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c index 3e606b9..fa52e67 100644 --- a/tools/lib/api/fs/tracing_path.c +++ b/tools/lib/api/fs/tracing_path.c @@ -103,19 +103,10 @@ int tracing_path__strerror_open_tp(int err, char *buf, size_t size, * - jirka */ if (debugfs__configured() || tracefs__configured()) { - /* sdt markers */ - if (!strncmp(filename, "sdt_", 4)) { - snprintf(buf, size, - "Error:\tFile %s/%s not found.\n" - "Hint:\tSDT event cannot be directly recorded on.\n" - "\tPlease first use 'perf probe %s:%s' before recording it.\n", - tracing_events_path, filename, sys, name); - } else { - snprintf(buf, size, -"Error:\tFile %s/%s not found.\n" -"Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n", -tracing_events_path, filename); - } + snprintf(buf, size, +"Error:\tFile %s/%s not found.\n" +"Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n", +tracing_events_path, filename); break; } snprintf(buf, size, "%s", diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c index 1fcebc3..d255a6d 100644 --- a/tools/perf/builtin-probe.c +++
[PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
From: Hemant Kumar Add support for directly recording SDT events which are present in the probe cache. Without this patch, we could probe into SDT events using 'perf probe' and 'perf record'. With this patch, we can probe the SDT events directly using 'perf record'. For example : $ perf list sdt sdt_libpthread:mutex_entry [SDT event] sdt_libc:setjmp[SDT event] ... $ perf record -a -e sdt_libc:setjmp ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ] $ perf script bash 803 [002] 6492.190311: sdt_libc:setjmp: (7f1d503b56a1) login 488 [001] 6496.791583: sdt_libc:setjmp: (7ff3013d56a1) fprintd 11038 [003] 6496.808032: sdt_libc:setjmp: (7fdedf5936a1) Recording on SDT events with same provider and marker names is also supported: $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider Provider: libpthread Name: mutex_entry Location: 0x9ddb, Base: 0x000139cc, ... -- Provider: libpthread Name: mutex_entry Location: 0xbcbb, Base: 0x000139cc, ... $ perf record -a -e sdt_libpthread:mutex_entry Info: Recording on 2 occurrences of sdt_libpthread:mutex_entry ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.197 MB perf.data (31 samples) ] $ perf script in:imjournal 442 [000] 6625.701833: sdt_libpthread:mutex_entry: (7fb1a1940ddb) rs:main Q:Reg 443 [001] 6625.701889: sdt_libpthread:mutex_entry_1: (7fb1a1942cbb) After invoking 'perf record', behind the scenes, it checks whether the event specified is an SDT event by using the flag '%' or string 'sdt_'. After that, it first checks whether event already exists with that *name* in uprobe_events file. If found, it records that particular event. Otherwise, it does a lookup of the probe cache to find out the SDT event. If its not present, it throws an error. If found, it again tries to find existing events from uprobe_events file, but this time it uses *address* and *filename* for comparison. Finally it writes new events into the uprobe_events file and starts recording. It also maintains a list of the event names that were written to uprobe_events file for this session. After finishing the record session, it removes the events from the uprobe_events file using the maintained name list. Signed-off-by: Hemant Kumar Signed-off-by: Ravi Bangoria --- tools/lib/api/fs/tracing_path.c | 17 +- tools/perf/builtin-probe.c | 21 ++- tools/perf/builtin-record.c | 23 +++ tools/perf/perf.h | 2 + tools/perf/util/parse-events.c | 56 +- tools/perf/util/parse-events.h | 2 + tools/perf/util/probe-event.c | 44 - tools/perf/util/probe-event.h | 4 + tools/perf/util/probe-file.c| 376 tools/perf/util/probe-file.h| 27 +++ 10 files changed, 552 insertions(+), 20 deletions(-) diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c index 3e606b9..fa52e67 100644 --- a/tools/lib/api/fs/tracing_path.c +++ b/tools/lib/api/fs/tracing_path.c @@ -103,19 +103,10 @@ int tracing_path__strerror_open_tp(int err, char *buf, size_t size, * - jirka */ if (debugfs__configured() || tracefs__configured()) { - /* sdt markers */ - if (!strncmp(filename, "sdt_", 4)) { - snprintf(buf, size, - "Error:\tFile %s/%s not found.\n" - "Hint:\tSDT event cannot be directly recorded on.\n" - "\tPlease first use 'perf probe %s:%s' before recording it.\n", - tracing_events_path, filename, sys, name); - } else { - snprintf(buf, size, -"Error:\tFile %s/%s not found.\n" -"Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n", -tracing_events_path, filename); - } + snprintf(buf, size, +"Error:\tFile %s/%s not found.\n" +"Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n", +tracing_events_path, filename); break; } snprintf(buf, size, "%s", diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c index 1fcebc3..d255a6d 100644 --- a/tools/perf/builtin-probe.c +++ b/tools/perf/builtin-probe.c @@ -45,6 +45,7 @@ #define DEFAULT_VAR_FILTER "!__k???tab_* &