 ❦  7 May 2014 22:19 +0200, Willy Tarreau <w...@1wt.eu> :

>> Here is a proof of concept. To test, use `make TARGET=linux2628
>> USE_DTRACE=1`. On Linux, you need systemtap-sdt-dev or something like
>> that. Then, there is a quick example in example/haproxy.stp.
>
> Interesting, but just for my understanding, what does it provide beyond
> building with "TRACE=1" where the compiler dumps *all* function calls,
> and not only those that were instrumented ? I'm asking because I never
> used dtrace, so I'm totally ignorant here.

See below.

>> The trick with those tracepoints is that they are just NOOP until you
>> enable them. So, even when someone compiles dtrace support, they will
>> not have any performance impact until trying to use the tracepoints.
>
> Well, they will at least have the performance impact of the "if" which
> disables them and the inflated/reordered functions I guess! So at least
> we have to be reasonable not to put them everywhere (eg: not in the
> polling loops nor in the scheduler).

No, they are really just a NOP. They are registered in a dedicated
section of the ELF executable and, when the tracepoint is activated, the
NOP is replaced by a JMP. When arguments are expensive to build, it is
possible to test whether the probe is enabled, but in that case there is
a cost even when the probe is disabled. So, better keep the arguments
simple.

I cannot find a link which explains that clearly (I am pretty sure there
was an article on LWN about it). I can show you the result:

    $ readelf -x .note.stapsdt ./haproxy

    Hex dump of section '.note.stapsdt':
      0x00000000 08000000 3d000000 03000000 73746170 ....=.......stap
      0x00000010 73647400 af7c4200 00000000 6f634800 sdt..|B.....ocH.
      0x00000020 00000000 b09e6900 00000000 68617072 ......i.....hapr
      0x00000030 6f787900 66726f6e 74656e64 5f616363 oxy.frontend_acc
      0x00000040 65707400 38403235 36382825 72617829 ept.8@2568(%rax)
      0x00000050 00000000                            ....

systemtap/dtrace are able to read this section:

    $ stap -L 'process("./haproxy").mark("*")'
    process("./haproxy").mark("frontend_accept") $arg1:long

(all arguments are seen as long/pointer because the type is not encoded
in the note)

gdb is also able to use them:

    (gdb) info probes
    Provider Name            Where              Semaphore          Object
    haproxy  frontend_accept 0x0000000000427caf 0x0000000000699eb0 /home/bernat/code/dailymotion/haproxy/haproxy
    (gdb) disassemble frontend_accept
    Dump of assembler code for function frontend_accept:
       0x0000000000427c90 <+0>:     push   %r14
       0x0000000000427c92 <+2>:     push   %r13
       0x0000000000427c94 <+4>:     push   %r12
       0x0000000000427c96 <+6>:     push   %rbp
       0x0000000000427c97 <+7>:     push   %rbx
       0x0000000000427c98 <+8>:     mov    %rdi,%rbx
       0x0000000000427c9b <+11>:    add    $0xffffffffffffff80,%rsp
       0x0000000000427c9f <+15>:    mov    0x270(%rdi),%r12
       0x0000000000427ca6 <+22>:    mov    0x20(%rdi),%rax
       0x0000000000427caa <+26>:    mov    0x34(%r12),%ebp
       0x0000000000427caf <+31>:    nop
       0x0000000000427cb0 <+32>:    mov    0x30(%rdi),%rax
       0x0000000000427cb4 <+36>:    movq   $0x0,0x2f0(%rdi)
       0x0000000000427cbf <+47>:    movq   $0x0,0x2e8(%rdi)
    [...]

See the nop at 427caf?

So the main benefits of those probes are:

 * low overhead: they can be left in production and are there when you
   really need them
 * discoverable: someone not tech-savvy enough to read the source can
   list them and decide which ones to enable, because someone more
   tech-savvy chose them

>> While the probe arguments can be anything, it is simpler to only keep
>> simple types like null-terminated strings or int. Otherwise, they are
>> difficult to exploit. If you put struct, without the debug symbols, the
>> data is not exploitable.
>>
>> Now, all the hard work is to put trace points everywhere.
>
> That's where gcc does the stuff free of charge in fact.
> I still tend to be cautious about what the debugging code becomes over
> time, because we had this twice, once with the DPRINTF() macro which was
> never up to date, and once with the http_silent_debug() macro which
> became so unbalanced over time that I recently totally removed it.

Yes, this is a big problem. In the kernel, where a similar mechanism
exists, some maintainers are reluctant to provide tracepoints because
they would become part of the user/kernel interface and would have to be
maintained, which is a lot of work.

>> But they can also be put in places where logs
>> would be too verbose. I currently don't have interest in doing that but
>> if someone is willing to, it is only a matter of defining the probes in
>> probes.d and placing them in the C code. This is really nifty to debug
>> stuff in production. However, I think that people interested in that can
>> also use debug symbols to place probe at any place they want to. GCC is
>> now better at providing debug symbols which work on optimized
>> executables. Ubuntu is providing debug symbols for almost
>> everything. Tracepoints are still interesting as they can be listed and
>> they are hand-picked.
>
> That was the principle of the http_silent_debug() in fact. Just to know
> where we passed, in which order at a low cost. But I think I failed at it
> by trying to maintain this code stable, while in practice we probably only
> need something properly instrumented to easily add new tracepoints when
> needed. Maybe your patch can be a nice step forward in that direction, I
> have no idea. It's not intrusive, that's possibly something we can merge
> and see if it is quickly adopted or not.

In its current form, it is far too limited. We can wait for more people
to ask for such support and for someone who will at least add the
minimal instrumentation at key points in the code.

The patch is likely to stay up-to-date for quite some time since it is
small and relies on a cross-OS "frozen" mechanism, so there is no hurry:
it should continue to work when we really need it.

I am a big fan of systemtap but I usually rely on debug symbols since so
few programs have tracepoints and I can usually understand the code. For
interpreted languages, tracepoints are far more interesting since
understanding a VM is more complex than understanding a regular
program. But tracepoints being discoverable, they are easier to use than
debug symbols.

Debug symbols are available for Ubuntu and Red Hat (I mean debug symbols
that match the installed packages). I wanted to do the same thing for
Debian, but had no time to go forward on this.

This is a huge difficulty. You want to know something, you don't have
the debug symbols, you need to recompile, stop, start, and maybe the
problem will be gone before you had a chance to debug it.

The tracepoints are quite convenient as we can ask distributions to
enable them as soon as they exist. While GCC has made good progress in
generating debug symbols, you can still get missing symbols (missing
arguments, missing local variables) or errors. Usually, they happen
right when you need them. ;-)
-- 
Use uniform input formats.
            - The Elements of Programming Style (Kernighan & Plauger)