Re: haproxy 1.5-dev24: 100% CPU Load or Core Dumped

Vincent Bernat Wed, 07 May 2014 13:57:21 -0700

 ❦  7 May 2014 22:19 +0200, Willy Tarreau <w...@1wt.eu> :

>> Here is a proof of concept. To test, use `make TARGET=linux2628
>> USE_DTRACE=1`. On Linux, you need systemtap-sdt-dev or something like
>> that. Then, there is a quick example in example/haproxy.stp.
>
> Interesting, but just for my understanding, what does it provide beyond
> building with "TRACE=1" where the compiler dumps *all* function calls,
> and not only those that were instrumented ? I'm asking because I never
> used dtrace, so I'm totally ignorant here.


See below.

>> The trick with those tracepoints is that they are just NOOP until you
>> enable them. So, even when someone compiles dtrace support, they will
>> not have any performance impact until trying to use the tracepoints.
>
> Well, they will at least have the performance impact of the "if" which
> disables them and the inflated/reordered functions I guess! So at least
> we have to be reasonable not to put them everywhere (eg: not in the
> polling loops nor in the scheduler).

No, they are really just NOP. They are registered in some part of the
ELF executable and when the tracepoint is activated, the NOP is
replaced by a JMP.
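
To give an idea of how little code a probe site takes, here is a minimal
sketch (not the code from the patch). It assumes the DTRACE_PROBE1 macro
from <sys/sdt.h> (shipped by systemtap-sdt-dev on Linux) and falls back
to a no-op when USE_DTRACE is not defined. TRACE_FRONTEND_ACCEPT,
session_stub and frontend_accept_stub are hypothetical names used only
for illustration.

```c
#ifdef USE_DTRACE
#include <sys/sdt.h>   /* assumption: provided by systemtap-sdt-dev on Linux */
/* Emits a single NOP plus an entry in the .note.stapsdt section. */
#define TRACE_FRONTEND_ACCEPT(fd) DTRACE_PROBE1(haproxy, frontend_accept, fd)
#else
/* Without probe support the macro only evaluates its argument. */
#define TRACE_FRONTEND_ACCEPT(fd) do { (void)(fd); } while (0)
#endif

/* Hypothetical stand-in for the real session structure. */
struct session_stub {
    int fd;
};

/* Hypothetical accept handler: the probe is one line in the normal path. */
static int frontend_accept_stub(struct session_stub *s)
{
    TRACE_FRONTEND_ACCEPT(s->fd);
    return s->fd >= 0 ? 1 : 0;
}
```

The argument is a plain integer that is already at hand, so nothing
extra has to be computed for the probe.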

When arguments are expensive to build, there is the possibility to test
if the probe is enabled, but in this case, even when the probe is not
enabled, there is a cost. So, better keep the arguments simple.
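
A rough sketch of that "is the probe enabled?" pattern follows. In a
real build, the header generated from probes.d would typically provide
a probe macro and a matching *_ENABLED() macro backed by a semaphore.
Here both are stubbed so the example compiles anywhere, and all names
(HAPROXY_FRONTEND_ACCEPT_DESC, describe, on_accept, ...) are hypothetical.

```c
#include <stdio.h>

/* Stub for the semaphore a tracer would increment when it attaches. */
static unsigned short frontend_accept_semaphore;

/* Stubs for the macros a generated probe header would normally provide. */
#define HAPROXY_FRONTEND_ACCEPT_ENABLED() (frontend_accept_semaphore != 0)
#define HAPROXY_FRONTEND_ACCEPT_DESC(str) ((void)(str))

/* Counts how often the expensive argument is actually built. */
static int describe_calls;

/* "Expensive" argument: formats a description of the connection. */
static const char *describe(int fd)
{
    static char buf[64];

    describe_calls++;
    snprintf(buf, sizeof(buf), "fd=%d", fd);
    return buf;
}

static void on_accept(int fd)
{
    /* This test is paid on every call, even when nobody is tracing. */
    if (HAPROXY_FRONTEND_ACCEPT_ENABLED())
        HAPROXY_FRONTEND_ACCEPT_DESC(describe(fd));
}
```

The formatting is skipped while the semaphore is zero, but the load and
the branch remain, which is the cost mentioned above.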

I cannot find a link which explains that clearly (I am pretty sure there
was an article on LWN for that). I can show you the result:

$ readelf -x .note.stapsdt ./haproxy

Hex dump of section '.note.stapsdt':
  0x00000000 08000000 3d000000 03000000 73746170 ....=.......stap
  0x00000010 73647400 af7c4200 00000000 6f634800 sdt..|B.....ocH.
  0x00000020 00000000 b09e6900 00000000 68617072 ......i.....hapr
  0x00000030 6f787900 66726f6e 74656e64 5f616363 oxy.frontend_acc
  0x00000040 65707400 38403235 36382825 72617829 ept.8@2568(%rax)
  0x00000050 00000000                            ....

systemtap/dtrace are able to read this section:

$ stap -L 'process("./haproxy").mark("*")'
process("./haproxy").mark("frontend_accept") $arg1:long

(all arguments are seen as long/pointer because the note only records
their size and location, not their type)
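
For the curious, the note from the hex dump can be decoded by hand. The
sketch below embeds the 84 bytes shown by readelf and assumes the usual
stapsdt layout on a 64-bit little-endian target: a standard ELF note
header, then three 8-byte addresses (probe location, base, semaphore)
followed by three NUL-terminated strings (provider, name, arguments).
The helper and struct names are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Bytes copied from the readelf hex dump of .note.stapsdt. */
static const unsigned char note[] = {
    0x08,0x00,0x00,0x00, 0x3d,0x00,0x00,0x00, 0x03,0x00,0x00,0x00, 0x73,0x74,0x61,0x70,
    0x73,0x64,0x74,0x00, 0xaf,0x7c,0x42,0x00, 0x00,0x00,0x00,0x00, 0x6f,0x63,0x48,0x00,
    0x00,0x00,0x00,0x00, 0xb0,0x9e,0x69,0x00, 0x00,0x00,0x00,0x00, 0x68,0x61,0x70,0x72,
    0x6f,0x78,0x79,0x00, 0x66,0x72,0x6f,0x6e, 0x74,0x65,0x6e,0x64, 0x5f,0x61,0x63,0x63,
    0x65,0x70,0x74,0x00, 0x38,0x40,0x32,0x35, 0x36,0x38,0x28,0x25, 0x72,0x61,0x78,0x29,
    0x00,0x00,0x00,0x00,
};

struct sdt_probe {              /* hypothetical name */
    uint64_t pc, base, semaphore;
    const char *provider, *name, *args;
};

/* Read an n-byte little-endian integer. */
static uint64_t read_le(const unsigned char *p, int n)
{
    uint64_t v = 0;

    while (n-- > 0)
        v = (v << 8) | p[n];
    return v;
}

/* Return the byte after the NUL ending s, or NULL if none before end. */
static const char *after_nul(const char *s, const char *end)
{
    const char *nul = (s && s < end) ? memchr(s, '\0', (size_t)(end - s)) : NULL;

    return nul ? nul + 1 : NULL;
}

/* Decode one stapsdt note; returns 0 on success, -1 on malformed input. */
static int parse_stapsdt_note(const unsigned char *buf, size_t len,
                              struct sdt_probe *out)
{
    if (len < 20)
        return -1;
    uint32_t namesz = (uint32_t)read_le(buf, 4);
    uint32_t descsz = (uint32_t)read_le(buf + 4, 4);
    uint32_t type   = (uint32_t)read_le(buf + 8, 4);

    /* Owner must be "stapsdt" (8 bytes with the NUL) and the type 3. */
    if (namesz != 8 || type != 3 || memcmp(buf + 12, "stapsdt", 8) != 0)
        return -1;
    if (descsz < 24 || (size_t)descsz > len - 20)
        return -1;

    const unsigned char *desc = buf + 20;
    const char *end = (const char *)desc + descsz;

    out->pc        = read_le(desc, 8);
    out->base      = read_le(desc + 8, 8);
    out->semaphore = read_le(desc + 16, 8);
    out->provider  = (const char *)desc + 24;
    out->name      = after_nul(out->provider, end);
    out->args      = after_nul(out->name, end);
    return after_nul(out->args, end) ? 0 : -1;
}
```

Calling parse_stapsdt_note(note, sizeof(note), &p) yields the provider
"haproxy", the probe name "frontend_accept" and the argument string
"8@2568(%rax)", which carries a size and a location but no type.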

gdb is also able to use them:

(gdb) info probes
Provider Name            Where              Semaphore          Object
haproxy  frontend_accept 0x0000000000427caf 0x0000000000699eb0 /home/bernat/code/dailymotion/haproxy/haproxy
(gdb) disassemble frontend_accept 
Dump of assembler code for function frontend_accept:
   0x0000000000427c90 <+0>:     push   %r14
   0x0000000000427c92 <+2>:     push   %r13
   0x0000000000427c94 <+4>:     push   %r12
   0x0000000000427c96 <+6>:     push   %rbp
   0x0000000000427c97 <+7>:     push   %rbx
   0x0000000000427c98 <+8>:     mov    %rdi,%rbx
   0x0000000000427c9b <+11>:    add    $0xffffffffffffff80,%rsp
   0x0000000000427c9f <+15>:    mov    0x270(%rdi),%r12
   0x0000000000427ca6 <+22>:    mov    0x20(%rdi),%rax
   0x0000000000427caa <+26>:    mov    0x34(%r12),%ebp
   0x0000000000427caf <+31>:    nop
   0x0000000000427cb0 <+32>:    mov    0x30(%rdi),%rax
   0x0000000000427cb4 <+36>:    movq   $0x0,0x2f0(%rdi)
   0x0000000000427cbf <+47>:    movq   $0x0,0x2e8(%rdi)
   [...]

See the nop at 427caf?

So the main benefits of those probes are:

 * low overhead, they can be left in production to be there when you
   really need them
 * discoverable, someone not tech-savvy enough to read the source can
   list them and decide which ones to enable because someone more
   tech-savvy chose them

>> While the probe arguments can be anything, it is simpler to only keep
>> simple types like null-terminated strings or int. Otherwise, they are
>> difficult to exploit. If you put struct, without the debug symbols, the
>> data is not exploitable.
>> 
>> Now, all the hard work is to put trace points everywhere.
>
> That's where gcc does the stuff free of charge in fact. I still tend to
> be cautious about what the debugging code becomes over time, because we
> had this twice, once with the DPRINTF() macro which was never up to date,
> and once with the http_silent_debug() macro which became so unbalanced
> over time that I recently totally removed it.

Yes, this is a big problem. In the kernel where a similar mechanism
exists, some maintainers are reluctant to provide tracepoints because
they would become part of the user/kernel interface and have to be
maintained which is a lot of work.

>> But they can also be put in places where logs
>> would be too verbose. I currently don't have interest in doing that but
>> if someone is willing too, it is only a matter of defining the probes in
>> probes.d and placing them in the C code. This is really nifty to debug
>> stuff in production. However, I think that people interested in that can
>> also use debug symbols to place probe at any place they want to. GCC is
>> now better at providing debug symbols which work on optimized
>> executables. Ubuntu is providing debug symbols for almost
>> everything. Tracepoints are still interesting as they can be listed and
>> they are hand-picked.
>
> That was the principle of the http_silent_debug() in fact. Just to know
> where we passed, in which order at a low cost. But I think I failed at it
> by trying to maintain this code stable, while in practice we probably only
> need something properly instrumented to easily add new tracepoints when
> needed. Maybe your patch can be a nice step forward in that direction, I
> have no idea. It's not intrusive, that's possibly something we can merge
> and see if it is quickly adopted or not.

In its current form, it is far too limited. We can wait for more people
to ask for such support and for someone who will at least add the
minimal instrumentation at key points in the code. The patch is likely
to stay up-to-date for quite some time since it is small and relies on
a cross-OS "frozen" mechanism, so no hurry, it should continue to work
when we really need it.

I am a big fan of systemtap but I usually rely on debug symbols since so
few programs have tracepoints and I can usually understand the code.

For interpreted languages, tracepoints are far more interesting since
understanding a VM is more complex than understanding a regular
program. But tracepoints being discoverable, they are easier to use than
debug symbols.

Debug symbols are available for Ubuntu and Redhat (I mean debug symbols
that match the installed packages). I wanted to do the same thing for
Debian, but I had no time to move forward on this. Missing debug symbols
are a huge difficulty. You want to know something, you don't have the
debug symbols, you need to recompile, stop, start and maybe the problem
will be gone before you had a chance to debug it. The tracepoints are
quite convenient as we can ask distributions to enable them as soon as
they exist.

While GCC has made good progress in generating good debug symbols,
you can still get missing symbols (missing arguments, missing local
variables) or errors. Usually, they happen right when you need them. ;-)
-- 
Use uniform input formats.
            - The Elements of Programming Style (Kernighan & Plauger)
