Re: Query logging performance

2015-08-04 Thread W.C.A. Wijngaards via Unbound-users
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hi Darren,

On 03/08/15 19:50, Darren Spruell via Unbound-users wrote:
 Unbound's documentation mentions that query logging can have very 
 adverse performance on server operation. I was curious if the
 project feels this has been optimized to the degree possible
 already, or if an approach similar to what some other projects take
 may be beneficial; namely something like delegating logging
 responsibilities to a different thread (Suricata IDS engine, I
 think) or even using a separate log output process (Squid [1],
 OpenBSD's PF/pflogd(8) [2]).
 
 Alternately, is dnstap [3] the preferred direction for this?
 Depending on the implementation, the difference in complexity
 between a fast, native textual query log on the server vs. a dnstap
 configuration could be a factor.

Dnstap is implemented, configure --enable-dnstap.

Make sure to install the dependencies, libfstrm and protobuf-c. In
unbound.conf it looks a bit like this:

dnstap:
dnstap-enable: yes
dnstap-socket-path: /tmp/mysock

Best regards,
   Wouter

 
 [1] http://www.squid-cache.org/Doc/config/logfile_daemon/ [2]
 http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man8/pflogd.8?q
uery=pflogdsec=8

 
[3] http://dnstap.info/
 

-BEGIN PGP SIGNATURE-
Version: GnuPG v2

iQIcBAEBCAAGBQJVwGjDAAoJEJ9vHC1+BF+NFJQP/i10HHNU7LRoGncHeCIFPO39
+rjqjBq0isUj8G/tvd1flSa/KLTXDuvX9cVi/Yb1bw3h6hqhqlTwdacGiyhZLFV2
RO687zeCSEU/erGLFfc150F0dUuXRisrtP4CIXPvMBLVF6Xhs+uGwxzxSu6z5/xB
P/TaQKIqSZh2FCGb1KxMoSbNq4Epzr5JsGdABYejz9cC35R7CjhD7hND80cfqqOt
WWM3e1qLp4YpptEMf0Pz7QcftMgnL1f5451joz01jmu7a/mr8FEb6RFv4x9MHftn
QDM2Xt7k+E1OAmzD++/TCUL3fSVk9SyO7Lf4FObN5KDPDzcHraBzy9L8b15n4air
UCM0oQJzqiP/MllWNzkuQrT19K4jzdgvP3lIeDnvgYd/vhGoxrWuzSEji5sJhVvq
jRGphcxKN+P2DTJWTwAaCxC8ELwizFRkCjpwPBDLFBzWeTbaxT97fSoWzQJAdojP
BX1wGsZix3l1qDjYSATlFuyLVRBuSJa4F5Sgm2NO/ddH0ue+Ah38XIA5rZNJzbw2
Nhr4h4xKUdP9yYj2qmKlqzvjh+YGzMW2R9/4FL/iT9J2mhQh1OjLOJM1sP+ls7M7
txWmr1uTAMiZKR6SewtZt7BOyTDDwZhAAZNFJ6P2VKB4SykQ07d0W1BUQoOUk9Ym
DSUNSLFKqpBwpzdVXK07
=JzY3
-END PGP SIGNATURE-


Re: Query logging performance

2015-08-04 Thread Robert Edmonds via Unbound-users
W.C.A. Wijngaards via Unbound-users wrote:
 On 03/08/15 19:50, Darren Spruell via Unbound-users wrote:
  Unbound's documentation mentions that query logging can have very 
  adverse performance on server operation. I was curious if the
  project feels this has been optimized to the degree possible
  already, or if an approach similar to what some other projects take
  may be beneficial; namely something like delegating logging
  responsibilities to a different thread (Suricata IDS engine, I
  think) or even using a separate log output process (Squid [1],
  OpenBSD's PF/pflogd(8) [2]).

Hi, Darren:

I looked into how existing DNS servers (BIND, Unbound) implement query
logging when I was originally implementing dnstap support in Unbound.
There are some brief notes on slides 6 and 7 in this slide deck:

http://dnstap.info/slides/dnstap_nanog60.pdf

Benchmarks are shown on slides 36 and 37.  Query logging just kills the
performance of these two servers.

BIND and Unbound are very similar in how their query logging is
performed, though the actual implementations are very different.
It basically comes down to:

 (1) Formatting the query into a textual log entry, usually using a
 printf() family function.

 (2) Using a stdio function or syslog() to export the log entry to a
 file or socket.

 (3) Doing this from the worker thread responsible for processing the
 query.

(The basic idea dates all the way back to BIND 4:
http://dnstap.info/slides/dnstap.html#(8).)

Using printf() family functions in (1) is bad, since format strings have
to be parsed for each entry.  If your log format is highly customizable
or has a lot of fields, etc. you can end up taking a not insubstantial
performance hit.  This can be optimized a bit with custom formatting
code.

(2) + (3) is where the big performance hit comes from, since the C
library is ultimately hiding mutexes in the internal implementations of
typical functions like fprintf() and syslog() used to write the log
output.  With heavily threaded servers like BIND and Unbound, (3) causes
the contention for those mutexes to impact the performance of the
server; you may have many worker threads running and serving clients,
but they will all be contending for the same FILE* or syslog mutex.
(Though, in the syslog case, there's no particularly good reason that
all the threads in a process have to contend for access to the same
socket to the syslog server, other than because that's how the syslog()
implementation provided by the system works.)

If you're only interested in acquiring raw query packets, query logging
in the DNS server is probably not the best approach; there are a variety
of kernel facilities for acquiring that data without affecting the
performance of the DNS server too badly, like BPF/AF_PACKET (pcap),
ulogd, pflogd, etc., depending on kernel.  However, there is additional
metadata that can be exported via in-server logging that doesn't appear
on the wire, stuff like cache status and bailiwick information (the
latter is useful for passive DNS).

The dnstap implementation in Unbound is actually a combination of three
different things: the dnstap.pb protobuf schema itself [0], the
protobuf-c [1] implementation of Protocol Buffers for binary message
serialization, and the fstrm library [2] which provides a dedicated
logging thread and AF_UNIX transport.

[0] https://github.com/dnstap/dnstap.pb

[1] https://github.com/protobuf-c/protobuf-c

[2] https://github.com/farsightsec/fstrm

So, using binary protobuf messages rather than printf() style formatting
addresses (1) above, though protobuf serialization performance can still
be a concern.  The lock contention in (2) and (3) are addressed by the
use of a dedicated logging thread provided by fstrm that provides each
worker thread a lockless way to enqueue log messages.

However, all of the logging work can't be deferred to the logging
thread, so (3) is still a concern.  Namely, the creation of the log
message itself (and any needed memory allocations, memory copies, etc.)
still has to be done by the worker thread.  Since the logging thread
performs work asynchronously to the worker threads, we can't defer 100%
of the work of creating the log message to the point at which the log
data is actually written out: needed data fields that would go into the
rendered log message may have been free()'d and would no longer be
available.  This is because Unbound is written in C and doesn't use any
sort of refcounting or garbage collection scheme that we could latch
onto to avoid the overhead of those memory copies, so we have to do log
message rendering in the worker thread.  And this applies regardless of
whether the log format is text or binary.

In practice, though, that residual overhead from needing to render the
log message in the worker thread is not that bad, see slide 38 in the
slide deck linked above (compare the 'unbound' or
'unbound-dnstap-disabled' plots against the 'unbound-dnstap-discard'
plot).

  Alternately,