> On Oct 6, 2017, at 12:10 AM, Clark, Gilbert <[email protected]> wrote:
>
> I'll note that one of the challenges with profiling is that there are the bro
> scripts, and then there is the bro engine. The scripting layer has a
> completely different set of optimizations that might make sense than the
> engine does: turning off / turning on / tweaking different scripts can have a
> huge impact on Bro's relative performance depending on the frequency with
> which those script fragments are executed. Thus, one way to look at speeding
> things up might be to take a look at the scripts that are run most often and
> seeing about ways to accelerate core pieces of them ... possibly by moving
> pieces of those scripts to builtins (as C methods).
>
Re: scripts, I have some code I put together to do arbitrary benchmarks of
templated bro scripts. I need to clean it up and publish it, but I found some
interesting things. Function calls are relatively slow, so something like

    ip in Site::local_nets

is faster than calling

    Site::is_local_addr(ip);

Inlining short functions could speed things up a bit.
I also found that things like

    port == 22/tcp || port == 3389/tcp

are faster than checking if port in {22/tcp,3389/tcp}, up to about 10 ports.
Having the hash class fall back to a linear search when the hash only contains
a few items could speed things up there. Things like 'likely_server_ports' have
1 or 2 ports in most cases.
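To make the linear-search fallback concrete, here is a minimal C++ sketch. It is not taken from the bro source: SmallPortSet and the kLinearLimit threshold are names made up for this example, and the cutoff of 10 is only borrowed from the script-level measurement above, so it would need to be re-measured for engine code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical container, not part of Bro: stores 16-bit port numbers and
// answers membership by linear scan while small, by hashing once larger.
class SmallPortSet {
public:
    // Assumed crossover point; would need measuring for real engine code.
    static constexpr std::size_t kLinearLimit = 10;

    void insert(std::uint16_t port) {
        if (contains(port))
            return;  // already present, keep entries unique
        if (!hashed_.empty()) {
            hashed_.insert(port);
            return;
        }
        small_.push_back(port);
        if (small_.size() > kLinearLimit) {
            // Too many entries for a scan to win: migrate to the hash set.
            hashed_.insert(small_.begin(), small_.end());
            small_.clear();
        }
    }

    bool contains(std::uint16_t port) const {
        if (!hashed_.empty())
            return hashed_.count(port) != 0;
        return std::find(small_.begin(), small_.end(), port) != small_.end();
    }

    std::size_t size() const { return small_.size() + hashed_.size(); }
    bool uses_hash() const { return !hashed_.empty(); }

private:
    std::vector<std::uint16_t> small_;          // used while the set is small
    std::unordered_set<std::uint16_t> hashed_;  // used past kLinearLimit
};
```

With one or two entries, as in the 'likely_server_ports' case, every lookup stays on the vector path and never computes a hash.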
> If I had to guess at one engine-related thing that would've sped things up
> when I was profiling this stuff back in the day, it'd probably be rebuilding
> the memory allocation strategy / management. From what I remember, Bro does
> do some malloc / free in the data path, which hurts quite a bit when one is
> trying to make things go fast. It also means that the selection of a memory
> allocator and NUMA / per-node memory management is going to be important.
> That's probably not going to qualify as something *small*, though ...
Ah! This reminds me of something I was thinking about a few weeks ago. I'm
not sure to what extent bro uses memory allocation pools/interning for common
immutable data structures, like port objects or small strings. There's no
reason bro should be mallocing/freeing memory to create port objects when there
are only 65536 times 2 (or 3?) possible port objects... but bro does things like

    tcp_hdr->Assign(0, new PortVal(ntohs(tp->th_sport), TRANSPORT_TCP));
    tcp_hdr->Assign(1, new PortVal(ntohs(tp->th_dport), TRANSPORT_TCP));

for every packet, as well as allocating a ton of TYPE_COUNT vals for things
like packet sizes and header lengths, which will almost always be between 0
and 64k.
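A rough C++ sketch of what interning port objects could look like. Everything here is hypothetical: PortValue, Proto and InternedPort are stand-ins invented for the example, not Bro's PortVal or transport enum, and the sketch assumes three protocols. It also ignores how shared objects would interact with however Bro manages the lifetime of its values, which a real change would have to sort out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for Bro's transport enum and PortVal class.
enum class Proto : std::uint8_t { Tcp = 0, Udp = 1, Icmp = 2 };
constexpr std::size_t kProtoCount = 3;      // assumed number of protocols
constexpr std::size_t kPortCount = 65536;   // all 16-bit port numbers

struct PortValue {
    std::uint16_t port;
    Proto proto;
};

// Returns a shared, immutable value for (port, proto). The table is built
// once on first use; later calls are an index computation with no allocation.
const PortValue* InternedPort(std::uint16_t port, Proto proto) {
    static const std::vector<PortValue> table = [] {
        std::vector<PortValue> t;
        t.reserve(kProtoCount * kPortCount);
        for (std::size_t p = 0; p < kProtoCount; ++p)
            for (std::size_t n = 0; n < kPortCount; ++n)
                t.push_back({static_cast<std::uint16_t>(n),
                             static_cast<Proto>(p)});
        return t;
    }();
    return &table[static_cast<std::size_t>(proto) * kPortCount + port];
}
```

The per-packet code would then ask for InternedPort(sport, Proto::Tcp) instead of doing a new. The same table trick would work for small counts (0 to 64k), at the cost of one fixed-size table per interned type.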
For things that can't be interned, like ipv6 addresses, having an allocation pool
could speed things up. Instead of freeing things like IPAddr objects, they
could just be returned to a pool, and then when a new IPAddr object is needed,
an already initialized object could be grabbed from the pool and 'refreshed'
with the new value.
https://golang.org/pkg/sync/#Pool
talks about that sort of thing.
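The pool idea could be sketched in C++ roughly like this. Addr and AddrPool are hypothetical names for the example, not Bro's IPAddr, and this is a plain single-threaded free list: unlike Go's sync.Pool it is not safe for concurrent use and it never drops idle objects.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an address object such as Bro's IPAddr.
struct Addr {
    std::array<std::uint8_t, 16> bytes{};
};

// Minimal single-threaded free-list pool (illustrative only).
class AddrPool {
public:
    // Hand out a recycled object if one is idle, otherwise allocate one,
    // then "refresh" it with the new value.
    std::unique_ptr<Addr> Acquire(const std::array<std::uint8_t, 16>& bytes) {
        std::unique_ptr<Addr> a;
        if (free_.empty()) {
            a = std::make_unique<Addr>();
            ++allocations_;
        } else {
            a = std::move(free_.back());
            free_.pop_back();
        }
        a->bytes = bytes;
        return a;
    }

    // Return an object to the pool instead of freeing it.
    void Release(std::unique_ptr<Addr> a) {
        if (a)
            free_.push_back(std::move(a));
    }

    std::size_t allocations() const { return allocations_; }  // heap allocs so far
    std::size_t idle() const { return free_.size(); }         // objects waiting for reuse

private:
    std::vector<std::unique_ptr<Addr>> free_;
    std::size_t allocations_ = 0;
};
```

After a warm-up period the pool holds as many objects as the peak number in flight, and the steady-state path does no malloc/free at all. A real version would probably want a cap on the number of idle objects it keeps.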
> On a related note, a fun experiment is always to try running bro with a
> different allocator and seeing what happens ...
I recently noticed our boxes were using jemalloc instead of tcmalloc.
Switching that caused malloc to drop a few places down in 'perf top' output.
--
Justin Azoff
_______________________________________________
bro-dev mailing list
[email protected]
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev