> On Oct 6, 2017, at 5:59 PM, Jim Mellander <[email protected]> wrote:
>
> I particularly like the idea of an allocation pool that per-packet
> information can be stored, and reused by the next packet.
>
> There also are probably some optimizations of frequent operations now that
> we're in a 64-bit world that could prove useful - the one's complement
> checksum calculation in net_util.cc is one that comes to mind, especially
> since it works effectively a byte at a time (and works with even byte counts
> only). Seeing as this is done per-packet on all tcp payload, optimizing this
> seems reasonable. Here's a discussion of doing the checksum calc in 64-bit
> arithmetic: https://locklessinc.com/articles/tcp_checksum/ -

So I still haven't gotten this to work, but I did some more tests that I think
show it is worthwhile to look into replacing this function.

I generated a large pcap of a 3-minute iperf run:

$ du -hs iperf.pcap
9.6G iperf.pcap
$ tcpdump -n -r iperf.pcap |wc -l
reading from file iperf.pcap, link-type EN10MB (Ethernet)
7497698

I then ran `bro -Cbr` (checksums ignored) and `bro -br` (checksums validated)
on it 5 times each, tracking runtime as well as the CPU instruction count
reported by `perf`:

$ python2 bench.py 5 bro -Cbr iperf.pcap
15.19 49947664388
15.66 49947827678
15.74 49947853306
15.66 49949603644
15.42 49951191958
elapsed
Min 15.18678689
Max 15.7425909042
Avg 15.5343231678
instructions
Min 49947664388
Max 49951191958
Avg 49948828194
$ python2 bench.py 5 bro -br iperf.pcap
20.82 95502327077
21.31 95489729078
20.52 95483242217
21.45 95499193001
21.32 95498830971
elapsed
Min 20.5184400082
Max 21.4452238083
Avg 21.083449173
instructions
Min 95483242217
Max 95502327077
Avg 95494664468
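
For anyone who wants to repeat the measurement, here is a rough sketch of what a script like bench.py could look like. It is not the script used for the numbers above: the function names are made up for illustration, it targets Python 3, and it assumes `perf stat` is installed and that its `-x,` CSV output puts the counter value in the first field and the event name in the third.

```python
import subprocess
import sys
import tempfile
import time


def parse_instructions(perf_csv):
    """Pull the instruction count out of `perf stat -x,` CSV output."""
    for line in perf_csv.splitlines():
        fields = line.split(",")
        # Assumed CSV layout: counter value, unit, event name, ...
        if len(fields) >= 3 and "instructions" in fields[2]:
            return int(fields[0])
    raise ValueError("no instructions counter found in perf output")


def run_once(cmd):
    """Run cmd under perf stat; return (elapsed seconds, instructions)."""
    with tempfile.NamedTemporaryFile(mode="r", suffix=".csv") as out:
        perf = ["perf", "stat", "-x", ",", "-e", "instructions",
                "-o", out.name, "--"]
        start = time.time()
        subprocess.check_call(perf + cmd, stdout=subprocess.DEVNULL)
        elapsed = time.time() - start
        return elapsed, parse_instructions(out.read())


def summarize(values):
    """Return (min, max, mean) of a non-empty list of numbers."""
    return min(values), max(values), sum(values) / len(values)


def main(argv):
    runs, cmd = int(argv[1]), argv[2:]
    results = [run_once(cmd) for _ in range(runs)]
    for elapsed, instructions in results:
        print("%.2f %d" % (elapsed, instructions))
    for name, column in (("elapsed", 0), ("instructions", 1)):
        lo, hi, avg = summarize([r[column] for r in results])
        print(name)
        print("Min", lo)
        print("Max", hi)
        print("Avg", avg)


# Only run when invoked as: script.py RUNS COMMAND [ARGS...]
if __name__ == "__main__" and len(sys.argv) > 2:
    main(sys.argv)
```

Wall-clock time is taken around the whole `perf stat` invocation, so it includes perf's own startup overhead. That should be negligible for runs lasting several seconds.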
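
So with checksum validation enabled (`bro -br`), bro spends about 5.5 more
seconds (21.1s vs. 15.5s on average) and nearly twice as many instructions on the
same ~7,500,000 packets. Essentially all of that difference should be time spent
computing checksums.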
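
The per-packet cost implied by these averages can be worked out directly. The snippet below only redoes the arithmetic on the figures reported above; it assumes the whole difference between the two runs is attributable to checksum validation, which the benchmark itself cannot prove. The variable names are just for illustration.

```python
# Figures copied from the benchmark output above.
PACKETS = 7497698

AVG_ELAPSED_NO_CHECKSUMS = 15.5343231678  # bro -Cbr, seconds
AVG_ELAPSED_CHECKSUMS = 21.083449173      # bro -br, seconds
AVG_INSTR_NO_CHECKSUMS = 49948828194      # bro -Cbr, instructions
AVG_INSTR_CHECKSUMS = 95494664468         # bro -br, instructions

# Extra work done when checksums are validated.
extra_seconds = AVG_ELAPSED_CHECKSUMS - AVG_ELAPSED_NO_CHECKSUMS
extra_instructions = AVG_INSTR_CHECKSUMS - AVG_INSTR_NO_CHECKSUMS

# Per-packet cost, assuming all of the extra work is checksumming.
ns_per_packet = extra_seconds / PACKETS * 1e9
instructions_per_packet = extra_instructions / PACKETS

# Share of the checksum-enabled run that the extra work represents.
time_share = extra_seconds / AVG_ELAPSED_CHECKSUMS
instruction_share = extra_instructions / AVG_INSTR_CHECKSUMS

print("extra seconds:            %.2f" % extra_seconds)
print("extra instructions:       %d" % extra_instructions)
print("ns per packet:            %.0f" % ns_per_packet)
print("instructions per packet:  %.0f" % instructions_per_packet)
print("share of runtime:         %.0f%%" % (time_share * 100))
print("share of instructions:    %.0f%%" % (instruction_share * 100))
```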

According to https://locklessinc.com/articles/tcp_checksum/, their benchmark
runs 2^24 (16,777,216) iterations, which is about 2.2 times as many packets as in
this pcap. Their runtime starts out at about 11s, which puts it in line with the
current implementation in bro. The other implementations they show are between 7
and 10x faster depending on packet size. A drop of up to 90% in time spent
computing checksums would be a noticeable improvement.

Unfortunately I couldn't get their implementation to work inside of bro and
produce the right result, and even if I could, it's not clear what the license
for the code is.
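
To make the idea concrete without touching their code, here is a minimal, independent sketch of why summing in 64-bit words gives the same answer as the usual 16-bit loop. It is neither the locklessinc implementation nor the code in net_util.cc. The function names are hypothetical, Python integers stand in for a 64-bit register, and words are read big-endian to keep the byte order simple (a C version would normally read native-endian words and swap once at the end). It says nothing about speed; it only demonstrates that the two approaches agree, including for odd lengths.

```python
import struct

MASK64 = (1 << 64) - 1


def fold(total):
    """Fold a non-negative sum down to 16 bits with end-around carry."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum16(data):
    """Reference Internet checksum: sum 16-bit words, fold, complement."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"  # pad a trailing odd byte with zero
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    return ~fold(total) & 0xFFFF


def checksum64(data):
    """Same checksum, but accumulating 8 bytes at a time."""
    data = bytes(data)
    data += b"\x00" * (-len(data) % 8)  # zero-pad to a multiple of 8 bytes
    total = 0
    for (word,) in struct.iter_unpack("!Q", data):
        total += word
        # Emulate a 64-bit add-with-carry: wrap the carry back in.
        total = (total & MASK64) + (total >> 64)
    # 2^16, 2^32 and 2^48 are all congruent to 1 mod 0xFFFF, so folding
    # the 64-bit sum yields the same 16-bit one's complement sum.
    return ~fold(total) & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")  # example bytes from RFC 1071
    print(hex(checksum16(sample)), hex(checksum64(sample)))
```

For the RFC 1071 example bytes both functions return 0x220d. The wide version touches one eighth as many words, and the carry handling is deferred to a single fold at the end.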

--
Justin Azoff