> On Oct 6, 2017, at 5:59 PM, Jim Mellander <[email protected]> wrote:
> 
> I particularly like the idea of an allocation pool that per-packet 
> information can be stored, and reused by the next packet.
> 
> There also are probably some optimizations of frequent operations now that 
> we're in a 64-bit world that could prove useful - the one's complement 
> checksum calculation in net_util.cc is one that comes to mind, especially 
> since it works effectively a byte at a time (and works with even byte counts 
> only).  Seeing as this is done per-packet on all tcp payload, optimizing this 
> seems reasonable.  Here's a discussion of doing the checksum calc in 64-bit 
> arithmetic: https://locklessinc.com/articles/tcp_checksum/

So I still haven't gotten this to work, but I did some more tests that I think 
show it is worthwhile to look into replacing this function.

I generated a large pcap of a 3-minute iperf run:

    $ du -hs iperf.pcap
    9.6G        iperf.pcap
    $ tcpdump  -n -r iperf.pcap |wc -l
    reading from file iperf.pcap, link-type EN10MB (Ethernet)
    7497698

Then I ran `bro -Cbr` (checksums ignored) and `bro -br` (checksums validated) 
on it 5 times each, tracking runtime as well as the CPU instruction count 
reported by `perf`:

    $ python2 bench.py 5 bro -Cbr iperf.pcap
    15.19 49947664388
    15.66 49947827678
    15.74 49947853306
    15.66 49949603644
    15.42 49951191958
    elapsed
    Min 15.18678689
    Max 15.7425909042
    Avg 15.5343231678
    
    instructions
    Min 49947664388
    Max 49951191958
    Avg 49948828194
    
    $ python2 bench.py 5 bro -br iperf.pcap
    20.82 95502327077
    21.31 95489729078
    20.52 95483242217
    21.45 95499193001
    21.32 95498830971
    elapsed
    Min 20.5184400082
    Max 21.4452238083
    Avg 21.083449173
    
    instructions
    Min 95483242217
    Max 95502327077
    Avg 95494664468


So this shows that for the ~7,500,000 packets in this pcap, bro spends about 
5.5 seconds (21.1s vs. 15.5s on average) and roughly 45.5 billion extra 
instructions computing checksums.

According to https://locklessinc.com/articles/tcp_checksum/, they run their 
benchmark 2^24 (16,777,216) times, which is about 2.2 times as many packets.

Their runtime starts out at about 11s.  Scaling the bro difference to the same 
packet count gives (21.1s - 15.5s) * 2.2, or about 12s, so their starting point 
is in line with the current implementation in bro.  The other implementations 
they show are between 7 and 10x faster depending on packet size.  A 90% drop in 
time spent computing checksums would be a noticeable improvement.


Unfortunately I couldn't get their implementation to work inside of bro and get 
the right result, and even if I could, it's not clear what the license for the 
code is.
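
For anyone who wants to experiment without touching their code, here is a 
minimal, independently written sketch of the general idea: sum wider words into 
a 64-bit accumulator and fold the carries once at the end.  This is not the 
locklessinc code and not what net_util.cc does; the names `fold16`, 
`checksum_ref` and `checksum_wide` are made up for illustration, it has not 
been tried inside bro, and I make no claim about how much faster it is.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration; these are not the functions in net_util.cc.

// Fold a wide accumulator down to 16 bits using end-around carry.
static std::uint16_t fold16(std::uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return static_cast<std::uint16_t>(sum);
}

// Baseline: add one big-endian 16-bit word per iteration.
// An odd trailing byte is treated as if padded with a zero byte.
std::uint16_t checksum_ref(const std::uint8_t* p, std::size_t len)
{
    std::uint64_t sum = 0;
    std::size_t i = 0;
    for (; i + 1 < len; i += 2)
        sum += (std::uint32_t(p[i]) << 8) | p[i + 1];
    if (i < len)
        sum += std::uint32_t(p[i]) << 8;
    return static_cast<std::uint16_t>(~fold16(sum));
}

// Wider variant: add one big-endian 32-bit word per iteration into a
// 64-bit accumulator. Because 2^16 is congruent to 1 modulo 0xffff,
// folding the total at the end gives the same one's complement sum.
std::uint16_t checksum_wide(const std::uint8_t* p, std::size_t len)
{
    std::uint64_t sum = 0;
    std::size_t i = 0;
    for (; i + 4 <= len; i += 4)
        sum += (std::uint32_t(p[i]) << 24) | (std::uint32_t(p[i + 1]) << 16) |
               (std::uint32_t(p[i + 2]) << 8) | p[i + 3];
    if (i + 2 <= len) {              // leftover 16-bit word
        sum += (std::uint32_t(p[i]) << 8) | p[i + 1];
        i += 2;
    }
    if (i < len)                     // leftover odd byte, zero-padded
        sum += std::uint32_t(p[i]) << 8;
    return static_cast<std::uint16_t>(~fold16(sum));
}
```

Both functions handle odd lengths, and both return 0x220d for the example 
bytes 00 01 f2 03 f4 f5 f6 f7 from RFC 1071.  Comparing the two over a range of 
buffer lengths is a cheap sanity check before trying anything like this in bro.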





— 
Justin Azoff


