Re: [Bro-Dev] early performance comparisons of CAF-based run loop

2017-04-14 Thread Siwek, Jon

> On Apr 14, 2017, at 8:32 AM, Robin Sommer wrote:
> 
> - I don't think we should spend any more time on improving the old
>  communication code. We're getting close to retiring that now and a
>  number of its issues (like selects in the child process) will just
>  go away with that. Let's focus on the new setting where Broker/CAF
>  will be doing all communication.

If people are hitting the 1024 FD hard limit in the old comm. code’s select(), 
that would indeed go away with the change to Broker.  But I think the way 
Broker is integrated in the parent’s main loop still relies on a select(), with 
the number of FDs it monitors scaling with the number of peers.  I.e., large Bro 
clusters may still hit critical errors even with Broker as the communication 
system; the problem would just manifest in the main loop this time.

Just mentioning it in case you didn’t account for the real fix also requiring 
the CAF-based loop to be fully realized in addition to Broker.  I’m less 
certain about the timeline for finishing up the CAF-based loop compared to 
simply patching in a temporary stopgap that replaces the select() calls.  (I 
also don’t have a sense of the frequency/urgency of the problem.)
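
To illustrate the kind of stopgap meant here, below is a minimal sketch (not 
Bro's or Broker's actual code) of waiting on a set of descriptors with poll() 
instead of select().  poll() takes a list of descriptors rather than a 
fixed-size bit set, so it is not bound by FD_SETSIZE.  The function name 
readable_socks() is hypothetical, and the sketch assumes a Unix platform where 
Python's select.poll is available.

```python
import select
import socket


def readable_socks(socks, timeout_ms=0):
    """Return the sockets that are ready to read, using poll() not select()."""
    poller = select.poll()
    by_fd = {}
    for s in socks:
        # Register each descriptor for "data available to read" events.
        poller.register(s.fileno(), select.POLLIN)
        by_fd[s.fileno()] = s
    # poll() returns (fd, eventmask) pairs for descriptors with activity.
    return [by_fd[fd] for fd, events in poller.poll(timeout_ms)
            if events & select.POLLIN]
```

The caller would invoke this once per loop iteration with the sockets of all 
current peers; the per-call registration keeps the sketch short, whereas a 
real loop would keep one poller around and update it as peers come and go.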

> - Regarding optimizing for different use cases: I would prefer
>  avoiding having lots of knobs to configure the specifics of the
>  loop. We have these magic values in the current I/O loop where
>  nobody knows how to pick them because it's hard to understand their
>  impact; and where folks have played with them, it was always hard
>  to conclude much about them beyond any specific setting. What we could
>  try instead is a loop that adjusts itself based on load patterns: if
>  the load is heavy on packets, build larger batches to process
>  between polls; if input comes from lots of different sources, increase
>  the polling; etc.

That seems like a Good Idea.

>  it does pose the question if/how we can
>  integrate packet sources that either don't need or don't support
>  select/poll

I think that’s just a matter of making sure the main loop “spins” at an 
appropriate frequency, which might change dynamically depending on the 
load-pattern optimizations from the idea above.

Maybe you could even think of reading an offline pcap file as a source that 
doesn’t need select/poll.  Pedantically, regular files also don't “support” 
select(), at least not w/ the same intention (nonblocking IO), but it just 
happens to work fine in the current runloop implementation.

So since I’ve been able to get the CAF-based loop working on offline pcap files 
(it does not rely on polling the FD of the open file, since that didn't work 
anyway w/ CAF's epoll-based multiplexer on Linux), it may be fair to say that 
other packet sources that don’t require/support poll-ability should also be 
possible to integrate.
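
As a rough sketch of that idea (an illustration only, not the CAF-based 
implementation), one loop iteration could first drain the sources that have no 
pollable FD and then decide how long to block on the pollable ones.  All names 
here (loop_iteration, ListSource, poll_ready, max_wait) are hypothetical.

```python
def loop_iteration(spin_sources, poll_ready, handle, max_wait=0.05):
    """One pass of a hypothetical main loop mixing both kinds of sources.

    spin_sources: objects with next_packet() returning a packet or None
    poll_ready:   callable(timeout_seconds) -> iterable of ready events
    handle:       callable invoked for every packet or event
    """
    got = 0
    for src in spin_sources:
        pkt = src.next_packet()
        if pkt is not None:
            handle(pkt)
            got += 1
    # Only block in poll when the non-pollable sources were idle; otherwise
    # return immediately so those sources are not starved.
    timeout = 0.0 if got else max_wait
    for event in poll_ready(timeout):
        handle(event)
    return got


class ListSource:
    """Stand-in for an offline pcap reader: hands out queued packets."""

    def __init__(self, packets):
        self._packets = list(packets)

    def next_packet(self):
        return self._packets.pop(0) if self._packets else None
```

The max_wait value is effectively the "spin" frequency when nothing is 
arriving; it is a fixed assumption in this sketch but could be adjusted 
dynamically.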

- Jon

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] early performance comparisons of CAF-based run loop

2017-04-14 Thread Robin Sommer
Nice, thanks for doing these measurements! I haven't looked at the
code yet, but here are some quick thoughts on your results and on some
of the other comments in this thread, followed by some suggested next
steps at the end.

- Agree that overall your numbers suggest that all these mechanisms
  are fine performance-wise, assuming we keep the optimization to batch
  packets between polls/selects to avoid the
  one-system-call-per-packet overhead.

- I don't think we should spend any more time on improving the old
  communication code. We're getting close to retiring that now and a
  number of its issues (like selects in the child process) will just
  go away with that. Let's focus on the new setting where Broker/CAF
  will be doing all communication.

- Regarding optimizing for different use cases: I would prefer
  avoiding having lots of knobs to configure the specifics of the
  loop. We have these magic values in the current I/O loop where
  nobody knows how to pick them because it's hard to understand their
  impact; and where folks have played with them, it was always hard
  to conclude much about them beyond any specific setting. What we could
  try instead is a loop that adjusts itself based on load patterns: if
  the load is heavy on packets, build larger batches to process
  between polls; if input comes from lots of different sources, increase
  the polling; etc. Any heuristic here would need to stay quite simple
  (otherwise we'd again end up not being able to predict much), but I
  think that'd be worth a try.

- Gilbert's point on high-performance IPC is a good one. I don't think
  we want to switch to direct memory access as our main model for the
  time being at least, but it does pose the question if/how we can
  integrate packet sources that either don't need or don't support
  select/poll. (Which, in a nod to history, accounts for some of the
  complexities of the current loop because many years ago some pcaps
  didn't support select.)
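
To make the self-adjusting idea a bit more concrete, here is one very simple
heuristic, sketched under assumptions of my own (the class name, the
doubling/halving rule, and the default limits are all hypothetical and not
measured against anything): grow the packet batch when the last batch filled
up completely, and shrink it when the last poll found other sources ready.

```python
class AdaptiveBatch:
    """Hypothetical helper: picks how many packets to process between polls."""

    def __init__(self, min_size=1, max_size=1024, start=32):
        self.min_size = min_size
        self.max_size = max_size
        self.size = start

    def update(self, packets_read, sources_ready):
        """Adjust the batch size after one loop iteration.

        packets_read:  packets that were available for the last batch
        sources_ready: non-packet sources the last poll reported as ready
        """
        if sources_ready > 0:
            # Other inputs are waiting: poll more often by shrinking the batch.
            self.size = max(self.min_size, self.size // 2)
        elif packets_read >= self.size:
            # The batch filled up completely: packet load is heavy, so grow it.
            self.size = min(self.max_size, self.size * 2)
        return self.size
```

The loop would call update() once per iteration and use the returned value as
the packet limit for the next batch. With only two inputs and one
multiplicative rule, the behavior should stay predictable, which is the
property asked for above.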


In terms of next steps, we need to see if these results hold across
different OSs, and also with live traffic. The two questions are (1)
does the new loop function on all platforms with both low- and
high-volume live traffic (presumably it will but that needs double
checking, given the history of weird OS-specific effects); and (2)
does performance match the measurements shown so far? If we can
confirm that on at least Linux and FreeBSD for, say, the two most
recent major releases of each and also consider common alternative
capturing solutions (pfring, netmap, afnet?), I'd be pretty
comfortable switching.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin