Just starting a discussion to take inventory of the current problems with Bro’s 
main loop and ideas for how to improve it.  Let’s begin with a list of issues 
(please comment if you have additions):

1) It uses select(), which is the least capable of the common polling 
mechanisms.  The number of fds it can poll is capped by FD_SETSIZE (fixed at 
1024 on some OSs), and it also scales poorly as the fd count grows.  This is 
problematic for Bro clusters that have many nodes/peers.

2) Integrating new I/O sources isn’t always straightforward from a technical 
standpoint (e.g. see [1]).  I also found that it’s difficult to understand the 
ramifications of any change to the run loop without also digging into esoteric 
details you may not initially think are related (e.g. I usually had to 
double-check the internals of I/O or threading systems when making any change 
to the main loop, which may mean there are basic problems with those 
abstractions).

3) Bro’s time/timers are coupled with I/O.  Time does not move forward unless 
there is an active I/O source.  This isn’t usually a functional problem for 
users, but devs occasionally have to hack around it (e.g. unit tests).
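
To make the fd limit in the first issue concrete, here is a minimal sketch 
(not Bro code) of a readiness check built on POSIX poll(), which takes an 
array sized by the caller instead of a fixed-size fd_set.  The names 
readable_fds and pipe_demo are made up for illustration.  poll() removes the 
cap but still scans every fd on each call, so it does not fix the scaling 
part of the problem.

```cpp
#include <poll.h>
#include <unistd.h>

#include <cassert>
#include <vector>

// Hypothetical helper: returns the subset of `fds` that is readable (or has
// hung up) within `timeout_ms`.  Unlike select(), the number of fds is not
// bounded by FD_SETSIZE, only by the size of the vector.
std::vector<int> readable_fds(const std::vector<int>& fds, int timeout_ms) {
    std::vector<pollfd> pfds;
    pfds.reserve(fds.size());
    for (int fd : fds) {
        assert(fd >= 0);
        pollfd p{};
        p.fd = fd;
        p.events = POLLIN;
        pfds.push_back(p);
    }

    std::vector<int> ready;
    int n = ::poll(pfds.data(), static_cast<nfds_t>(pfds.size()), timeout_ms);
    if (n <= 0)
        return ready;  // timeout or error: nothing to report

    for (const pollfd& p : pfds)
        if (p.revents & (POLLIN | POLLHUP))
            ready.push_back(p.fd);
    return ready;
}

// Small demonstration: a pipe's read end is idle until a byte is written.
bool pipe_demo() {
    int p[2];
    if (::pipe(p) != 0)
        return false;
    bool idle_before = readable_fds({p[0]}, 0).empty();
    bool wrote = ::write(p[1], "x", 1) == 1;
    std::vector<int> ready = readable_fds({p[0]}, 0);
    bool ok = idle_before && wrote && ready.size() == 1 && ready[0] == p[0];
    ::close(p[0]);
    ::close(p[1]);
    return ok;
}
```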

I think CAF [2] and/or libuv [3] can address these issues:

1) libuv: abstracts whatever polling mechanism is best for the OS you’re on.  
CAF: could allow a more direct actor messaging interface to Broker.  Since 
remote communication accounts for the bulk of the fds being polled, it would 
then be handled by CAF’s own multiplexer, and the remaining fds (e.g. packet 
sources) could be fine to poll in whatever fashion.

2) Both libuv and CAF use abstractions/models that have been shown to work 
well.  I 
think the actor model, by design, does a better job of encouraging systems that 
are decoupled and therefore scalable.

3) Both libuv and CAF have mechanisms that could integrate timers into the run 
loop such that they’d work independently of other I/O.
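
As a rough illustration of timers that work independently of other I/O, the 
sketch below derives the poll() timeout from the earliest pending timer, so 
the loop wakes up and fires timers even when no fd is registered.  It is 
neither libuv's nor CAF's API and is not taken from Bro.  TimerQueue, now_ms 
and run_once are hypothetical names, and time is kept as integer milliseconds 
to keep the example short.

```cpp
#include <poll.h>

#include <cassert>
#include <chrono>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical timer queue ordered by deadline (ties fire in insertion order).
class TimerQueue {
    struct Timer {
        std::int64_t deadline_ms;
        std::uint64_t id;
        std::function<void()> cb;
    };
    struct Later {
        bool operator()(const Timer& a, const Timer& b) const {
            if (a.deadline_ms != b.deadline_ms)
                return a.deadline_ms > b.deadline_ms;
            return a.id > b.id;
        }
    };
    std::priority_queue<Timer, std::vector<Timer>, Later> timers_;
    std::uint64_t next_id_ = 0;

public:
    void add(std::int64_t deadline_ms, std::function<void()> cb) {
        assert(cb != nullptr);
        timers_.push(Timer{deadline_ms, next_id_++, std::move(cb)});
    }

    // Timeout to pass to poll(): -1 means "no timers, block until I/O",
    // 0 means a timer is already due.
    int next_timeout_ms(std::int64_t now) const {
        if (timers_.empty())
            return -1;
        std::int64_t delta = timers_.top().deadline_ms - now;
        if (delta <= 0)
            return 0;
        return delta > INT_MAX ? INT_MAX : static_cast<int>(delta);
    }

    // Runs every timer whose deadline has passed; returns how many fired.
    std::size_t expire(std::int64_t now) {
        std::size_t fired = 0;
        while (!timers_.empty() && timers_.top().deadline_ms <= now) {
            std::function<void()> cb = timers_.top().cb;  // top() is const
            timers_.pop();
            cb();
            ++fired;
        }
        return fired;
    }
};

// Monotonic wall time in milliseconds.
std::int64_t now_ms() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(
               steady_clock::now().time_since_epoch()).count();
}

// One loop iteration: sleep until I/O or the next deadline, then fire timers.
// `fds` may be empty, in which case poll() acts as a plain timed sleep.
std::size_t run_once(TimerQueue& timers, std::vector<pollfd>& fds) {
    int timeout = timers.next_timeout_ms(now_ms());
    ::poll(fds.data(), static_cast<nfds_t>(fds.size()), timeout);
    return timers.expire(now_ms());
}
```

Note that run_once blocks indefinitely if there are neither timers nor fds, 
so a real loop would need a separate wake-up or shutdown path.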

libuv may be a quicker, more straightforward path to fixing (1), which is the 
most critical issue, but it’s also the easiest to fix without the aid of a 
library.  libuv can also replace other misc. code in Bro like async DNS and 
signal handling, but, while those may be crufty, they aren’t frequent sources 
of pain.

Since CAF is a requirement of Broker already and has the most potential to 
improve/replace parts of Bro’s threading system and the way in which Broker is 
integrated, it may be best in the long-term to explore moving things out of 
Bro’s current run loop by making them into actors that use message-passing 
interfaces and then relying on CAF’s own loop.
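
For readers unfamiliar with the model, here is a deliberately tiny sketch of 
"a component as an actor with a message-passing interface" in plain C++.  It 
is not CAF's API (CAF supplies its own actors, scheduler and multiplexer); 
Mailbox and Actor are hypothetical names, and the sketch assumes the message 
type is default-constructible.  The point is only that the component's state 
is touched by a single thread and everything else talks to it via send().

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical thread-safe FIFO mailbox.
template <typename Msg>
class Mailbox {
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<Msg> queue_;
    bool closed_ = false;

public:
    void send(Msg m) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            queue_.push_back(std::move(m));
        }
        cv_.notify_one();
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a message is available.  Returns false only once the
    // mailbox is closed and every queued message has been delivered.
    bool receive(Msg& out) {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty())
            return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }
};

// Hypothetical actor: one worker thread draining one mailbox.
template <typename Msg>
class Actor {
    Mailbox<Msg> inbox_;
    std::function<void(const Msg&)> handler_;
    std::thread worker_;

public:
    explicit Actor(std::function<void(const Msg&)> handler)
        : handler_(std::move(handler)) {
        assert(handler_ != nullptr);
        worker_ = std::thread([this] {
            Msg m;
            while (inbox_.receive(m))
                handler_(m);  // handler state is only touched on this thread
        });
    }

    ~Actor() { stop(); }

    void send(Msg m) { inbox_.send(std::move(m)); }

    // Delivers everything already queued, then joins the worker.
    void stop() {
        inbox_.close();
        if (worker_.joinable())
            worker_.join();
    }
};
```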

Any thoughts?

- Jon

[1] http://mailman.icsi.berkeley.edu/pipermail/bro-dev/2015-May/010069.html
[2] https://actor-framework.org/
[3] http://docs.libuv.org/en/v1.x/

