> On Apr 12, 2017, at 9:05 PM, Siwek, Jon <[email protected]> wrote:
> 
> 
>> On Apr 12, 2017, at 1:35 PM, Slagell, Adam J <[email protected]> wrote:
>> 
>> Justin asked an interesting question today, how does this affect performance 
>> on the manager? That is where we are feeling a lot of pain with select().
> 
> If you mean the select() that’s in the process fork’d by the old 
> RemoteSerializer code, you’d still see the same problems with the CAF-based 
> runloop.  But that code is irrelevant once Broker takes its place, i.e. to 
> answer that question, you need to design a communication stress test using 
> Broker-based Bros, as that’s more relevant than just changing the main loop.

Yep, that select stuff.  My question was mostly about the different workloads 
in a bro cluster.

Something that may be optimized for a worker dealing with 1 pktsrc and 2 peers 
may not be as optimal for a logger/manager that has no pktsrc but 100+ worker 
connections.  I've often wondered if the event loop should have a hint 
somewhere about which kind of process is running so it can optimize for 
throughput vs multiplexing many peers.

> Eventually, I can also imagine the Broker-based communication being more 
> tightly integrated into the CAF-based runloop helping improve performance 
> over the current Broker integration method.  Either way, what needs to be 
> measured is how CAF’s multiplexer performs in relation to Bro’s communication 
> patterns, but we may still want to wait for the Broker improvements to wrap up 
> before looking into doing those tests.
> 
> In the near-term, I can make a totally separate code branch that simply 
> replaces select() with epoll.  Then, if Justin were to test it and find it 
> alleviates performance pains on the manager, it could potentially get merged 
> into bro/master ahead of any of the pending broker/caf/runloop projects 
> since it should be a trivial and safe change to do.  Let me know.

Ah.. I had actually started trying to do that a long time ago, but gave up 
because broker was going to replace all of that code anyway.

https://github.com/bro/bro/commits/topic/jazoff/select-to-poll

From what I recall, the first commit seemed to work but the second broke 
something.

The thing that always stood out to me was that the manager would run select 
across all the worker sockets, and then loop over each worker and run CanRead, 
which just ran select again on each individual FD.
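
A minimal sketch of the alternative, in Python rather than Bro's C++ and not 
taken from the branch above: make one poll() call across every worker socket 
and use the per-FD results directly, instead of re-checking each socket with 
its own select. The function name and the socketpair stand-ins for worker 
connections are hypothetical.

```python
import select
import socket


def ready_fds_single_poll(socks, timeout_ms=0):
    """Return the set of readable file descriptors using a single poll() call."""
    poller = select.poll()
    for s in socks:
        poller.register(s.fileno(), select.POLLIN)
    # poll() already reports readiness per FD, so there is no need for a
    # second per-socket readiness check afterwards.
    return {fd for fd, events in poller.poll(timeout_ms) if events & select.POLLIN}


if __name__ == "__main__":
    # Three socket pairs stand in for manager<->worker connections.
    pairs = [socket.socketpair() for _ in range(3)]
    manager_side = [a for a, _ in pairs]
    pairs[1][1].send(b"event")  # only the second "worker" has data pending
    ready = ready_fds_single_poll(manager_side, timeout_ms=100)
    for index, sock in enumerate(manager_side):
        print(index, sock.fileno() in ready)
    for a, b in pairs:
        a.close()
        b.close()
```

With N workers this does one system call per loop iteration instead of N+1, 
which is the part that seemed wasteful in the existing code.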

One issue a few people have run into on the manager is that select returns 
EINVAL and deadlocks Bro if you give it an FD at or above the FD_SETSIZE limit 
(typically 1024), which you currently hit on around a 200 node cluster (socket 
+ flares use 4 or 5 FDs per worker).
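
The arithmetic behind that rough 200 node figure can be sketched as below. 
This assumes descriptors are handed out densely from 0 and that the select 
limit is 1024, which is the usual FD_SETSIZE on Linux; reserved_fds is a 
hypothetical knob for descriptors the process holds for other things (stdio, 
log files), not a measured value.

```python
def max_workers_under_select(fds_per_worker, fd_setsize=1024, reserved_fds=0):
    """Largest worker count whose descriptors all stay below fd_setsize.

    Only an estimate: it assumes FDs are allocated densely starting at 0.
    """
    if fds_per_worker <= 0:
        raise ValueError("fds_per_worker must be positive")
    return (fd_setsize - reserved_fds) // fds_per_worker


if __name__ == "__main__":
    for per_worker in (4, 5):
        print(per_worker, max_workers_under_select(per_worker))
```

At 5 FDs per worker this comes out to 204 workers, and at 4 FDs per worker to 
256, so the limit lands in the low 200s once the manager's other descriptors 
are counted. poll and epoll do not have this fixed-size limit.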


-- 
- Justin Azoff


