> On Oct 9, 2017, at 2:08 PM, Siwek, Jon <[email protected]> wrote:
> 
> 
>> I got send_event_hashed to work via a bit of a hack 
>> (https://github.com/JustinAzoff/broker_distributed_events/blob/master/distributed_broker.bro),
>> but it needs support from inside broker or at least the bro/broker 
>> integration to work properly in the case of node failure.
>> 
>> My ultimate vision is a cluster with 2+ physical datanode/manager/logger 
>> boxes where one box can fail and the cluster will continue to function 
>> perfectly.
>> The only thing this requires is a send_event_hashed function that does 
>> consistent ring hashing and is aware of node failure.
> 
> Yeah, that sounds like a good idea that I can try to work into the design.  
> What is a “data node” though?  We don’t currently have that?

We did at one point, see

topic/seth/broker-merge / topic/mfischer/broker-integration

The data node replaced the proxies and handled work related to the broker data 
stores.

I think the idea was that a data node process would own the broker data store.

My usage of data nodes was for scaling out data aggregation; I never did 
anything with the data stores.  The data nodes were just a place to stream scan 
attempts for aggregation.
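
Roughly what I have in mind for the consistent ring hashing, sketched in
Python just to show the behavior (all the names here are made up -- the real
version would live inside broker or the bro/broker integration):

import bisect
import hashlib

class HashRing:
    """Consistent hash ring that maps keys to live nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each node gets `vnodes` virtual points on the ring so load
        # stays roughly even across nodes.
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            self.add_node(node, vnodes)

    def _point(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node, vnodes=64):
        for i in range(vnodes):
            self._ring.append((self._point("%s#%d" % (node, i)), node))
        self._ring.sort()

    def remove_node(self, node):
        # Called when the cluster notices a data node has failed.  Only
        # the keys that hashed to this node's points get remapped;
        # everything else keeps going to the same place.
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, key):
        points = [p for p, _ in self._ring]
        i = bisect.bisect(points, self._point(key)) % len(self._ring)
        return self._ring[i][1]

def send_event_hashed(ring, key, event):
    # Hypothetical stand-in for the proposed function: route the event
    # to whichever node currently owns this key on the ring.
    node = ring.node_for(key)
    print("send %s for key %r to %s" % (event, key, node))

With ring = HashRing(["data-1", "data-2", "data-3"]), all scan attempts for a
given source IP land on the same data node, and ring.remove_node("data-2")
reroutes only the keys that node owned, which is the node-failure behavior my
hack can't get at from script-land.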

> More broadly, it sounds like a user needs a way to specify which nodes they 
> want to belong to a worker pool, do you still imagine that is done like you 
> had in the example broctl.cfg from the earlier thread?  Do you need to be 
> able to specify more than one type of pool?

People have asked for this now as a solution for fixing an overloaded manager 
process, but if we get load balancing/failover working as well as 
QoS/priorities, there may not be a point in statically configuring things like 
that.  For example, someone might want to do:

# a node for tracking spam
[spam]
type = data/spam

# a node for sumstats
[sumstats]
type = data/sumstats

# a node for known hosts/certs/etc tracking
[known]
type = data/known

But I think just having the ability to do something like this:

[data]
type = data
lb_procs = 6

This would work better for everyone.  Sending one type of data to one type of 
data node is still eventually going to overload a single process.

>> For things that don't necessarily need consistent partitioning - like 
>> maybe logs if you were using Kafka, a way to designate that a topic should 
>> be distributed round-robin between subscribers would be useful too.
> 
> Yeah, that seems like it would require pretty much the same set of 
> functionality to get working and then user can just specify a different 
> function to use for distributing events (e.g. hash vs. round-robin).
> 
> - Jon

Great!  Right now broctl configures this in a round-robin sort of way by 
assigning every other worker to a different logger node.  With support for this 
in broker, it could just connect every worker to every logger process and let 
broker handle the load balancing/failover.
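
A round-robin distributor with the same failover awareness could look
something like this (again Python with made-up names, just to sketch what
broker would need to track):

class RoundRobinPool:
    """Rotate through live subscribers, skipping ones that have failed."""

    def __init__(self, subscribers):
        self._live = list(subscribers)
        self._next = 0

    def mark_failed(self, subscriber):
        # Drop a dead subscriber (e.g. a logger process that went away);
        # later picks simply skip it.
        if subscriber in self._live:
            self._live.remove(subscriber)
            if self._live:
                self._next %= len(self._live)

    def pick(self):
        subscriber = self._live[self._next]
        self._next = (self._next + 1) % len(self._live)
        return subscriber

def publish_round_robin(pool, topic, message):
    # Hypothetical stand-in: each message on the topic goes to exactly
    # one subscriber, rotating through the pool.
    print("deliver %r on %r to %s" % (message, topic, pool.pick()))

Every worker would connect to every logger and publish to the one logs topic;
if logger-1 dies, pool.mark_failed("logger-1") keeps the stream flowing to the
remaining loggers with no reconfiguration.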



— 
Justin Azoff


