> On Oct 9, 2017, at 2:08 PM, Siwek, Jon <[email protected]> wrote:
>
>> I got send_event_hashed to work via a bit of a hack
>> (https://github.com/JustinAzoff/broker_distributed_events/blob/master/distributed_broker.bro),
>> but it needs support from inside broker or at least the bro/broker
>> integration to work properly in the case of node failure.
>>
>> My ultimate vision is a cluster with 2+ physical datanode/manager/logger
>> boxes where one box can fail and the cluster will continue to function
>> perfectly.
>> The only thing this requires is a send_event_hashed function that does
>> consistent ring hashing and is aware of node failure.
>
> Yeah, that sounds like a good idea that I can try to work into the design.
> What is a “data node” though? We don’t currently have that?

We did at one point, see topic/seth/broker-merge /
topic/mfischer/broker-integration. The data node replaced the proxies and
did stuff related to broker data stores. I think the idea was that a data
node process would own the broker data store. My usage of data nodes was
for scaling out data aggregation; I never did anything with the data
stores. The data nodes were just a place to stream scan attempts to for
aggregation.
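To make the ring hashing piece concrete, here's roughly the behavior I'm
after, as a minimal Python sketch (illustrative only -- this isn't broker's
API, and the node names and vnode count are made up):

    import bisect
    import hashlib

    class HashRing:
        """Consistent hash ring with virtual nodes."""
        def __init__(self, nodes, vnodes=64):
            self._points = []  # sorted (hash, node) points on the ring
            for node in nodes:
                self.add_node(node, vnodes)

        @staticmethod
        def _hash(key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def add_node(self, node, vnodes=64):
            # each node owns several points so keys spread out evenly
            for i in range(vnodes):
                bisect.insort(self._points,
                              (self._hash("%s#%d" % (node, i)), node))

        def remove_node(self, node):
            # called when we detect a failure: only keys that hashed to
            # this node's points move, to the next point on the ring
            self._points = [p for p in self._points if p[1] != node]

        def node_for(self, key):
            # first point clockwise from the key's hash, wrapping around
            idx = bisect.bisect(self._points, (self._hash(key),))
            return self._points[idx % len(self._points)][1]

    ring = HashRing(["data-1", "data-2", "data-3"])
    ring.node_for("1.2.3.4")    # same key always maps to the same node
    ring.remove_node("data-2")  # node failure: only its keys get remapped

The point is that when a box dies, only the keys that hashed to it move to
another node; everything else keeps landing where it did before, so
per-node state mostly survives a failure.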
> More broadly, it sounds like a user needs a way to specify which nodes
> they want to belong to a worker pool. Do you still imagine that is done
> like you had in the example broctl.cfg from the earlier thread? Do you
> need to be able to specify more than one type of pool?

People have asked for this now as a solution for fixing an overloaded
manager process, but if we get load balancing/failover working, as well as
QoS/priorities, there may not be a point in statically configuring things
like that. Someone might want to do

    # a node for tracking spam
    [spam]
    type = data/spam

    # a node for sumstats
    [sumstats]
    type = data/sumstats

    # a node for known hosts/certs/etc tracking
    [known]
    type = data/known

but I think just having the ability to do

    [data]
    type = data
    lb_procs = 6

would work better for everyone. Sending one type of data to one type of
data node is still going to eventually overload a single process.

>> For things that don't necessarily need consistent partitioning - like
>> maybe logs if you were using Kafka - a way to designate that a topic
>> should be distributed round-robin between subscribers would be useful
>> too.
>
> Yeah, that seems like it would require pretty much the same set of
> functionality to get working, and then the user can just specify a
> different function to use for distributing events (e.g. hash vs.
> round-robin).
>
> - Jon

Great! Right now broctl configures this in a 'round-robin' type way by
assigning every other worker to a different logger node. With support for
this in broker it could just connect every worker to every logger process
and broker could handle the load balancing/failover.
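As a strawman for that, here's hash vs. round-robin dispatch over the same
pool of subscribers (again just illustrative Python, not broker's API; the
logger names are made up):

    import itertools
    import zlib

    loggers = ["logger-1", "logger-2", "logger-3"]

    # round-robin: each message goes to the next logger in turn
    _next_logger = itertools.cycle(loggers)

    def route_round_robin(msg):
        return next(_next_logger)

    # hashed: the same key always lands on the same logger
    def route_hashed(key):
        return loggers[zlib.crc32(key.encode()) % len(loggers)]

    [route_round_robin("log line") for _ in range(3)]
    # -> ['logger-1', 'logger-2', 'logger-3']
    route_hashed("1.2.3.4") == route_hashed("1.2.3.4")  # -> True

Same plumbing either way; the only difference is the function that picks
the destination, which matches what Jon said above.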
— Justin Azoff