On Nov 3, 2017, at 3:13 PM, Jan Grashöfer <[email protected]> wrote:

> On 03/11/17 18:07, Azoff, Justin S wrote:
>> Partitioning the intel data set is a little tricky since it supports subnets, and hashing 10.10.0.0/16 and 10.10.10.10 won't necessarily give you the same node. Maybe subnets need to exist on all nodes but everything else can be partitioned?

> Good point! Subnets are stored somewhat separately to allow prefix matches anyway. However, I am a bit hesitant, as it would become quite a complex setup.

Indeed..  replication+load balancing is probably a good enough first step.
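To make the hashing mismatch concrete, here's a minimal rendezvous-hashing (HRW) sketch. The node names and SHA-1 scoring are illustrative assumptions, not what Bro/Broker actually uses:

```python
import hashlib

def hrw_owner(key, nodes):
    # Rendezvous (HRW) hashing: every node scores the key independently;
    # the node with the highest score owns the key.
    return max(nodes, key=lambda n: hashlib.sha1(f"{n}:{key}".encode()).hexdigest())

nodes = ["data-1", "data-2", "data-3", "data-4"]

# The subnet and an address inside it are distinct keys, so nothing
# guarantees they land on the same node.
print(hrw_owner("10.10.0.0/16", nodes))
print(hrw_owner("10.10.10.10", nodes))
```

Since the subnet string and the address string hash independently, prefix lookups can't be answered by a single partition, which is why subnets may need to live on every node.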

>> There would also need to be a method for re-distributing the data if the cluster configuration changes due to nodes being added or removed.

> Right, that's exactly what I was thinking of. I guess this also applies to other use cases that will use HRW. I am just not sure whether dynamic layout changes are out of scope at the moment...
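One property worth noting for layout changes: with HRW, removing a node only remaps the keys that node owned; every other key keeps its owner, so there is no mass reshuffle. A quick sketch (the hash function and node names are made up for illustration):

```python
import hashlib

def hrw_owner(key, nodes):
    # Rendezvous (HRW) hashing: highest-scoring node owns the key.
    return max(nodes, key=lambda n: hashlib.sha1(f"{n}:{key}".encode()).hexdigest())

nodes = ["data-1", "data-2", "data-3", "data-4"]
keys = [f"10.10.{i}.{j}" for i in range(10) for j in range(10)]
before = {k: hrw_owner(k, nodes) for k in keys}

# Drop one node: surviving nodes' scores are unchanged, so only the
# keys the dead node owned move; everything else stays put.
survivors = [n for n in nodes if n != "data-2"]
after = {k: hrw_owner(k, survivors) for k in keys}

moved = [k for k in keys if after[k] != before[k]]
assert all(before[k] == "data-2" for k in moved)
```

So even without an explicit re-distribution protocol, a node joining or leaving only disturbs its own share of the key space; redistribution would only be needed to recover the state those keys had accumulated.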

Other use cases are still problematic, but even without replication/redistribution the situation is greatly improved. Take scan detection, for example:

With sumstats/scan-ng/simple-scan, if the current manager host or process dies, all detection comes to a halt until it is restarted. Once it is restarted, all state is lost, so everything starts over from 0.

If there were 4 data nodes participating in scan detection and all 4 died, the result would be the same, so this is no better or worse than the current situation. If only one node dies, though, only 1/4 of the analysis is affected. The remaining analysis can immediately fail over to the next node. So while it may still have to start from 0, there would only be a small hole in the analysis.

For example:

The scan threshold is 20 packets.
A scan has just started from 10.10.10.10.
10 packets into the scan, the data node that 10.10.10.10 hashes to crashes.
HRW now routes data for 10.10.10.10 to another node.
30 packets into the scan, the count on the new node crosses the threshold of 20 and a notice is raised.
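That timeline can be simulated with a toy HRW sketch (again, the node names and hashing are illustrative assumptions, not the actual cluster code):

```python
import hashlib

def hrw_owner(key, nodes):
    # Rendezvous (HRW) hashing: highest-scoring node owns the key.
    return max(nodes, key=lambda n: hashlib.sha1(f"{n}:{key}".encode()).hexdigest())

THRESHOLD = 20
nodes = ["data-1", "data-2", "data-3", "data-4"]
counts = {}          # (node, scanner) -> packets counted on that node
scanner = "10.10.10.10"
noticed_at = None

for pkt in range(1, 41):
    if pkt == 11:
        # The owner crashes 10 packets in: its state is gone, and HRW
        # fails the key over to one of the surviving nodes.
        nodes.remove(hrw_owner(scanner, nodes))
    owner = hrw_owner(scanner, nodes)
    counts[(owner, scanner)] = counts.get((owner, scanner), 0) + 1
    if noticed_at is None and counts[(owner, scanner)] >= THRESHOLD:
        noticed_at = pkt

print(noticed_at)  # 30: the 10 packets counted on the dead node are lost
```

The notice fires 10 packets late rather than never, which is the "small hole" above.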

Replication between data nodes could make this even more seamless, but it's not a huge priority, at least for me. My priority is getting the cluster to a point where things don't grind to a halt just because one component is down.

Ignoring the worker->logger connections, it would look something like the attached layout.png.


> Fully agreed! In that case it might be nice if one could define separate special-purpose data nodes, e.g. "intel data nodes". But I am not sure whether this is a good idea, as it might lead to complex cluster definitions and poor usability, since users would need to know a bit about how the underlying mechanisms work. On the other hand, this would theoretically allow one to completely decouple the intel data store (e.g. interface a "real" database with some pybroker scripts).
>
> Jan

I've been thinking the same thing, but I hope it doesn't come to that. Ideally, people will be able to scale their clusters just by increasing the number of data nodes, without having to get into the details of which node is doing what.

Partitioning the data analysis by task has been suggested, i.e. one data node for scan detection, one for spam detection, one for sumstats. I think this would be very easy to implement, but it does nothing to help scale out an individual task once one process can no longer handle the load. You would just end up with the scan detection and spam data nodes at 20% CPU and the sumstats node at 100%.
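For contrast, hashing keys across all data nodes spreads a single task's load roughly evenly, so adding a node helps every task. An illustrative sketch (made-up node names and hashing, not real cluster code):

```python
import hashlib
from collections import Counter

def hrw_owner(key, nodes):
    # Rendezvous (HRW) hashing: highest-scoring node owns the key.
    return max(nodes, key=lambda n: hashlib.sha1(f"{n}:{key}".encode()).hexdigest())

nodes = ["data-1", "data-2", "data-3", "data-4"]

# Hash 10,000 distinct source addresses; each node ends up with
# roughly a quarter of the keys, so no single node becomes the
# bottleneck for one task.
load = Counter(hrw_owner(f"10.{i >> 8}.{i & 255}.1", nodes) for i in range(10000))
print(load)
```

With per-task nodes, the sumstats node's share would be fixed at 100% of sumstats keys no matter how many nodes you add; with per-key hashing, every added node takes its fraction of every task.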


—
Justin Azoff

_______________________________________________
bro-dev mailing list
[email protected]
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev
