> On Mar 3, 2017, at 4:36 PM, Aashish Sharma <[email protected]> wrote:
> 
> SO I came across a sample of Broker-API usage:

Yeah, there are a lot of things wrong with how that is being done.  There are a 
few things going on here.

One is that &synchronized is no longer functional.  I think we should bring this 
back. It may not be in the form of &synchronized, but at least some way to 
create a simple data structure that is automatically kept in sync between nodes.

The other is that the API that known hosts is currently using is too high level:

Broker::exists(Cluster::cluster_store, Broker::data("known_hosts"))
Broker::lookup(Cluster::cluster_store, Broker::data("known_hosts"))
Broker::set_contains(res2$result, Broker::data(host))
Broker::add_to_set(Cluster::cluster_store, Broker::data("known_hosts"),
    Broker::data(host));

which in English is:

1. see if the known_hosts table exists (why would it not exist?)
2. transfer the entire known_hosts table over from the data node
3. see if it contains host
4. add host if not present

And (probably due to an oversight) it does this twice, resulting in the 
known_hosts table being transferred twice.

This would work a lot better if it kept a persistent copy of the known_hosts 
set between calls and only updated it from the data node if the host wasn't 
found.  The only downside there is that the entire table is still being copied 
between nodes instead of just updates.
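
As a rough illustration of that "keep a copy, refresh only on a miss" idea, here 
is a small Python model (not Bro script, and not the real Broker API). The 
DataNode and Worker classes and all of their methods are made up for the 
example; fetch_all stands in for transferring the whole known_hosts set.

```python
class DataNode:
    """Stand-in for the data node that owns the authoritative set."""

    def __init__(self):
        self.known_hosts = set()
        self.full_transfers = 0  # how many times the whole set was copied out

    def fetch_all(self):
        self.full_transfers += 1
        return set(self.known_hosts)

    def add(self, host):
        self.known_hosts.add(host)


class Worker:
    def __init__(self, node):
        self.node = node
        self.local_copy = set()  # persists between calls

    def is_new_host(self, host):
        if host in self.local_copy:
            return False  # found locally: nothing is transferred
        # Miss: refresh from the data node, another worker may have added it.
        self.local_copy = self.node.fetch_all()
        if host in self.local_copy:
            return False
        self.node.add(host)
        self.local_copy.add(host)
        return True
```

Repeated sightings of the same host on one worker cost nothing here, but every 
miss still copies the full set, which is the downside mentioned above.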


To accomplish what known hosts really needs, which is just "Have I seen this 
host before", we could just do something like:

local added = Broker::add_to_set(Cluster::cluster_store, 
    Broker::data("known_hosts"), Broker::data(host));
if(added) {
    # host did not previously exist in the set
}

The only problem in this case is there is no local cache to prevent the same 
host from being checked multiple times.  That would require a local copy of the 
set, or like you said, a bloom filter of sorts (probably one of those reverse 
bloom filters that have false negatives but not false positives).
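
One way such a "reverse" filter could look, sketched in Python rather than Bro 
script: a fixed-size array of slots where each slot remembers the last item 
hashed into it. The class name, slot count and choice of SHA-256 are all 
assumptions for illustration. It only answers "seen" when the slot holds the 
exact item, so it can forget things (false negatives) but never claims to have 
seen something it has not.

```python
import hashlib


class ReverseBloomFilter:
    def __init__(self, slots=1024):
        self.slots = [None] * slots

    def _index(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.slots)

    def seen_before(self, item):
        """Return True only if item was definitely recorded; then record it."""
        i = self._index(item)
        hit = self.slots[i] == item
        self.slots[i] = item  # overwrite whatever was in the slot
        return hit
```

A false negative just means one extra round trip to the data node, which 
still gives the right answer, so memory use stays bounded without affecting 
correctness.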

So, for the case of tracking things using a set across the cluster, all one 
needs is a simple function that:

1. Checks to see if the item is in the local cache or bloom filter
2. If not, sends it over to the data node and inspects the response (new or 
   duplicate)
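
A minimal Python model of those two steps, assuming a data node whose add 
operation reports whether the item was new. Nothing here is the real Broker 
API; DataNode, Worker and their methods are hypothetical names, and a plain 
set stands in for the local cache or bloom filter.

```python
class DataNode:
    def __init__(self):
        self.sets = {}
        self.requests = 0  # round trips made by workers

    def add_to_set(self, name, item):
        """Add item to the named set; True if it was new, False if duplicate."""
        self.requests += 1
        members = self.sets.setdefault(name, set())
        if item in members:
            return False
        members.add(item)
        return True


class Worker:
    def __init__(self, node):
        self.node = node
        self.cache = set()  # local cache of (set name, item) pairs

    def check_or_add_to_set(self, name, item):
        key = (name, item)
        if key in self.cache:
            return False  # already handled locally, no round trip
        self.cache.add(key)
        return self.node.add_to_set(name, item)
```

Only the single item crosses the wire, never the whole set, and each worker 
asks about a given item at most once.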

Things get a little more complicated in that I want the ability to scale out 
the data nodes.  So that means the slight variation:

1. Checks to see if the item is in the local cache or bloom filter
2. If not, sends it over to the data node that corresponds to the hash of the 
   item and inspects the response (new or duplicate)
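
The "data node that corresponds to the hash of the item" part could be as 
simple as the following Python sketch. The function name and the node names 
are invented for the example, and plain modulo hashing is only one possible 
choice.

```python
import zlib


def pick_data_node(item, node_names):
    """Map an item to one data node using a stable hash of the item."""
    # crc32 gives the same value in every process, unlike Python's built-in
    # hash() for strings, so all workers agree on where an item lives.
    return node_names[zlib.crc32(item.encode("utf-8")) % len(node_names)]


nodes = ["data-1", "data-2", "data-3"]  # hypothetical data node names
target = pick_data_node("10.0.0.1", nodes)
```

With modulo hashing, changing the number of data nodes remaps most items, so 
something like consistent hashing may be a better fit if nodes come and go.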

So from a user's point of view, the Broker part of the function could just be

if(Broker::check_or_add_to_set("known_hosts", host)) {
    Log::write(Known::HOSTS_LOG, [$ts=network_time(), $host=host]);
}

Another way of writing this, which corresponds to your 'event based' approach, 
is to just have the function instead do:

1. Check to see if the item is in the local cache or bloom filter
2. If not, send an event over to the data node that says a new host was 
   potentially found.

For known hosts purposes, the data node doesn't even need to send anything back 
to the worker; it can just log it (or not).
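
Here is how that fire-and-forget variant might look as a small Python model. 
The event is modeled as a plain callback, and every name (DataNode, Worker, 
on_host_found, saw_host) is hypothetical, not an existing Bro or Broker call.

```python
class DataNode:
    def __init__(self):
        self.known_hosts = set()
        self.log = []  # stand-in for the known hosts log

    def on_host_found(self, host, ts):
        """Handler for the 'host potentially found' event."""
        if host in self.known_hosts:
            return  # some worker already reported it: don't log again
        self.known_hosts.add(host)
        self.log.append((ts, host))


class Worker:
    def __init__(self, send_event):
        self.send_event = send_event
        self.cache = set()

    def saw_host(self, host, ts):
        if host in self.cache:
            return
        self.cache.add(host)
        self.send_event(host, ts)  # no reply expected


node = DataNode()
w1 = Worker(node.on_host_found)
w2 = Worker(node.on_host_found)
w1.saw_host("10.0.0.1", 1.0)
w1.saw_host("10.0.0.1", 2.0)  # suppressed by w1's local cache
w2.saw_host("10.0.0.1", 3.0)  # sent, but deduplicated on the data node
```

The worker never waits on a response, and the deduplication and logging both 
live on the data node.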

It would help me think about this if you could outline some of your use cases 
for broker stores.  I have a good idea of what needs to be done to fix known 
hosts/services/certs and sumstats/scan detection.  But I don't know what things 
you have in mind :-)





-- 
- Justin Azoff

