> 2) Let the developer specify constraints for the data service
> distribution across data nodes and automatize the optimization. The
> minimal example would be that for each data service a minimum and
> maximum or default number of data nodes is specified (e.g. Intel on 1-2
> nodes and Scan detection on all available nodes). More complex
> specifications could require that a data service isn't scheduled on data
> nodes together with (particular) other services.
I like the idea of having some algorithm that can automatically allocate nodes
into pools, and I think it could also be done in a way that provides a sane
default yet is still customizable enough for users, at least for the most
common use-cases.
So far, it seems we can roughly group the needs of script developers into 2
categories: they either have a data set that can trivially be partitioned
across data nodes or they have one that can’t. The best we can provide for
the latter is replication/redundancy, plus giving them exclusive/isolated
reign of a node or set of nodes.
An API that falls out from that is:
    type Cluster::Pool: record {
        # mostly opaque...
    };

    type Cluster::PoolSpec: record {
        topic: string;
        node_type: Cluster::node_type &default = Cluster::DATA;
        # A negative number means "all available nodes".
        max_nodes: int &default = -1;
        exclusive: bool &default = F;
    };
    global Cluster::register_pool: function(spec: PoolSpec): Pool;
Example script-usage:
    global Intel::pool: Cluster::Pool;
    const Intel::max_pool_nodes = +2 &redef;
    const Intel::use_exclusive_pool_nodes = F &redef;
    const Intel::pool_spec = Cluster::PoolSpec(
        $topic = "bro/cluster/pool/intel",
        $max_nodes = Intel::max_pool_nodes,
        $exclusive = Intel::use_exclusive_pool_nodes
    ) &redef;
    event bro_init() { Intel::pool = Cluster::register_pool(Intel::pool_spec); }
Other scripts would look similar, except that their default $max_nodes stays
at -1, i.e. they use all available nodes.
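
For instance, a minimal sketch of what a scan-detection script could look like
under this proposal (the Scan:: names and the topic string are made up for
illustration, and Cluster::register_pool is the proposed function, not
something that exists yet):

    global Scan::pool: Cluster::Pool;
    # $max_nodes is left at its default of -1, i.e. all available nodes.
    const Scan::pool_spec = Cluster::PoolSpec(
        $topic = "bro/cluster/pool/scan"
    ) &redef;
    event bro_init() { Scan::pool = Cluster::register_pool(Scan::pool_spec); }
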
I think this also keeps the user experience straightforward: the default
configuration will always be functional, and the scaling procedure is still
mostly “just add more data nodes” and occasionally either “toggle the
$exclusive flag” or “increase $max_nodes”, depending on the user’s
circumstances. The latter options don’t necessarily address the user’s
fundamental scaling issue completely, but it seems like the best we can do,
at least at this level of abstraction.
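
As a rough illustration of those two knobs, a site might put something like
the following in its local.bro. The values are arbitrary, and this assumes
the redef'd constants are actually picked up by the time Intel::pool_spec is
built, which would depend on how the final implementation evaluates it:

    # Hypothetical site tuning for the Intel pool; values are arbitrary.
    redef Intel::max_pool_nodes = +4;           # "increase $max_nodes"
    redef Intel::use_exclusive_pool_nodes = T;  # "toggle the $exclusive flag"
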
- Jon
_______________________________________________
bro-dev mailing list
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev