Re: [Bro-Dev] Broker::publish API

2018-08-14 Thread Jon Siwek
On Tue, Aug 14, 2018 at 12:09 PM Jan Grashöfer  wrote:
>
> On 13/08/18 18:24, Jon Siwek wrote:
> > Old Worker:
> >
> >   Cluster::relay_rr(Cluster::proxy_pool, my_event);
> >
> > New Worker:
> >
> >   Broker::publish(Cluster::rr_topic(Cluster::proxy_pool), my_event);
>
> That doesn't look like an API simplification to me ;D

The goal here I imagine is rather to avoid releasing a function that
we knowingly plan to remove later.  A user would have to eventually
port all Cluster::relay_rr() calls, but that Broker::publish() pattern
remains valid.

> > Even Newer Worker:
> >
> >   Broker::publish(Cluster::worker_topic, my_event);
> >
> > See any problems there?
>
> For this case: Would it be easy to set up distinct pools for different
> tasks? I could imagine a pool of proxies that is used explicitly for
> intel distribution and one pool used for processing SumStats events. I
> think we have discussed something like that before.

Yeah, it would still be possible to define your own pool and use it
for your own purposes and it looks similar to the call before:

 Broker::publish(Cluster::rr_topic(Cluster::my_pool), my_event);

A difference in the context of our needs for the cluster communication
is that the pool is being used as a means of achieving routing (in a
load-balanced fashion) and so the call gets simplified once those
mechanisms get built into Broker routing.  In your case, you don't
need the routing aspect, just the load-balancing provided by the
"pool" concept.

- Jon
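
To make the "pool" idea above concrete, here is a toy Python model of two
independent round-robin pools, one per task. It is not Bro or Broker code:
the Pool class, next_topic(), and the topic strings are hypothetical
stand-ins for whatever Cluster::rr_topic() would resolve to.

```python
from itertools import cycle


class Pool:
    """Toy round-robin pool: each call to next_topic() picks the next member."""

    def __init__(self, name, nodes):
        if not nodes:
            raise ValueError("a pool needs at least one node")
        self.name = name
        self._members = cycle(nodes)

    def next_topic(self):
        # Hypothetical per-node topic name; real topic names may differ.
        return f"bro/cluster/node/{next(self._members)}"


# Two pools, one per task, each with its own round-robin position.
intel_pool = Pool("intel", ["proxy-1", "proxy-2"])
sumstats_pool = Pool("sumstats", ["proxy-3"])

intel_targets = [intel_pool.next_topic() for _ in range(3)]
sumstats_targets = [sumstats_pool.next_topic() for _ in range(2)]
```

Because each pool keeps its own position, traffic for one task does not
shift the load-balancing of the other.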

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Broker::publish API

2018-08-14 Thread Jan Grashöfer
On 13/08/18 18:24, Jon Siwek wrote:
> Old Worker:
> 
>   Cluster::relay_rr(Cluster::proxy_pool, my_event);
> 
> New Worker:
> 
>   Broker::publish(Cluster::rr_topic(Cluster::proxy_pool), my_event);

That doesn't look like an API simplification to me ;D

> Even Newer Worker:
> 
>   Broker::publish(Cluster::worker_topic, my_event);
> 
> See any problems there?

For this case: Would it be easy to set up distinct pools for different
tasks? I could imagine a pool of proxies that is used explicitly for 
intel distribution and one pool used for processing SumStats events. I 
think we have discussed something like that before. Maybe I am mixing 
cluster and broker levels again...

Jan


Re: [Bro-Dev] Broker::publish API

2018-08-14 Thread Robin Sommer



On Tue, Aug 14, 2018 at 10:51 -0500, Jonathan Siwek wrote:

> Not sure, is Broker::auto_publish() currently able to do the same thing?

Hmm .. Good point. Scope is different between the two (event vs topic)
but the effect is similar in the end.

> I can also see the opposite being intuitive: If I told
> Broker::subscribe() to raise locally, then I can just always use
> Broker::publish() and not think about the difference between using
> "event" versus "publish".  Would Broker::auto_publish() be removable
> then?

I would like to say "yes" (because I like the subscribe() approach
better than auto_publish() :-), but would that work well with our
cluster topics? If we didn't have the event-specific auto_publish(),
we would have to turn on local raise for *all* events going to, e.g.,
bro/cluster/worker. And thinking about it, maybe that's in fact also
an argument against my original thinking how this could help unify
scripts --- well, unless we'd go with Jan's thought of subtopics
(e.g., subscribe("bro/cluster/worker/intel", local_raise=T)).
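
The granularity concern can be illustrated with a small Python model. This
is only a sketch: the local_raise flag is the hypothetical option under
discussion in this thread, not an existing Broker parameter, and the Node
class and topic names are made up for the example. Sending to peers is
left out; only the "raise locally?" decision is modeled.

```python
class Node:
    """Toy node that decides whether a published event is also raised locally."""

    def __init__(self):
        self.subs = {}    # topic prefix -> local_raise flag (hypothetical option)
        self.raised = []  # events raised on this node itself

    def subscribe(self, prefix, local_raise=False):
        self.subs[prefix] = local_raise

    def publish(self, topic, event):
        # Raise locally only if some matching subscription asked for it.
        if any(flag and topic.startswith(prefix)
               for prefix, flag in self.subs.items()):
            self.raised.append(event)


# Local raise switched on for the whole worker topic: every event is affected.
coarse = Node()
coarse.subscribe("bro/cluster/worker", local_raise=True)

# Local raise switched on for one subtopic only.
fine = Node()
fine.subscribe("bro/cluster/worker")
fine.subscribe("bro/cluster/worker/intel", local_raise=True)

for node in (coarse, fine):
    node.publish("bro/cluster/worker/intel", "intel_event")
    node.publish("bro/cluster/worker/sumstats", "sumstats_event")
```

With the coarse subscription both events are raised locally; with the
subtopic subscription only intel_event is.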

Robin


-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-14 Thread Jon Siwek
On Tue, Aug 14, 2018 at 10:13 AM Robin Sommer  wrote:

> One more question: what about raising published events locally as well
> if the sending node is subscribed to the topic? I'm kind of torn on
> that. I don't think we want that as a default, but perhaps as an
> option, either with the publish() call or, likely better, with the
> subscribe() call? I can see that being helpful in cases like unifying
> standalone vs cluster operation; and more generally, for running
> multiple node types inside the same Bro instance.

Not sure, is Broker::auto_publish() currently able to do the same thing?

e.g. if I want an event to be raised locally, I raise it via "event"
and it automatically gets published.

I can also see the opposite being intuitive: If I told
Broker::subscribe() to raise locally, then I can just always use
Broker::publish() and not think about the difference between using
"event" versus "publish".  Would Broker::auto_publish() be removable
then?

- Jon


Re: [Bro-Dev] Broker::publish API

2018-08-14 Thread Robin Sommer



On Mon, Aug 13, 2018 at 13:55 -0500, Jonathan Siwek wrote:

> associating node IDs with subscription state and also message state
> (push node IDs into messages upon receipt before forwarding),

Yeah, that sounds like the right direction. Some reading might be
worthwhile here; there are quite a few papers out there on
routing in overlay networks.

> (1) Remove relay(...) functions
> (2) Reduce unique topic names (use pre-existing cluster topics where possible)
> (3) Add Broker::forward(topic_prefix) function + enable Broker forwarding

Yes, that sounds good to me, plus whatever that means for "publish()"
itself. I like what we have arrived at here.

One more question: what about raising published events locally as well
if the sending node is subscribed to the topic? I'm kind of torn on
that. I don't think we want that as a default, but perhaps as an
option, either with the publish() call or, likely better, with the
subscribe() call? I can see that being helpful in cases like unifying
standalone vs cluster operation; and more generally, for running
multiple node types inside the same Bro instance.

> An alternative to (3) would be implementing "real" routing in Broker
> right from the start.

In an ideal world, yes, that would certainly be nice to have. But it's
a larger task that I don't think we would be able to finish for 2.6
anymore. So, I'd put that on the list for later.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-13 Thread Azoff, Justin S

> On Aug 13, 2018, at 12:24 PM, Jon Siwek  wrote:
> 
> Even Newer Worker:
> 
>  Broker::publish(Cluster::worker_topic, my_event);
> 
> See any problems there?

That's nice and simple :-)

Assuming that can send the events around in the most efficient way possible, 
that's perfect.

The one tricky case is doing that on the manager.  While the manager is fully
connected to all workers, you really want to offload the fanning out of
messages to one of the proxies.

— 
Justin Azoff




Re: [Bro-Dev] Broker::publish API

2018-08-13 Thread Jon Siwek
On Mon, Aug 13, 2018 at 8:09 AM Jan Grashöfer  wrote:
>
> On 10/08/18 17:12, Robin Sommer wrote:
> > I hear you, but I think I haven't quite understood the concern yet.
> > Can you give me an example where the difference matters? What's
> > different between publishing intel events to bro/cluster/worker/intel
> > vs bro/cluster/worker if both go to all workers? Or is it so that some
> > workers can decide not to receive the intel events?
>
> The use case I had in my mind is an external application that is
> interested in interfacing with the intelligence framework. Either for
> querying it similar to workers or for managing purposes. If possible, it
> could be beneficial for such an application to receive only the relevant
> parts of cluster communication.
>
> On 10/08/18 17:52, Jon Siwek wrote:
> > (1) if the event you're publishing just facilitates scalable cluster
> > analysis: you'd tend to use the topic names which target node classes
> > within a cluster (eventually this might be "bro//worker")
> >
> > (2) if the event you're publishing is intended for external
> > consumption, then you should use a topic which describes some specific
> > qualities of the message (e.g. "jan/intel")
>
> The case described above seems to be both. On the one hand the primary
> use case is internal cluster communication. On the other hand it feels
> quite natural to dock here for external applications. Another
> (debatable) use case might be directly interfacing the configuration
> framework, skipping the configuration file layer.

I'm generally thinking there's nothing stopping one from picking a new
topic name to re-publish some set of events under.  Would that be
possible in the case you're imagining?

I don't think we're going to come up with a general (or enforce-able)
way of picking topic names such that they'll be useful for any
arbitrary, external use-case.  So we pick the topic name that is best
for the use-case we have at time of writing a script (e.g. we just
want to get it working on a cluster so use the pre-existing topics
that are available for that), and then let others re-publish a subset
of events under different topics dependent on their specific use-case.

- Jon



Re: [Bro-Dev] Broker::publish API

2018-08-13 Thread Jon Siwek
On Fri, Aug 10, 2018 at 11:47 AM Azoff, Justin S  wrote:

> If relay is removed how does a script writer efficiently get an event from
> one worker (or manager) to all of the other workers?

Old Worker:

  Cluster::relay_rr(Cluster::proxy_pool, my_event);

New Worker:

  Broker::publish(Cluster::rr_topic(Cluster::proxy_pool), my_event);

New Proxy:

  event my_event() { Broker::publish(Cluster::worker_topic, my_event); }

So the proxy has additional overhead of the proxy's event handler.  I
doubt that's much of a problem from the "efficiency" standpoint, but if
it were, then just having more proxies helps.  Once real routing is
available the code would still work or you could opt to change to
just:

Even Newer Worker:

  Broker::publish(Cluster::worker_topic, my_event);

See any problems there?
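
The "New Worker" plus "New Proxy" pattern above can be modeled in a few
lines of Python. This is a toy, single-process sketch, not Broker: the Bus
class, the "proxy/<name>" topics, and the node names are hypothetical, and
topics are matched exactly rather than by prefix to keep it short.

```python
from collections import defaultdict
from itertools import cycle


class Bus:
    """Toy pub/sub: delivers to handlers subscribed to the exact topic."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.handlers[topic]:
            handler(event)


bus = Bus()
WORKER_TOPIC = "bro/cluster/worker"
proxies = ["proxy-1", "proxy-2"]
received = defaultdict(list)  # worker name -> events it saw
relayed_by = []               # which proxy relayed each event

# Every worker subscribes to the shared worker topic.
for w in ("worker-1", "worker-2", "worker-3"):
    bus.subscribe(WORKER_TOPIC, lambda ev, w=w: received[w].append(ev))


def make_proxy_handler(name):
    # The proxy's event handler: re-publish the event to all workers.
    def handler(ev):
        relayed_by.append(name)
        bus.publish(WORKER_TOPIC, ev)
    return handler


for p in proxies:
    bus.subscribe(f"proxy/{p}", make_proxy_handler(p))

# A worker picks the next proxy round-robin for each event it sends.
rr = cycle(proxies)
for ev in ("e1", "e2", "e3"):
    bus.publish(f"proxy/{next(rr)}", ev)
```

Each proxy handles only its share of the events, which is why adding
proxies spreads the handler overhead. Note that in this manual-relay model
the originating worker also gets its own event back via the worker topic.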

- Jon


Re: [Bro-Dev] Broker::publish API

2018-08-13 Thread Jan Grashöfer
On 10/08/18 17:12, Robin Sommer wrote:
> I hear you, but I think I haven't quite understood the concern yet.
> Can you give me an example where the difference matters? What's
> different between publishing intel events to bro/cluster/worker/intel
> vs bro/cluster/worker if both go to all workers? Or is it so that some
> workers can decide not to receive the intel events?

The use case I had in my mind is an external application that is 
interested in interfacing with the intelligence framework. Either for 
querying it similar to workers or for managing purposes. If possible, it
could be beneficial for such an application to receive only the relevant 
parts of cluster communication.

On 10/08/18 17:52, Jon Siwek wrote:
> (1) if the event you're publishing just facilitates scalable cluster
> analysis: you'd tend to use the topic names which target node classes
> within a cluster (eventually this might be "bro//worker")
> 
> (2) if the event you're publishing is intended for external
> consumption, then you should use a topic which describes some specific
> qualities of the message (e.g. "jan/intel")

The case described above seems to be both. On the one hand the primary 
use case is internal cluster communication. On the other hand it feels 
quite natural to dock here for external applications. Another 
(debatable) use case might be directly interfacing the configuration 
framework, skipping the configuration file layer.

Jan


Re: [Bro-Dev] Broker::publish API

2018-08-10 Thread Azoff, Justin S

> On Aug 10, 2018, at 11:55 AM, Robin Sommer  wrote:
> 
> 
> Ok, let's make that change then, I think removing relay() will help
> for sure making the API easier.

If relay is removed how does a script writer efficiently get an event from
one worker (or manager) to all of the other workers?

— 
Justin Azoff




Re: [Bro-Dev] Broker::publish API

2018-08-10 Thread Robin Sommer



On Fri, Aug 10, 2018 at 10:24 -0500, Jonathan Siwek wrote:

> Or is it a matter of "if a user needed it for something, then it's
> available" ?

Yeah, including matching expectations: if there's a
"bro/cluster/worker" topic, I'd expect I can publish there to reach
all the workers (from anywhere). However, I think I'm with you now
that maybe we just shouldn't do/support any forwarding in the
cluster right now. Pools and manual relaying are a (currently better)
alternative, and we can change things later. And at least it's a clear
message: no forwarding across cluster nodes.

> However, I can see Broker::forward() could make it a bit easier for a
> user wanting to manually set up a forwarding route between clusters or
> other external applications.  Is that a clear use-case we need to
> cater to now?

Well, if it were easy to add the forward() function, that could indeed
be quite useful for external integrations still. With that, one could
selectively forward custom topics (at one's own risk), without causing
a mess for the cluster. I'm thinking osquery integration for example,
where messages might go through an intermediary Bro. One advantage
that Broker-internal forwarding has compared to manual relaying is
that messages won't be propagated back to the sender.

But it's a matter of effort at this point I'd say.

> RR via proxy is not just load-balancing either, but fault-tolerance as
> well.

Yeah, that's right.

> But here you're talking more about removing the relay() functions and
> doing the RR-via-proxy "manually", right?  That seems ok to me -- once
> "real" routing is available, you then have the option to simplify your
> script and get a minor optimization by not having to manually
> handle+forward the event on proxies.

Ok, let's make that change then, I think removing relay() will help
for sure making the API easier.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-10 Thread Jon Siwek
On Fri, Aug 10, 2018 at 8:29 AM Jan Grashöfer  wrote:

> > Let me try to phrase it differently: If there's already a topic for a
> > use case, it's better to use it. That's easier and less error-prone.
> > So if, e.g., I want to send my script's data to all workers,
> > publishing to bro/cluster/worker will do the job. And that will even
> > automatically adapt if things get more complex later.
>
> Maybe a silly question: Would that work using further "specialized"
> topics like bro/cluster/worker/intel? From my understanding one feature
> of topics is that one would be able to subscribe only to the things
> that one is interested in. Having a bunch of events just published to
> bro/cluster/worker seems counterproductive.

Yeah, topic use-cases may need clarification.  There's one desire to
use topics as a way to specify known destination(s) within a cluster.
Another desire could be using the topic name to hierarchically
summarize/describe a quality of the message content in order to share
with the external world.  Maybe the thing that's currently unclear is
what the intended borders are for information sharing?  I break it
down like:

(1) if the event you're publishing just facilitates scalable cluster
analysis: you'd tend to use the topic names which target node classes
within a cluster (eventually this might be "bro//worker")

(2) if the event you're publishing is intended for external
consumption, then you should use a topic which describes some specific
qualities of the message (e.g. "jan/intel")

Events that fall under (1) don't need to be descriptive since we don't
want to encourage people to arbitrarily start subscribing to events
that act as the details for how cluster analysis is implemented.  Or I
guess if they do subscribe, then they are the kind of person that's
more interested in inspecting the cluster's performance/communication
characteristics anyway.

I'd also say that (2) is a user decision -- they need to be the one to
decide if their cluster has produced some bit of information worthy of
sharing to the external world and then publish it under a suitable
topic name.

That make sense?

- Jon



Re: [Bro-Dev] Broker::publish API

2018-08-10 Thread Jon Siwek
On Thu, Aug 9, 2018 at 1:29 PM Robin Sommer  wrote:

> > (1) enable the "explicit/manual" forwarding by default?
>
> Coming from that assumption above, I'd say yes here, doing it like you
> suggest: differentiate between forwarding and locally raising an event
> by topic. Maybe instead of adding it to Broker::subscribe() as a
> boolean, we add a separate "Broker::forward(topic_prefix)" function,
> and use that to essentially hardcode forwarding on each node just like
> we want/need for the cluster. Behind the scenes Broker could still
> just store the information as a boolean, but API-wise it means we can
> later (once we have real routing) just rip out the forward() calls and
> let Magic take its role. :)

Not sure there'd be anywhere we'd currently use Broker::forward() ?
Or is it a matter of "if a user needed it for something, then it's
available" ?

The only intra-cluster communication that's more than 1 hop at the
moment is worker-worker, but setting up a Broker::forward() route
wouldn't be my first thought as it's not currently a scalable
approach.  I'd instead take the cautious approach of relaying via a
RR-proxy so one can add proxies to handle more load as needed.

However, I can see Broker::forward() could make it a bit easier for a
user wanting to manually set up a forwarding route between clusters or
other external applications.  Is that a clear use-case we need to
cater to now?  If so, then it would indeed be just saying "hey,
Broker::forward() is now a no-op since Broker has real routing
mechanisms and you can remove them".

> As you say, we don't get load-balancing that way (today), but we still
> have pools for distributing analyses (like the known-* scripts do).
> And if distributing message load (like the Intel scripts do) is
> necessary, I think pools can solve that as well: we could use a RR
> proxy pool and funnel it through script-land there: send to one proxy
> and have an event handler there that triggers a new event to publish
> it back out to the workers. For proxies, that kind of additional load
> should be fine (if load-balancing is even necessary at all; just going
> through a single forwarding node might just as well be fine).

Seems more prudent not to guess whether a single, hardcoded forwarding
node is good enough when writing the default cluster-enabled scripts.
RR via proxy is not just load-balancing either, but fault-tolerance as
well.

But here you're talking more about removing the relay() functions and
doing the RR-via-proxy "manually", right?  That seems ok to me -- once
"real" routing is available, you then have the option to simplify your
script and get a minor optimization by not having to manually
handle+forward the event on proxies.

> > (2) re-implement any existing subscription cycles?
>
> Now, here I'm starting to change my mind a bit. Maybe in the end, in
> large topologies, it would be futile to insist on not having cycles
> after all. The assumption above doesn't care about it, putting Broker
> in charge of figuring it out. So with that, if we can set up
> forwarding through (1) in a way that cycles in subscriptions don't
> matter, it may be fine to just leave them in. But I guess in the end
> it doesn't matter, removing them can only make things better/easier.

Again, I think we wouldn't have any Broker::forward() usages in the
default cluster setup, but simply enabling the forwarding of messages
at the Broker-layer would currently cause some messages to route in a
cycle.  Enabling the current message forwarding means we need to
re-implement existing subscription cycles.  If we instead waited for
the "real" routing, then it doesn't matter if we leave them in.

- Jon


Re: [Bro-Dev] Broker::publish API

2018-08-10 Thread Robin Sommer



On Fri, Aug 10, 2018 at 15:22 +0200, Jan Grashöfer wrote:

> different purposes. If that is still a design goal, it feels like the
> structure of a cluster could be more volatile than it used to be.

It is, and we have some of that, and I think it fits in with the
discussion here too. In my mind, I see two separate things in this
discussion: one is a general Broker API that facilitates some very
different applications; and the 2nd is our cluster framework that uses
that API for a specific use-case. The latter is much easier to tune
for us in terms of how it uses Broker, as we can hide much of it
internally and adjust later, i.e., by adding a new node type. The
question for the cluster framework, then, is what API *it* provides
for scripts to share state in a cluster. And a part of the answer to
that could be "standardized topics" that are guaranteed to get the
information to where it needs to go.

> Maybe a silly question: Would that work using further "specialized" topics
> like bro/cluster/worker/intel? From my understanding one feature of topics
> is that one would be able to subscribe only to the things that one is
> interested in. Having a bunch of events just published to bro/cluster/worker
> seems counterproductive.

I hear you, but I think I haven't quite understood the concern yet.
Can you give me an example where the difference matters? What's
different between publishing intel events to bro/cluster/worker/intel
vs bro/cluster/worker if both go to all workers? Or is it so that some
workers can decide not to receive the intel events?

(And technically, subscriptions are prefix-based, so anybody
subscribing to bro/cluster/worker automatically gets
bro/cluster/worker/intel as well; not sure if that helps or hurts
here?)
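
A minimal Python sketch of that prefix behavior, assuming plain
string-prefix matching. The subscriber names and the receivers() helper
are hypothetical and only serve the illustration.

```python
def matches(subscription_prefix: str, topic: str) -> bool:
    """A subscription matches every topic that starts with its prefix."""
    return topic.startswith(subscription_prefix)


# Hypothetical subscribers and the topic prefixes they subscribed to.
subscriptions = {
    "worker-1": ["bro/cluster/worker"],
    "intel-app": ["bro/cluster/worker/intel"],
}


def receivers(topic: str) -> list:
    """Return, sorted by name, every subscriber with a matching prefix."""
    return sorted(
        name
        for name, prefixes in subscriptions.items()
        if any(matches(p, topic) for p in prefixes)
    )


intel_receivers = receivers("bro/cluster/worker/intel")
plain_receivers = receivers("bro/cluster/worker")
```

The worker sees both topics, while the external application subscribed to
the narrower prefix sees only the intel subtopic.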

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-10 Thread Jan Grashöfer
On 08/08/18 17:48, Robin Sommer wrote:
> I think it's safe to assume we have the cluster structure under our
> own control; it's whatever we configure it to be. That's something
> that's easier to change later than the API itself. Said differently:
> we can always adjust the connections and topics that we set up by
> default; it's much harder to change how the publish() function works.

I think in an earlier discussion (could be 
http://mailman.icsi.berkeley.edu/pipermail/bro-dev/2017-February/012386.html) 
there was the idea of different types of data nodes that would serve 
different purposes. If that is still a design goal, it feels like the 
structure of a cluster could be more volatile than it used to be. Not 
sure how that fits to the current assumptions. Just wanted to bring that 
back into the discussion.

> Let me try to phrase it differently: If there's already a topic for a
> use case, it's better to use it. That's easier and less error-prone.
> So if, e.g., I want to send my script's data to all workers,
> publishing to bro/cluster/worker will do the job. And that will even
> automatically adapt if things get more complex later.

Maybe a silly question: Would that work using further "specialized" 
topics like bro/cluster/worker/intel? From my understanding one feature 
of topics is that one would be able to subscribe only to the things
that one is interested in. Having a bunch of events just published to 
bro/cluster/worker seems counterproductive.

> Maybe it's a *necessary* design, but that doesn't make it nice. ;-) It
> makes it very hard to follow the logic; when reading through the
> scripts I got lost multiple times because some "@if I-am-a-manager"
> was somewhere half a page earlier, disabling the code I was currently
> looking at for most nodes. We probably can't totally avoid that, but
> the less the better.

I agree! One thing that could also help here is clear separation. In the 
intel framework that kind of code is encapsulated in a cluster.bro file,
which is basically divided into a worker and a manager part. In the end 
it's a tradeoff between abstraction and flexibility.

Jan


Re: [Bro-Dev] Broker::publish API

2018-08-09 Thread Robin Sommer
Yeah, and let me add one thing: What if as a starting point for
modeling things, we assumed that we have global topic-based routing
available. Meaning if node A publishes to topic X, the message will
show up at all nodes that are subscribed to topic X anywhere, no
matter what the topology --- Broker will somehow take care of that. I
believe that's where we want to get eventually, through whatever
mechanism; it's not trivial, but also not rocket science.

Then we (A) design the API from that perspective and adapt our
standard scripts accordingly, and (B) see how we can get an
approximation of that assumption for today's Broker and our simple
clusters, by having the cluster framework hardcode what we need.
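
As a rough picture of the assumption in the first paragraph (a message
reaches every subscriber regardless of topology), here is a naive
flood-with-deduplication model in Python. It is only an illustration of
the desired outcome, not a proposal for how Broker would implement
routing; the route() function and the node names are hypothetical.

```python
from collections import deque


def route(topology, subscribers, origin):
    """Flood a message from origin across peerings.

    topology: dict mapping each node to the set of its direct peers
    subscribers: set of nodes subscribed to the topic
    Returns the nodes where the event gets raised, in delivery order.
    """
    seen = {origin}
    queue = deque([origin])
    delivered = []
    while queue:
        node = queue.popleft()
        for peer in sorted(topology[node]):
            if peer in seen:
                continue  # already reached, so cycles do no harm
            seen.add(peer)
            if peer in subscribers:
                delivered.append(peer)  # raised only at subscribers
            queue.append(peer)
    return delivered


# A small cluster with a connection cycle (workers, manager, proxy).
topology = {
    "worker-1": {"manager", "proxy-1"},
    "worker-2": {"manager", "proxy-1"},
    "manager": {"worker-1", "worker-2", "proxy-1"},
    "proxy-1": {"worker-1", "worker-2", "manager"},
}
delivered = route(topology, {"worker-1", "worker-2"}, "worker-1")
```

The event crosses the manager and the proxy without being raised there,
and it reaches the other worker exactly once despite the cycle.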

> (1) enable the "explicit/manual" forwarding by default?

Coming from that assumption above, I'd say yes here, doing it like you
suggest: differentiate between forwarding and locally raising an event
by topic. Maybe instead of adding it to Broker::subscribe() as a
boolean, we add a separate "Broker::forward(topic_prefix)" function,
and use that to essentially hardcode forwarding on each node just like
we want/need for the cluster. Behind the scenes Broker could still
just store the information as a boolean, but API-wise it means we can
later (once we have real routing) just rip out the forward() calls and
let Magic take its role. :)

As you say, we don't get load-balancing that way (today), but we still
have pools for distributing analyses (like the known-* scripts do).
And if distributing message load (like the Intel scripts do) is
necessary, I think pools can solve that as well: we could use a RR
proxy pool and funnel it through script-land there: send to one proxy
and have an event handler there that triggers a new event to publish
it back out to the workers. For proxies, that kind of additional load
should be fine (if load-balancing is even necessary at all; just going
through a single forwarding node might just as well be fine).

> (2) re-implement any existing subscription cycles?

Now, here I'm starting to change my mind a bit. Maybe in the end, in
large topologies, it would be futile to insist on not having cycles
after all. The assumption above doesn't care about it, putting Broker
in charge of figuring it out. So with that, if we can set up
forwarding through (1) in a way that cycles in subscriptions don't
matter, it may be fine to just leave them in. But I guess in the end
it doesn't matter, removing them can only make things better/easier.

> Also maybe begs the question for later regarding the "real" routing
> mechanism: I suppose that would need to be smart enough to do
> automatic load-balancing in the case of there being more than one
> route to a subscriber.

Yeah, I'm becoming more and more convinced that in the end we won't
get around adding a "real" routing layer that takes care of such things
under the hood.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Broker::publish API

2018-08-09 Thread Jon Siwek
On Wed, Aug 8, 2018 at 2:50 PM Robin Sommer  wrote:

> > * enable message forwarding by default (meaning re-implement the one
> > or two subscription patterns that might create a cycle)
>
> Haven't quite made up my mind on this one. In principle yes, but
> right now a host needs to be subscribed to a topic to forward it if I
> remember that right. That may limit how we use topics, not sure (e.g.,
> if a worker wanted to talk to other workers, with "real"
> forwarding/routing they'd just publish to the worker topic and that
> message would get routed there, but not be processed at the
> intermediary hops as well. With our current forwarding, the hops would
> need to subscribe to the worker topic as well and hence the event got
> raised there, too.)

Yeah, that's how I also understand the current mechanisms would work.

Maybe can split it into two separate questions:

(1) enable the "explicit/manual" forwarding by default?
(2) re-implement any existing subscription cycles?

Answer to (2) may pragmatically be "yes" because they'd be known to
cause problems if ever (1) did become enabled (and also could be
problematic for a more sophisticated/automatic/implicit routing system
should that become available in the future... at least I think it's a
problem, but then again maybe connection-cycles would also still be a
problem at that point, not quite sure).

Answer to (1) may be "no" because we don't have a use for it at the
moment -- having the forwarding-nodes also raise events is not ideal,
but if we solved that would it be useful?  Maybe an idea would be
extend the subscribe() API in Bro:

  function Broker::subscribe(topic_prefix: string, forward_only: bool &default=F);

I recall that we have access to both the message/event as well as the
topic string on the receiver side, so could be possible to detect
whether or not to raise the event depending on whether the topic only
has a matching subscription prefix that is marked as forward_only.

With that you could do something like:

# On Manager
Broker::subscribe(worker_to_worker_topic, T);

# On Worker
Broker::subscribe(worker_to_worker_topic);
Broker::publish(worker_to_worker_topic, my_event);

There, my_event would be distributed from one worker to all workers
via the manager, but not sure that's as usable/dynamic as the current
"relay" mechanism because you also get a load-balancing scheme to go
along with it.  Here, you'd only ever want to pick a single manager or
proxy to do the forwarding (subscribing like this on all proxies
causes all proxies to forward to all workers resulting in undesired
event duplication.)
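
The duplication problem can be reproduced with a toy Python model. This is
a sketch under stated assumptions: forward_only is the hypothetical flag
proposed above, only forward_only subscribers forward, forwarders peer
only with workers, and the Node class, run() helper, and node names are
made up for the example.

```python
from collections import Counter

TOPIC = "worker_to_worker_topic"


class Node:
    def __init__(self, name):
        self.name = name
        self.peers = []
        self.subs = {}           # topic prefix -> forward_only flag
        self.raised = Counter()  # event -> number of times raised here

    def subscribe(self, prefix, forward_only=False):
        self.subs[prefix] = forward_only

    def publish(self, topic, event):
        for peer in self.peers:
            peer.receive(topic, event, sender=self)

    def receive(self, topic, event, sender):
        flags = [fo for prefix, fo in self.subs.items()
                 if topic.startswith(prefix)]
        if not flags:
            return  # not subscribed: neither raised nor forwarded
        if not all(flags):
            self.raised[event] += 1  # some match is a normal subscription
        if any(flags):
            # Forward to every peer except the one it came from.
            for peer in self.peers:
                if peer is not sender:
                    peer.receive(topic, event, sender=self)


def run(forwarder_names):
    """Return (copies raised at worker-2, copies raised at forwarders)."""
    workers = [Node("worker-1"), Node("worker-2")]
    forwarders = [Node(n) for n in forwarder_names]
    for f in forwarders:
        f.subscribe(TOPIC, forward_only=True)
        for w in workers:
            f.peers.append(w)
            w.peers.append(f)
    for w in workers:
        w.subscribe(TOPIC)
    workers[0].publish(TOPIC, "my_event")
    return (workers[1].raised["my_event"],
            sum(f.raised["my_event"] for f in forwarders))


single = run(["manager"])
multi = run(["proxy-1", "proxy-2"])
```

With one forwarding node the other worker sees the event once; with two
forwarding proxies it sees it twice, while the forwarders themselves never
raise it.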

So I guess that's still to say I'm not sure what the use of the
current forwarding mechanism would be if it were enabled.  Also maybe
begs the question for later regarding the "real" routing mechanism: I
suppose that would need to be smart enough to do automatic
load-balancing in the case of there being more than one route to a
subscriber.

- Jon


Re: [Bro-Dev] Broker::publish API

2018-08-08 Thread Robin Sommer



On Wed, Aug 08, 2018 at 12:36 -0500, Jonathan Siwek wrote:

> * publish() API simplifications/compressions (pending decision on
> exactly what those should be)

Yeah, with an eye on the semantics for forwarding (now and later),
and whether to raise published events locally as well if the host is
subscribed itself.

And maybe the 2nd eye on: can we define these semantics so that we can
get rid of some of the "what node type am I?" checks? I'm not sure what
that would look like, but generally it would be nice if one could just
publish stuff liberally without worrying too much and the
subscriptions and forwarding semantics do the right thing (not always,
but often).

> * enable message forwarding by default (meaning re-implement the one
> or two subscription patterns that might create a cycle)

Haven't quite made up my mind on this one. In principle yes, but
right now a host needs to be subscribed to a topic to forward it if I
remember that right. That may limit how we use topics, not sure (e.g.,
if a worker wanted to talk to other workers, with "real"
forwarding/routing they'd just publish to the worker topic and that
message would get routed there, but not be processed at the
intermediary hops as well. With our current forwarding, the hops would
need to subscribe to the worker topic as well and hence the event got
raised there, too.)

> * see if any script-specific topics can instead use a pre-existing
> "cluster" topic

Yep.

> difficult due to having to hunt down things in various scripts and
> whether a more centralized config could be something to do?

Yeah, that sounds useful for the cluster case: it could be part of the
cluster framework to define all the relevant node types with their
characteristics. That would also make later changes easier &
centralized to how topics and connections are set up.

For other use cases, it should still be possible to configure things
independently, too, though (say, for talking to external Broker
applications).

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-08 Thread Jon Siwek
On Wed, Aug 8, 2018 at 10:55 AM Robin Sommer  wrote:

> That's actually something I realized yesterday: we don't have direct
> worker-to-worker communication right now, correct? A worker cannot
> just publish to "bro/cluster/workers".

Right, here's a crude graphic of the cluster layout from the docs:

https://github.com/bro/bro/blob/master/doc/frameworks/broker/cluster-layout.png

- Jon


Re: [Bro-Dev] Broker::publish API

2018-08-08 Thread Jon Siwek
On Wed, Aug 8, 2018 at 10:53 AM Robin Sommer  wrote:
>
> Yeah, I realize that. A direct port of the old logic was of course the
> goal so far, with the drawbacks of that approach accepted &
> understood. That's what's in place now; that's great and exactly as
> planned. We can get 2.6 out this way, and it'll be fine.

I'm earnestly probing to try to get a better decomposition of the
issues that make it hard to understand cluster communication patterns.

There's the exercise of trying to answer "what *is* this script
doing?" and then there's also trying to answer "what *should* it be
doing?".

I seldom felt like I had definitive answers for the latter, but I can
see how it would be beneficial to do that and also broader
script/framework makeovers, possibly before 2.6, because it would help
inform whether new APIs are catering to "good" use-cases.  Though my
thinking is it's not critical to get a 100% API/use-case match off the
bat and that there's some actionable stuff to take away from this
thread that is at least going to have us heading in a better direction
sooner rather than later...

> My point is that now also seems like a good time to take stock of what
> we got that way. That direct porting is finally getting us some sense
> of where things aren't an ideal match between API and use cases yet.
> And if there's something easy we can do about that before people start
> relying on the new API, it seems that would be beneficial to do. But
> we can see.

Yeah, agreed.  What I've taken away from your earlier points is that
these smaller changes are seeming like they'd be beneficial to do
before 2.6:

* publish() API simplifications/compressions (pending decision on
exactly what those should be)
* enable message forwarding by default (meaning re-implement the one
or two subscription patterns that might create a cycle)
* see if any script-specific topics can instead use a pre-existing
"cluster" topic

What do you think?

A separate question/idea I just had was how much of the
process of auditing the subscriptions and communication patterns was
difficult due to having to hunt down things in various scripts, and
whether a more centralized config could be something to do?  e.g. I
don't know how the details would work out, but I'm imagining a
workflow where one edits a centralized config file with
subscription/node info in it and that auto-generates the code to set
them up.  Sort of like working backward from the info in the PDF you
shared.

- Jon


Re: [Bro-Dev] Broker::publish API

2018-08-08 Thread Robin Sommer



On Wed, Aug 08, 2018 at 14:20 +, Justin Azoff wrote:

> There's also a bunch of places that I think were written standalone first and 
> then updated to work on a cluster in
> place resulting in some awkwardness..

Yeah, indeed, that's another source of complexity with these
scripts.

> But if this was written in a more 'cluster by default' way, it would just 
> look like:

Nice example. That's the kind of thing I hope we can do during the
next cycle: streamline the scripts to unify these kinds of logic.

> Broker::publish could possibly be optimized for standalone to raise the event 
> directly if not being run in a cluster.

Or we generally raise published events locally as well if the node is
subscribed to the destination topic. There are pros and cons for that
I think.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-08 Thread Robin Sommer
Yeah, I realize that. A direct port of the old logic was of course the
goal so far, with the drawbacks of that approach accepted &
understood. That's what's in place now; that's great and exactly as
planned. We can get 2.6 out this way, and it'll be fine.

My point is that now also seems like a good time to take stock of what
we got that way. That direct porting is finally getting us some sense
of where things aren't an ideal match between API and use cases yet.
And if there's something easy we can do about that before people start
relying on the new API, it seems that would be beneficial to do. But
we can see.

Robin

On Tue, Aug 07, 2018 at 13:39 -0500, Jonathan Siwek wrote:

> How much is due to new API usage and how much is due to things mainly
> being a direct port of old communication patterns (which I guess are
> written by various people over extended lengths of time and so there's
> inconsistencies to be expected) ?  Or due to being a mishmash of both
> old and new?


-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-08 Thread Robin Sommer


On Tue, Aug 07, 2018 at 12:05 +0200, Jan Grashöfer wrote:

> What I can recall, it's about simplifying the API in the light of
> multi-hop routing, which is not fully functional yet.

To level up a bit, what I'm hoping for is that we can find some easy
ways to simplify the API a bit more now, with an eye towards dynamic
multi-hop coming later. I don't know if it'll work out before 2.6
still, but changing the API later is more painful.

We don't need to (and can't even) solve multi-hop topologies right now; I
think nobody really has the use cases clear in their heads yet. But if
we could simplify the API a bit more for our current use cases in a
way that may extend to multihop naturally later, that would probably
save us some headaches at that point.

> How does forwarding work if I add another node type?

That's actually something I realized yesterday: we don't have direct
worker-to-worker communication right now, correct? A worker cannot
just publish to "bro/cluster/workers".

> Do we assume a certain cluster structure here? If yes: Is that a valid
> assumption?

I think it's safe to assume we have the cluster structure under our
own control; it's whatever we configure it to be. That's something
that's easier to change later than the API itself. Said differently:
we can always adjust the connections and topics that we set up by
default; it's much harder to change how the publish() function works.
 
> From my understanding this would mean going back to the old 
> communication patterns. What's the point of having topics if we don't 
> use them?

Let me try to phrase it differently: If there's already a topic for a
use case, it's better to use it. That's easier and less error-prone.
So if, e.g., I want to send my script's data to all workers,
publishing to bro/cluster/worker will do the job. And that will even
automatically adapt if things get more complex later. For example, I
can see having multiple otherwise independent clusters sharing a
communication channel. In that case, we could internally change the
topic to "bro/cluster//workers", and everybody using the
predefined worker topic would still reach "their" workers without any
further changes.

> That's something I would have expected. I don't think this is 
> necessarily an indicator of bad design.

Maybe it's a *necessary* design, but that doesn't make it nice. ;-) It
makes it very hard to follow the logic; when reading through the
scripts I got lost multiple times because some "@if I-am-a-manager"
was somewhere half a page earlier, disabling the code I was currently
looking at for most nodes. We probably can't totally avoid that, but
the less the better.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-08 Thread Azoff, Justin S

> On Aug 6, 2018, at 3:50 PM, Robin Sommer  wrote:
> 
>- Relaying is hardly used.
> 
> 
>- There's a lot of checks in publishing code of the type "if I am
>  (not) of node type X".

I think these 2 are somewhat related.  Since there weren't higher level things 
like relaying, in order to relay
a message from one worker to all other workers you had to jump through hoops 
with worker2manager and
manager2worker events and often lots of @if stuff.

There's also a bunch of places that I think were written standalone first and 
then updated to work on a cluster in
place resulting in some awkwardness.. like notice/main.bro:

function NOTICE(n: Notice::Info)
    {
    if ( Notice::is_being_suppressed(n) )
        return;

@if ( Cluster::is_enabled() )
    if ( Cluster::local_node_type() == Cluster::MANAGER )
        Notice::internal_NOTICE(n);
    else
        {
        n$peer_name = n$peer_descr = Cluster::node;
        Broker::publish(Cluster::manager_topic, Notice::cluster_notice, n);
        }
@else
    Notice::internal_NOTICE(n);
@endif
    }

event Notice::cluster_notice(n: Notice::Info)
    {
    NOTICE(n);
    }

So on a worker, calling NOTICE publishes a cluster_notice event that then 
re-calls NOTICE on the manager, 
which then does the right thing.  You end up with a single small function with 
nested @if/if that works 3 different ways.

But if this was written in a more 'cluster by default' way, it would just look 
like:

function NOTICE(n: Notice::Info)
    {
    if ( Notice::is_being_suppressed(n) )
        return;

    n$peer_name = n$peer_descr = Cluster::node;
    Broker::publish(Cluster::manager_topic, Notice::cluster_notice, n);
    }

event Notice::cluster_notice(n: Notice::Info)
    {
    if ( Notice::is_being_suppressed(n) )
        return;

    Notice::internal_NOTICE(n);
    }

Which other than the suppression check, has no branching at all.

Broker::publish could possibly be optimized for standalone to raise the event 
directly if not being run in a cluster.
The only small downside is on a standalone you'd call is_being_suppressed 
twice, could always add a @if if you
really wanted, but is_being_suppressed is just a set lookup.

Then this stuff would be a good use for efficient relaying/broadcasting instead 
of making the manager do all the work:

Broker::auto_publish(Cluster::worker_topic, Notice::begin_suppression);
Broker::auto_publish(Cluster::proxy_topic, Notice::begin_suppression);


— 
Justin Azoff




Re: [Bro-Dev] Broker::publish API

2018-08-07 Thread Jon Siwek
On Mon, Aug 6, 2018 at 3:00 PM Robin Sommer  wrote:

> Overall I have to say I found it pretty hard to follow this all
> because we don't have much consistency right now in how scripts
> structure their communication. That's not surprising, given that we're
> just starting to use all this, but it suggests that we have room for
> improvement in our abstractions. :)

How much is due to new API usage and how much is due to things mainly
being a direct port of old communication patterns (which I guess are
written by various people over extended lengths of time and so there's
inconsistencies to be expected) ?  Or due to being a mishmash of both
old and new?

- Jon


Re: [Bro-Dev] Broker::publish API

2018-08-07 Thread Jon Siwek
On Mon, Aug 6, 2018 at 1:57 PM Robin Sommer  wrote:

> I have another question about this specific case: we use relay_rr()
> only for sending Intel::insert_indicator. Intel::remove_indicator gets
> published normally through auto_publish(). Why the difference?

Potentially no reason other than no one reviewed whether it had
potential to be optimized in a similar way.  e.g. I first ported
scripts in a direct fashion without trying to change too much
structurally about comm. patterns or doing any optimization except in
cases where a change was specifically talked about.  I only recall
Justin had called out Intel::insert_indicator, so it got changed.

- Jon


Re: [Bro-Dev] Broker::publish API

2018-08-07 Thread Jan Grashöfer
To be honest, I have somehow lost track of the discussion. What I can 
recall, it's about simplifying the API in the light of multi-hop 
routing, which is not fully functional yet.

Regarding multi-hop routing I am not even sure what the actual goal is 
that we are currently aiming at. However, from a conceptual perspective 
I think "routing" either needs routing algorithms or strict conventions 
of how the network, to route messages through, is structured. So, what 
would a "deep cluster" look like and what kind of message flows do we 
expect in there?

Some comments on the observations:

On 06/08/18 21:50, Robin Sommer wrote:
>  - The main topics are bro/cluster/ and
>bro/cluster/node/. For these we wouldn't have a problem
>with loops if we enabled automatic, topic-driven forwarding as
>far as I can see.

How does forwarding work if I add another node type? Do we assume a 
certain cluster structure here? If yes: Is that a valid assumption?

>  - bro/cluster/broadcast seems to be the main case with a looping
>problem, because everybody subscribes to it. It's hardly used
>though. (bro/config/change is used similarly though).

The topic-concept is a multicast scheme, isn't it? Having a broadcast 
functionality on top of that feels odd. However, it's limited to the 
cluster topic. This leads me to the question which domains do we operate 
on? If I think of messages, I start to think about a cluster but that 
might be only one domain of application. I think it would be good to 
define layers of abstraction more precisely here.

>  - There are a couple of script-specific topics where I'm wondering
>if these could switch to using bro/cluster/ instead
>(bro/intel/*, bro/irc/dcc_transfer_update). In other words: when
>clusterizing scripts, prefer not to introduce new topics.

 From my understanding this would mean going back to the old 
communication patterns. What's the point of having topics if we don't 
use them?

>  - There's a lot of checks in publishing code of the type "if I am
>(not) of node type X".

That's something I would have expected. I don't think this is 
necessarily an indicator of bad design. Having these kinds of checks 
means that roles are somehow fixed and responsibilities are explicitly 
codified.

>  - Pools are used for two different things: 1. the known-* scripts
>pick a proxy to process and log the information; whereas 2. the
>Intel scripts pick a proxy just as a relay to broadcast stuff
>out, reducing load. That 1st application is a good one, but the 2nd
>feels like it should be handled differently.

I think we should be careful about introducing too many abstractions. 
Communication patterns tend to be complex and the more of the complexity 
is hidden, the easier it will be to generate misunderstandings. For 
example, in case of the intel framework, proxy nodes might be able to 
implement some more logic than just relaying at some point. Having the 
relay abstraction would mean to deal with two different levels of 
abstractions regarding intel on proxy nodes in this case.

> Overall I have to say I found it pretty hard to follow this all
> because we don't have much consistency right now in how scripts
> structure their communication. That's not surprising, given that we're
> just starting to use all this, but it suggests that we have room for
> improvement in our abstractions. :)

I totally agree here! I think it could help to come up with some more 
use cases to identify the best abstractions.

Jan


Re: [Bro-Dev] Broker::publish API

2018-08-06 Thread Robin Sommer


On Mon, Jul 30, 2018 at 09:01 -0700, I wrote:

> Is there a summary somewhere of what events & topics the cluster nodes
> are currently exchanging?

So I went through the exercise of collecting this information: what
connections do we have between nodes, who's subscribing to what, and
who's publishing what; see the attached PDF. This is based on all the
standard scripts, with some special cases ignored (like the control
framework).

I'm not fully sure yet what to conclude from this, but a few quick
observations:

- The main topics are bro/cluster/ and
  bro/cluster/node/. For these we wouldn't have a problem
  with loops if we enabled automatic, topic-driven forwarding as
  far as I can see.

- bro/cluster/broadcast seems to be the main case with a looping
  problem, because everybody subscribes to it. It's hardly used
  though. (bro/config/change is used similarly though).

- Relaying is hardly used.

- There are a couple of script-specific topics where I'm wondering
  if these could switch to using bro/cluster/ instead
  (bro/intel/*, bro/irc/dcc_transfer_update). In other words: when
  clusterizing scripts, prefer not to introduce new topics.

- There's a lot of checks in publishing code of the type "if I am
  (not) of node type X".

- Pools are used for two different things: 1. the known-* scripts
  pick a proxy to process and log the information; whereas 2. the
  Intel scripts pick a proxy just as a relay to broadcast stuff
  out, reducing load. That 1st application is a good one, but the 2nd
  feels like it should be handled differently.

Need to mull over this more, thoughts welcome.

Overall I have to say I found it pretty hard to follow this all
because we don't have much consistency right now in how scripts
structure their communication. That's not surprising, given that we're
just starting to use all this, but it suggests that we have room for
improvement in our abstractions. :)

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Broker Communication.pdf
Description: Adobe PDF document


Re: [Bro-Dev] Broker::publish API

2018-08-06 Thread Robin Sommer



On Fri, Aug 03, 2018 at 15:57 -0500, Jonathan Siwek wrote:

> Another use is hidden within Cluster::relay_rr():

Yeah, though at least from an API perspective this is different: The
caller gives relay_rr() only one topic to send to (indicator_topic).
It's then using a different topic internally to get it over to the
proxy first, but that feels more like an implementation detail. So in
that sense I would argue that this is not a use-case for the Broker
API letting users change the topic on relay. (I'm not saying that that
capability can't be useful, I'm just still looking for actual use
cases.)

I have another question about this specific case: we use relay_rr()
only for sending Intel::insert_indicator. Intel::remove_indicator gets
published normally through auto_publish(). Why the difference?

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-03 Thread Jon Siwek
On Fri, Aug 3, 2018 at 12:22 PM Robin Sommer  wrote:

> On Fri, Jul 27, 2018 at 10:39 -0700, I wrote:
>
> > Broker::relay(change_topic, change_topic, Config::cluster_set_option, ID, 
> > val, location);
>
> Can somebody remind me what the use-case is for changing the topic on
> relay? Grepping over our standard scripts, I see only one use of
> relay(), and that's the one above.

Another use is hidden within Cluster::relay_rr():

event Intel::new_item(item: Item) &priority=5
    {
    if ( Cluster::proxy_pool$alive_count == 0 )
        Broker::publish(indicator_topic, Intel::insert_indicator, item);
    else
        Cluster::relay_rr(Cluster::proxy_pool, "Intel::new_item_relay_rr",
                          indicator_topic, Intel::insert_indicator, item);
    }

That is, if the manager is currently connected to some proxy, it picks
one to do the work of distributing the event to workers.  Manager
sends 1 message instead of N.
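
To make the "1 instead of N" point concrete, here is a back-of-the-envelope Python model of who sends how many messages. The function names are made up for illustration, and it assumes direct publishing costs the manager one message per worker:

```python
def manager_sends_direct(num_workers):
    """Manager publishes straight to the workers: one message per worker."""
    return {"manager": num_workers, "proxy": 0}


def manager_sends_via_relay(num_workers, alive_proxies):
    """Manager hands the event to a single proxy, which fans it out.

    Falls back to direct publishing when no proxy is alive, mirroring
    the alive_count check in the handler above.
    """
    if alive_proxies == 0:
        return manager_sends_direct(num_workers)
    return {"manager": 1, "proxy": num_workers}
```

The total number of messages does not shrink (it grows by one); the fan-out work just moves from the manager to whichever proxy was picked.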

I don't know if there's currently other use-cases for Broker::relay
specifically, but Cluster::relay_rr/Cluster::relay_hrw is essentially
an extension of that which just also does the work of choosing the
initial topic based upon a given pool and partition strategy.

Might have been Justin who originally pointed out potential for
avoiding manager overload in this way.

- Jon


Re: [Bro-Dev] Broker::publish API

2018-08-03 Thread Robin Sommer



On Fri, Jul 27, 2018 at 10:39 -0700, I wrote:

> Broker::relay(change_topic, change_topic, Config::cluster_set_option, ID, 
> val, location);

Can somebody remind me what the use-case is for changing the topic on
relay? Grepping over our standard scripts, I see only one use of
relay(), and that's the one above.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-07-30 Thread Robin Sommer



On Mon, Jul 30, 2018 at 13:30 -0500, Jonathan Siwek wrote:

> Seems clunky and could get dicey

Agreed. :) It'd just be a heuristic to catch some obvious errors. I
don't think there's more we can do, we can't really catch loops
statically by looking at the code.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-07-30 Thread Jon Siwek
On Mon, Jul 30, 2018 at 12:16 PM Robin Sommer  wrote:

> > I'd be more comfortable if one could automate answering the question:
> > "if I add a subscription to a given node in the network, will I create
> > a cycle?".
>
> Hmm ... What about a test mode where we'd spin up a dummy cluster
> (similar to what the btests do), have each node send a message to all
> subscribed topics, and watch for TTL violations?

Seems clunky and could get dicey -- subscriptions that are
transient/dynamic may not be well-tested and you'd probably want to
guarantee that sending such a dummy message actually does not result
in any side-effects at the Bro-layer.  If nodes start raising random
events at unusual/unintended times I start to doubt the stability of
things.

- Jon


Re: [Bro-Dev] Broker::publish API

2018-07-30 Thread Robin Sommer



On Mon, Jul 30, 2018 at 11:15 -0500, Jonathan Siwek wrote:

> I don't see why not, but it takes planning and prudence on everyone's
> part (including users) to not break that rule.

Yeah, question is whether we can pre-configure the cluster so that users
don't need to worry about it most of the time.

> I'd be more comfortable if one could automate answering the question:
> "if I add a subscription to a given node in the network, will I create
> a cycle?".

Hmm ... What about a test mode where we'd spin up a dummy cluster
(similar to what the btests do), have each node send a message to all
subscribed topics, and watch for TTL violations?

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-07-30 Thread Jon Siwek
On Mon, Jul 30, 2018 at 11:02 AM Robin Sommer  wrote:

> True, although it's not cycles in the connection topology that matter,
> it's cycles in topic subscriptions.

Right, good point.

> I need to think about this a bit
> more (and I need to remind myself what our topics currently look like)

I think we just have the "broadcast_topic" to which all nodes
subscribe, but not sure if there's more.

> but could we set up topics so that even in a cluster, messages don't
> go into a cycle?

I don't see why not, but it takes planning and prudence on everyone's
part (including users) to not break that rule.

I'd be more comfortable if one could automate answering the question:
"if I add a subscription to a given node in the network, will I create
a cycle?".
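
One way that question could be automated offline, sketched in Python. This is a toy model, not something Broker provides: it assumes peerings are undirected, each listed once, and that with the current forwarding a message on a topic only travels between peers that are both subscribed to it. Under those assumptions a topic can loop exactly when the peerings among its subscribers contain a cycle, which a union-find pass detects:

```python
def creates_cycle(peerings, subscribers):
    """Return True if the peerings among `subscribers` contain a cycle.

    peerings:    iterable of (node_a, node_b) pairs, each peering listed once
    subscribers: set of node names subscribed to the topic in question
    """
    parent = {node: node for node in subscribers}

    def find(node):
        # Walk up to the component root, halving the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in peerings:
        if a in parent and b in parent:
            root_a, root_b = find(a), find(b)
            if root_a == root_b:
                return True  # a and b were already connected: loop
            parent[root_a] = root_b
    return False


# Simplified, assumed cluster layout: workers peer with the manager and
# the proxy, and the proxy peers with the manager.
peerings = [
    ("worker-1", "manager"), ("worker-2", "manager"),
    ("worker-1", "proxy"), ("worker-2", "proxy"),
    ("manager", "proxy"),
]
everyone = {"worker-1", "worker-2", "manager", "proxy"}
workers = {"worker-1", "worker-2"}
```

Here a topic that everyone subscribes to (like the broadcast topic) is flagged, a topic only the workers subscribe to is not, and adding the manager to the worker topic is still fine until the proxy subscribes as well. It says nothing about transient subscriptions made at runtime.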

- Jon


Re: [Bro-Dev] Broker::publish API

2018-07-30 Thread Jon Siwek
On Fri, Jul 27, 2018 at 7:30 PM Azoff, Justin S  wrote:
>
>
> > On Jul 27, 2018, at 6:10 PM, Jon Siwek  wrote:
> >
> > On Fri, Jul 27, 2018 at 3:55 PM Azoff, Justin S  wrote:
> >
> >> I do agree that there's room for a lot of simplification, for example a 
> >> worker broadcasting a message efficiently to all
> >> other workers needs to do something like this from the docs:
> >>
> >>Cluster::relay_rr(Cluster::proxy_pool, "example_key",
> >>  Cluster::worker_topic, worker_to_workers,
> >>  Cluster::node + " (via a proxy)");
> >>
> >> But a lot of that could have defaults:
> >>
> >> Most use cases would want to relay through the default proxy pool
> >> Since round robin is in use, the key shouldn't matter.
> >
> > At the moment, one could write their own wrapper function around that
> > if they find it too verbose and always want to use certain defaults?
>
> Yeah.. The wrapper would be trivial.. Should bro include it so that the API 
> scripts use is simpler?

Maybe.  We can see how it fits in the mix of what Robin suggested:

  # Supports variadic args in place of Broker::Event.
  Broker::publish(topic: string, args: Broker::Event,
                  relay_topic: string &default="",
                  process_on_relayer: bool &default=F)

  # Supports variadic args in place of Broker::Event.
  Cluster::publish(pool: Cluster::pool, key: any, strategy: enum,
                   args: Broker::Event, relay_topic: string &default="",
                   process_on_relayer: bool &default=F)

  # Supports variadic args in place of Broker::Event.  Use proxy pool
  # and RR method w/ arbitrary, internal key by default.
  Cluster::publish_via_proxy(relay_topic: string, args: Broker::Event)

That last one being the wrapper you're asking for.  Also, I compressed
the ideas of having a separate "relay: bool" / "relay_hops: int" and
"relay_topic: string" args -- a non-empty relay topic implicitly means
you want to enable relaying on the receiving node.  Thinking more
about the original idea of giving the number of relay hops: it may be
better to leave that until Broker multihop is more robust and allow
its automatic forwarding mechanisms to take care of those scenarios
whereas a "relay" is a simple mechanism at the Bro application level
(has its own unique message format) that serves to aid load-balancing
use-cases (rather than routing use-cases).
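
For what it's worth, the receiving-node behaviour implied by those two arguments can be modelled in a few lines of Python. This is only a sketch of the proposed semantics; handle_on_receiver and the action tuples are invented for illustration and nothing like this exists in Bro today:

```python
def handle_on_receiver(event, relay_topic="", process_on_relayer=False):
    """Return the actions a receiving node would take for one message.

    An empty relay_topic means a plain publish: just raise the event.
    A non-empty relay_topic implicitly enables relaying; the event is
    only raised on the relayer itself if process_on_relayer is set.
    """
    if not relay_topic:
        return [("raise", event)]

    actions = []
    if process_on_relayer:
        actions.append(("raise", event))
    actions.append(("republish", relay_topic, event))
    return actions
```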

- Jon


Re: [Bro-Dev] Broker::publish API

2018-07-30 Thread Robin Sommer
On Fri, Jul 27, 2018 at 14:47 -0500, Jonathan Siwek wrote:

> Broker does not yet have automatic multihop where subscriptions are
> globally flooded automatically.

Yep, that's what I meant: dynamic multihop where each node tracks what
its peers are subscribing to, and forwards messages independent of its
own subscriptions.

> Possibly a downside is now you need to store original hop limit in
> addition to current TTL in each message if you want to detect the "is
> 1st hop" condition for the "relay_topic" option below.

Yeah, that's right. Actually I think ideally the 1st hop wouldn't have
any special role anyways if we didn't need that "relay_topic".

> It's maybe both a concern and a reality -- Bro clusters currently
> contain cycles (e.g. worker -> manager -> proxy -> worker)

True, although it's not cycles in the connection topology that matter,
it's cycles in topic subscriptions. I need to think about this a bit
more (and I need to remind myself what our topics currently look like)
but could we set up topics so that even in a cluster, messages don't
go into a cycle?

Is there a summary somewhere of what events & topics the cluster nodes
are currently exchanging?

> > - Add a second function publish_pool() that has all the same
> >   options, but receives a pool type instead of a topic (just an
> >   enum: RR, HRW).
> 
> What's the goal of the enums instead of just publish_hrw() and publish_rr() ?

Similar to what Justin wrote, it would more directly express the
intent, with less emphasis on the mechanism; we could set a
default to whatever we recommend people normally use; and it'd be more
extensible.

> At the moment, one could write their own wrapper function around that
> if they find it too verbose and always want to use certain defaults?

They could, but my general point is that it'd be nice to have a simple
API that covers the most common use cases directly and intuitively.
Then let people change defaults if they have to and know what they are
doing.

Robin



-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-07-27 Thread Azoff, Justin S

> On Jul 27, 2018, at 6:10 PM, Jon Siwek  wrote:
> 
> On Fri, Jul 27, 2018 at 3:55 PM Azoff, Justin S  wrote:
> 
>> I do agree that there's room for a lot of simplification, for example a 
>> worker broadcasting a message efficiently to all
>> other workers needs to do something like this from the docs:
>> 
>>Cluster::relay_rr(Cluster::proxy_pool, "example_key",
>>  Cluster::worker_topic, worker_to_workers,
>>  Cluster::node + " (via a proxy)");
>> 
>> But a lot of that could have defaults:
>> 
>> Most use cases would want to relay through the default proxy pool
>> Since round robin is in use, the key shouldn't matter.
> 
> At the moment, one could write their own wrapper function around that
> if they find it too verbose and always want to use certain defaults?

Yeah, the wrapper would be trivial. Should Bro include it so that the API
that scripts use is simpler?


>> The round robin part itself is really an implementation detail for proxy
>> load balancing and maybe not something that should be exposed in the API.
>> Now that I think of it I'm not sure why one would ever use relay_hrw over
>> relay_rr.
> 
> Theoretically, a more favorable load distribution that's consistent
> over time?  e.g. if you do RR of the same messaging pattern from
> multiple nodes, you could have waves of "randomly" overlapping loads
> on the relayer-node since everyone is cycling through all the proxies
> at their own rate when choosing the relayer.  With HRW, you'd stick
> with the same relayer over time and only change on outages, but
> everyone should have chosen their relayer in a uniformly distributed
> fashion.
> 
> - Jon

I'd expect that round robin would give the most uniform load distribution:
for N proxies, each proxy would see 1/N of the relay messages.  But I guess
in general round robin isn't the best load balancing mechanism, since it
doesn't take into account the responsiveness of each proxy.  With some of
the information CAF provides, it may be possible to also support weighted
round robin.  That way, if a proxy node doesn't die outright but starts
having issues for one reason or another, relay_rr could avoid sending it
messages.
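
As a rough illustration of the weighted round robin idea, here is a small Python sketch using the "smooth" weighted round robin scheme. The weights are made up for the example; how they would actually be derived from CAF's information is not something this sketch assumes anything about.

```python
def weighted_round_robin(weights, n):
    """Return n picks, distributed in proportion to the given weights.

    weights maps a proxy name to a non-negative integer weight
    (hypothetical health scores). A weight of 0 means the proxy is
    skipped entirely. At least one weight must be positive.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one proxy needs a positive weight")
    current = {node: 0 for node in weights}
    picks = []
    for _ in range(n):
        # Every node earns its weight each round ...
        for node, weight in weights.items():
            current[node] += weight
        # ... and the node with the highest running score is chosen,
        # then pays back the total so the others can catch up.
        best = max(current, key=current.get)
        current[best] -= total
        picks.append(best)
    return picks


# proxy3 is "having issues", so it gets weight 0 and receives nothing.
picks = weighted_round_robin({"proxy1": 2, "proxy2": 2, "proxy3": 0}, 4)
```

With equal positive weights this degenerates to plain round robin, so the 1/N property is kept for healthy proxies.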

— 
Justin Azoff




Re: [Bro-Dev] Broker::publish API

2018-07-27 Thread Jon Siwek
On Fri, Jul 27, 2018 at 3:55 PM Azoff, Justin S  wrote:

> I do agree that there's room for a lot of simplification, for example a
> worker broadcasting a message efficiently to all other workers needs to
> do something like this from the docs:
>
> Cluster::relay_rr(Cluster::proxy_pool, "example_key",
>   Cluster::worker_topic, worker_to_workers,
>   Cluster::node + " (via a proxy)");
>
> But a lot of that could have defaults:
>
> Most use cases would want to relay through the default proxy pool
> Since round robin is in use, the key shouldn't matter.

At the moment, one could write their own wrapper function around that
if they find it too verbose and always want to use certain defaults?

> The round robin part itself is really an implementation detail for proxy
> load balancing and maybe not something that should be exposed in the API.
> Now that I think of it I'm not sure why one would ever use relay_hrw over
> relay_rr.

Theoretically, a more favorable load distribution that's consistent
over time?  e.g. if you do RR of the same messaging pattern from
multiple nodes, you could have waves of "randomly" overlapping loads
on the relayer-node since everyone is cycling through all the proxies
at their own rate when choosing the relayer.  With HRW, you'd stick
with the same relayer over time and only change on outages, but
everyone should have chosen their relayer in a uniformly distributed
fashion.
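
A small Python sketch of that difference, under some assumptions: HRW is modeled as generic rendezvous hashing with SHA-256, and each sender uses its own name as the key. Neither detail is taken from Bro's actual implementation.

```python
import hashlib
from itertools import cycle


def hrw_pick(key, proxies):
    """Rendezvous (highest random weight) hashing: score every proxy
    against the key and take the one with the highest score."""
    def score(proxy):
        digest = hashlib.sha256(f"{proxy}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(proxies, key=score)


proxies = ["proxy1", "proxy2", "proxy3", "proxy4"]
senders = [f"worker{i}" for i in range(1, 201)]

# HRW: each sender keeps the same relayer as long as the pool is stable.
before = {s: hrw_pick(s, proxies) for s in senders}

# After an outage, only senders that used the lost proxy get a new relayer.
survivors = [p for p in proxies if p != "proxy2"]
after = {s: hrw_pick(s, survivors) for s in senders}
moved = [s for s in senders if before[s] != after[s]]

# RR: a sender walks through every proxy in turn, so its relayer changes
# with each message.
rr = cycle(proxies)
rr_choices = [next(rr) for _ in range(8)]
```

Every sender in moved was previously mapped to proxy2; everyone else is untouched by the outage, which is the "only change on outages" behavior described above.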

- Jon


Re: [Bro-Dev] Broker::publish API

2018-07-27 Thread Azoff, Justin S

> On Jul 27, 2018, at 1:39 PM, Robin Sommer  wrote:
> 
> I'm wondering if we should give it another try to simplify this API
> while we still can (i.e., before 2.6 goes out). To me, the most
> intuitive publish operation is "send to topic T and propagate to
> everybody subscribed to that topic". I'd structure the API around
> that, making that the main publish function for that simply:
> 
>Broker::publish(topic, args);
> 
> That would send to all neighbors, which then process locally and relay
> to their neighbors. Right now, that would propagate just across one
> hop but once we have multihop that'd start being broadcasted out
> broadly.

This would do weird things on workers, since they connect to both the
manager and the proxies.

Worker 1 would send to its neighbors [manager, proxy1, proxy2], but then
those 3 nodes would relay to all of the other workers.  The TTL would stop
the propagation, but you'd still end up sending 3 copies of the same
message to each worker.
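
The duplicate count can be checked with a toy Python model of the flooding. The topology is a simplification: every worker peers with the manager and both proxies, and links between the manager and the proxies are left out. The flood() helper and the node names are made up for this sketch.

```python
from collections import Counter


def flood(sender, neighbors, ttl):
    """Count how many copies of one message each node receives.

    Every receiving node forwards the message to all of its neighbors
    except the one it came from, until the TTL runs out.
    """
    received = Counter()
    # Each in-flight copy is (from_node, to_node, remaining_ttl).
    in_flight = [(sender, peer, ttl) for peer in neighbors[sender]]
    while in_flight:
        src, dst, hops_left = in_flight.pop()
        received[dst] += 1
        if hops_left > 1:
            in_flight.extend(
                (dst, peer, hops_left - 1)
                for peer in neighbors[dst] if peer != src
            )
    return received


relays = ["manager", "proxy1", "proxy2"]
workers = ["worker1", "worker2", "worker3"]
neighbors = {w: list(relays) for w in workers}
neighbors.update({r: list(workers) for r in relays})

# A TTL of 2 allows worker -> relay -> worker and then stops.
copies = flood("worker1", neighbors, ttl=2)
```

In this model copies["worker2"] and copies["worker3"] are both 3, one copy per relay, while each relay sees the message once.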

I do agree that there's room for a lot of simplification, for example a
worker broadcasting a message efficiently to all other workers needs to
do something like this from the docs:

Cluster::relay_rr(Cluster::proxy_pool, "example_key",
  Cluster::worker_topic, worker_to_workers,
  Cluster::node + " (via a proxy)");

But a lot of that could have defaults:

Most use cases would want to relay through the default proxy pool.
Since round robin is in use, the key shouldn't matter.
The round robin part itself is really an implementation detail for proxy
load balancing and maybe not something that should be exposed in the API.
Now that I think of it I'm not sure why one would ever use relay_hrw over
relay_rr.

Removing a lot of that gets close to what you are suggesting:

Cluster::relay(Cluster::worker_topic, worker_to_workers,
               Cluster::node + " (via a proxy)");

which is I guess just

Cluster::relay(topic, args)

like you said.

— 
Justin Azoff




Re: [Bro-Dev] Broker::publish API

2018-07-27 Thread Jon Siwek
On Fri, Jul 27, 2018 at 12:40 PM Robin Sommer  wrote:

> I'm wondering if we should give it another try to simplify this API
> while we still can (i.e., before 2.6 goes out). To me, the most
> intuitive publish operation is "send to topic T and propagate to
> everybody subscribed to that topic". I'd structure the API around
> that, making that the main publish function for that simply:
>
> Broker::publish(topic, args);
>
> That would send to all neighbors, which then process locally and relay
> to their neighbors. Right now, that would propagate just across one
> hop but once we have multihop that'd start being broadcasted out
> broadly.

Can you remind/clarify what's meant by "multihop" ?  I thought:

Broker already has manual multihop if you set up subscriptions on all
relevant nodes on the path yourself.  Bro doesn't use it right now.

Broker does not yet have automatic multihop where subscriptions are
globally flooded automatically.

A difference between "manual multihop" and "automatic multihop" would
be that in the latter, some relaying nodes may not actually hold a
subscription to the message they are relaying and so, in the case of
Bro events, I think they would not process them locally.

> - Give publish() another argument "relay: bool =T" to prevent
>   it from going beyond the immediate receiver. Or maybe instead:
>   "relay_hops: int =-1" to specify the max number of hops
>   to relay across, with -1 meaning no limit.

Going with the generalized approach of configurable number of hops per
message via "relay_hops" from the start would be better than finding
out we need it later.

Possibly a downside is that you now need to store the original hop limit
in addition to the current TTL in each message if you want to detect the
"is 1st hop" condition for the "relay_topic" option below.
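
A minimal Python sketch of that bookkeeping, to show why both values are needed. The Message fields and the publish()/relay() helpers are hypothetical, not Broker's wire format or API, and the "-1 means no limit" case is deliberately not modeled.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Message:
    topic: str
    hop_limit: int         # what the sender asked for; never changes in flight
    ttl: int               # hops still allowed; decremented by each relay
    relay_topic: str = ""  # optional topic to switch to on the 1st hop


def publish(topic, relay_hops, relay_topic=""):
    """Create a message that may be relayed across relay_hops hops."""
    return Message(topic, hop_limit=relay_hops, ttl=relay_hops,
                   relay_topic=relay_topic)


def relay(msg) -> Optional[Message]:
    """Return the message to forward, or None if it must stop here."""
    if msg.ttl == 0:
        return None
    # Only the first relayer sees a TTL that still equals the original limit.
    is_first_hop = msg.ttl == msg.hop_limit
    topic = msg.relay_topic if (is_first_hop and msg.relay_topic) else msg.topic
    return replace(msg, topic=topic, ttl=msg.ttl - 1)
```

Without hop_limit in the message, a relayer that sees ttl == 1 cannot tell whether it is the first hop of a one-hop message or the second hop of a two-hop one.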

> . (I recall concerns
>   about loops being too easy to create; we could set the default
>   here to F/0 to default to no forwarding, although conceptually I
>   don't really like that :-)

It's maybe both a concern and a reality -- Bro clusters currently
contain cycles (e.g. worker -> manager -> proxy -> worker)

> - Give publish() another argument "relay_topic: string =""
>   to change the topic when relaying on the 1st hop.
>
> - Give publish() another argument "process_on_relays: bool =T"
>   to change whether a relaying hop also sees the event locally.

Those seem fine to me.

> - Add a second function publish_pool() that has all the same
>   options, but receives a pool type instead of a topic (just an
>   enum: RR, HRW).

What's the goal of the enums instead of just publish_hrw() and publish_rr() ?

Instead of the API being 2 functions, it then seems like 2 enums that
are never used elsewhere + 1 function that now always branches
internally.

- Jon