New attachment viewer for macOS

2024-04-20 Thread Robin Sommer


Hi all,

I used to run `qlmanage` to view attachments and HTML bodies on my Mac,
but that has always felt a bit clumsy. So a little while ago I
ended up trying to write a replacement: a little standalone document
previewer wrapping macOS' *Quick Look* for easy use with mutt (and,
more generally, from the command line). I did this originally mostly
as an exercise in SwiftUI but ended up quite liking it, so I figured
I'd send a pointer here in case anybody else is looking for
alternatives.

It's on GitHub at https://github.com/rsmmr/qlview. You can either
build it yourself through Xcode or download a pre-built binary; see
the `README`. There's also some more information on possible ways to
integrate it with mutt through `mailcap` and a macro. This is all still
a bit of a work in progress, so feel free to send suggestions or even
patches.

Best,

Robin

-- 
Robin Sommer * ICSI * ro...@icir.org * www.icir.org/robin


Re: GMail SMTP: no authenticators available?

2021-01-28 Thread Robin Sommer

On Tue, Jan 26, 2021 at 10:32 -1000, Baron Fujimoto wrote:

> - I don't seem to have /usr/lib/libsasl2.2.dylib

Me neither actually, but it still works:

# ls /usr/lib/libsasl2.2.dylib
ls: /usr/lib/libsasl2.2.dylib: No such file or directory
# otool -L ~/bin/mutt
/Users/robin/bin/mutt:
[...]
/usr/lib/libsasl2.2.dylib (compatibility version 3.0.0, current version 3.15.0)
[...]

I believe Big Sur has started to do some magic there: system libraries
now live in the dyld shared cache rather than as individual files on
disk.

> - Assuming I did have an alternate version of libsasl2 available, how would I 
> link to that library specifically when building mutt?

Try simply uninstalling the MacPorts version ("port uninstall
cyrus-sasl2") and then recompiling mutt from scratch just as before.
That worked for me: the build then picked up the system's version of
libsasl2.
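
To double-check after a rebuild which libsasl2 the binary ended up
with, the `otool -L` output can be filtered mechanically. Here is a
minimal sketch in Python, assuming output in the format shown above;
the helper names are made up for illustration, and `run_otool` only
works on macOS where `otool` is installed:

```python
import subprocess
from typing import List


def linked_sasl_libs(otool_output: str) -> List[str]:
    """Return the libsasl library paths listed in `otool -L` output."""
    paths = []
    for line in otool_output.splitlines():
        line = line.strip()
        # Dependency lines look like: "<path> (compatibility version ...)"
        if " (compatibility version" not in line:
            continue
        path = line.split(" (compatibility version", 1)[0]
        if "libsasl" in path.rsplit("/", 1)[-1]:
            paths.append(path)
    return paths


def is_system_lib(path: str) -> bool:
    """True for libraries under /usr/lib, as opposed to, e.g., /opt/local."""
    return path.startswith("/usr/lib/")


def run_otool(binary: str) -> str:
    """Run `otool -L` on a binary and return its output (macOS only)."""
    result = subprocess.run(
        ["otool", "-L", binary], check=True, capture_output=True, text=True
    )
    return result.stdout
```

Something like `linked_sasl_libs(run_otool(path_to_mutt))` should then
list a path under /usr/lib if the system library was picked up, and one
under /opt/local if MacPorts' version is still being used.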

> Also using MacOS 11.1, if that helps

Same here.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: GMail SMTP: no authenticators available?

2021-01-25 Thread Robin Sommer


On Fri, Jan 22, 2021 at 13:48 +, I wrote:

> so things must have reverted. But it's all still working fine

I take that back: the problem persists when linking against MacPorts'
libsasl2. Linking against the system's library lets SMTP work for me.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: GMail SMTP: no authenticators available?

2021-01-22 Thread Robin Sommer


On Wed, Jan 20, 2021 at 16:22 -1000, Baron Fujimoto wrote:

> Our org's email is hosted by Gmail (via GSuite). I had been using
> neomutt (built from MacPorts) successfully for years.

Maybe I can point you in some useful direction: I had exactly this
problem with a self-built mutt recently after upgrading macOS and
rebuilding all the ports (and mutt). It took me a while to find what
was going on: MacPorts' libsasl2 seemed to have trouble with GMail. I
uninstalled that and had mutt link against /usr/lib/libsasl2.2.dylib,
and everything went back to working normally for me. 

Now, here's the funny thing: as I'm writing this, I just double
checked my mutt binary. Turns out it's back to linking against
MacPorts (now /opt/local/lib/libsasl2.3.dylib). I've rebuilt mutt in
the meantime a couple of times, so things must have reverted. But it's
all still working fine, which probably means that it was a libsasl2
version issue that has somehow been corrected by now.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


[Zeek-Dev] Re: Proposed change to lambda semantics - shallow copying rather than references

2020-12-11 Thread Robin Sommer
Sounds like a way forward then to both address the current concern,
and improve this overall. Does this work for everybody?

Robin

On Thu, Dec 10, 2020 at 18:26 +, Johanna Amann wrote:

> I like this idea, including just deprecating the old syntax - that makes it
> explicit what exactly happens.
> 
> Johanna
> 
> On 10 Dec 2020, at 18:06, Robin Sommer wrote:
> 
> > It's interesting how different people have different intuitions on
> > semantics here. I also see it as consistent with function arguments,
> > that's why I'd be fine with it. That said, I was also thinking along the
> > same lines of adding explicit capture specifications: deprecate the
> > current, capture-spec-less syntax, and generally just require people
> > to list what they want to capture; seems like a useful practice to me.
> > And then we let them tell Zeek if they want deep or shallow copies
> > (but always copies, not references). "when" could then move in the
> > same direction as well; maybe it could even change to take a lambda
> > instead of its own body. That would simplify the implementation, too.
> > 
> > Robin
> > 
> > On Thu, Dec 10, 2020 at 09:40 -0800, Vern Paxson wrote:
> > 
> > > > for sure on my wishlist is consideration for some deprecation-path,
> > > > differentiating-syntax (maybe even just temporary), or other
> > > > warning/notice that can help users along instead of potentially
> > > > breaking their code outright.
> > > 
> > > Good point.  Seems a natural way to do this is to add C++-style [] capture
> > > syntax, and a deprecation warning (and the current semantics) if it’s
> > > missing.  (And maybe no warning if the body doesn’t use any of the outer
> > > variables, since that form will continue to work.)
> > > 
> > > — Vern
> > > ___
> > > zeek-dev mailing list -- zeek-dev@lists.zeek.org
> > > To unsubscribe send an email to zeek-dev-le...@lists.zeek.org
> > 
> > -- 
> > Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com
> 
> 

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Re: Proposed change to lambda semantics - shallow copying rather than references

2020-12-10 Thread Robin Sommer
It's interesting how different people have different intuitions on
semantics here. I also see it as consistent with function arguments,
that's why I'd be fine with it. That said, I was also thinking along the
same lines of adding explicit capture specifications: deprecate the
current, capture-spec-less syntax, and generally just require people
to list what they want to capture; seems like a useful practice to me.
And then we let them tell Zeek if they want deep or shallow copies
(but always copies, not references). "when" could then move in the
same direction as well; maybe it could even change to take a lambda
instead of its own body. That would simplify the implementation, too.
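
To make the three flavors concrete, here is a small illustration in
Python rather than Zeek script; the Zeek syntax for capture
specifications is still open at this point, so nothing below is meant
as a proposal for it. Python closures capture by reference, so the
shallow and deep copies are emulated by copying when the closure is
created. All names are invented for the example:

```python
import copy


def make_closures(table):
    """Build three closures over `table`, one per capture flavor."""
    # By reference: the closure sees every later change to `table`.
    by_ref = lambda: table

    # Shallow copy: a new outer container, but nested values are shared.
    shallow = copy.copy(table)
    by_shallow = lambda: shallow

    # Deep copy: fully independent of the original.
    deep = copy.deepcopy(table)
    by_deep = lambda: deep

    return by_ref, by_shallow, by_deep


hosts = {"a": [1]}
by_ref, by_shallow, by_deep = make_closures(hosts)

hosts["b"] = [2]       # change the outer container
hosts["a"].append(99)  # change a nested value

print(sorted(by_ref()))      # ['a', 'b']: sees the new key
print(sorted(by_shallow()))  # ['a']: the new key is not visible ...
print(by_shallow()["a"])     # [1, 99]: ... but the nested list is shared
print(by_deep()["a"])        # [1]: fully isolated
```

The shallow case is the one where intuitions tend to differ: it behaves
like passing a container as a function argument, where the callee
cannot rebind the caller's variable but can still see changes to the
shared elements.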

Robin

On Thu, Dec 10, 2020 at 09:40 -0800, Vern Paxson wrote:

> > for sure on my wishlist is consideration for some deprecation-path,
> > differentiating-syntax (maybe even just temporary), or other
> > warning/notice that can help users along instead of potentially
> > breaking their code outright.
> 
> Good point.  Seems a natural way to do this is to add C++-style [] capture
> syntax, and a deprecation warning (and the current semantics) if it’s
> missing.  (And maybe no warning if the body doesn’t use any of the outer
> variables, since that form will continue to work.)
> 
> — Vern

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Re: [Zeek-Def] Re: Platform support policy

2020-11-10 Thread Robin Sommer

On Mon, Nov 09, 2020 at 14:16 -0800, Christian Kreibich wrote:

> Hah, I asked them too and they actually mentioned their email reply to you!

Yeah, and in a further follow-up email they went even a bit further and
committed to generally keeping the previous base image around in the
future when new macOS releases come out (but not the Xcode image).

> https://github.com/zeek/zeek/pull/1268

Excellent. Looks like we're all good with our new policy. Thanks for
driving this forward, Christian!

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Re: [Zeek-Def] Re: Platform support policy

2020-11-09 Thread Robin Sommer


On Fri, Nov 06, 2020 at 12:27 -0800, you wrote:

> we could just point at our CI? That'd be Dockerfiles for anything
> Linux and prepare.sh for FreeBSD and macOS. The benefit would be
> that we'd maintain this in one place only. We could invest a bit of
> time in documenting the Dockerfiles/prepare.sh scripts so they
> explain these?

Yeah, agree, that sounds better than maintaining the information
separately.

> :) -- yes, I mislabeled that one. I actually meant to say
> "Maintenance Updates". Fixed in the wiki page.

Perfect. :).

> I'm definitely not the expert here but it all looks like Catalina
> with varying additions:

Yeah, I saw that, but I'm not sure if that means they are actively
removing older images. I'll see if I can find out.

> Btw I didn't include anything about architectures ... for Debian 9
> we currently have a 32-bit container, for some other platforms those
> are still available too. Do we still care about 32-bit?

Limiting to 64-bit seems fine to me for our current CI platforms. I'm
wondering about supporting ARM (32- & 64-bit for Linux, 64-bit for
future macOS), but it looks like CI doesn't support that yet either way:
https://github.com/cirruslabs/cirrus-ci-docs/issues/218

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Re: [Zeek-Def] Re: Platform support policy

2020-11-06 Thread Robin Sommer
This looks great to me. Some questions/notes:

- Which of these come with additional requirements beyond just OS base
  packages? We should note those. What I can think of:

    - CentOS: We require devtoolset (which version?) and probably
      EPEL? I don't recall how people get some of the dependencies in
      place (bison, flex, cmake).

    - macOS: Homebrew I suppose.

- On CentOS, if we limit support to "full updates", CentOS 7 will drop
  off soon. I'm wondering if we should use versions with maintenance
  updates instead? (I'm glad CentOS 6 just went out of life support :-)

- Does anybody know if Cirrus keeps previous macOS images available?
  If so, I'd suggest we keep the two most recent macOS versions.
 
Robin

On Thu, Nov 05, 2020 at 20:51 -0800, Christian Kreibich wrote:

> Hi folks,
> 
> Sorry for the delay here -- I've now put together a page:
> 
> https://github.com/zeek/zeek/wiki/Platform-Support-Policy
> 
> I've also added an entry to the calendar for macOS (Catalina, using the
> availability of images in Cirrus CI as a driver), and I'm putting together a
> PR (still a draft) for bringing our CI in line with what's in that page:
> 
> https://github.com/zeek/zeek/wiki/Platform-Support-Policy
> 
> Let me know your thoughts...
> 
> Thanks,
> Christian

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Re: Platform support policy

2020-10-22 Thread Robin Sommer
On Wed, Oct 21, 2020 at 14:14 -0700, Christian Kreibich wrote:

> https://bit.ly/zeek-os-calendar-ical
> https://bit.ly/zeek-os-calendar-google

Cool, very nice!

Want to start a Wiki page where we collect the pieces for our new
policy? Can link to those calendars from there.

> I think we've got this, too: when a new release starts, it should go into
> CI, and when one drops off LTS EOL, it can go. What do you think?

Yeah, though I think we'll need to discuss on a per-distribution basis
what exactly the versions are that we want to support at any point in
time (and hence have in CI), plus any additional assumptions we're
making (e.g., requiring devtoolset-X on CentOS).

That might be as easy as supporting anything currently in either
release or LTS, but spelling it out in the terms of the corresponding
distributions would be helpful to make sure everybody's on the same
page (e.g., some phrasing of "on Ubuntu, we support the current
release as well as any LTS release not EOL yet; and we assume just
standard packages"). Could you give that a try on that Wiki page as
well for those distros, and then we can send it around to the users'
list to see if people can get behind it?
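
As an illustration of how mechanical such a phrasing can be, here is a
minimal sketch of the Ubuntu-style rule ("the current release as well
as any LTS release not EOL yet") in Python. The release names and dates
are placeholders, not real distribution data, and the function name is
made up:

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Release:
    version: str
    released: date
    eol: date
    lts: bool


def supported(releases: List[Release], today: date) -> List[str]:
    """Current release plus any LTS release that is not EOL yet."""
    out = [r for r in releases if r.released <= today]
    if not out:
        return []
    current = max(out, key=lambda r: r.released)
    keep = [r for r in out if r is current or (r.lts and r.eol > today)]
    return [r.version for r in sorted(keep, key=lambda r: r.released)]


# Placeholder data, for illustration only.
RELEASES = [
    Release("old-lts", date(2018, 4, 1), date(2023, 4, 1), lts=True),
    Release("interim", date(2019, 10, 1), date(2020, 7, 1), lts=False),
    Release("new-lts", date(2020, 4, 1), date(2025, 4, 1), lts=True),
    Release("current", date(2020, 10, 1), date(2021, 7, 1), lts=False),
]

print(supported(RELEASES, date(2020, 11, 1)))
# ['old-lts', 'new-lts', 'current']
```

The result would be the set of versions to have in CI for that
distribution at the given date.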

One more OS to add is macOS. While nobody knows the deadlines there,
we should record how far back we go in supporting previous versions,
and what package management we're assuming for dependencies (Homebrew
I suppose).

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Re: Platform support policy

2020-10-07 Thread Robin Sommer



On Tue, Oct 06, 2020 at 22:21 -0700, Christian Kreibich wrote:

> We could put together a release calendar to remind us when it's time to
> update CI. I took a quick look and couldn't immediately find one. I'd be
> happy to put one together.

Yes, that would be great. The other thing we'll need to do is define
the per-distribution policies. You could add a first stab at that to the
calendar as well, or we can do it afterwards.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Platform support policy

2020-10-05 Thread Robin Sommer
[I had posted this on Slack in #development originally, copying here
for visibility]

We now have the nice list of platforms that Zeek currently supports
at: https://github.com/zeek/zeek/wiki/Zeek-Operating-System-Support-Matrix.
However, we don’t yet have a policy on which platforms we generally aim
to support going forward. Dominik and I discussed this further the other
day and came up with a proposal:

1. For each distribution, we decide on a dedicated policy for which
versions we want to support at any point. A general rule of thumb
would be “the two most recent stable releases still within their
support window”. But we’d actually look at each distribution and
see what policy makes sense, including any additional dependencies
we will rely on (like devtoolset on CentOS). And we do a
mini-request-for-comments to get feedback from users on whether we’re
making a reasonable choice.

2. We update CI to test those versions, and *only* those versions.

3. We then rely on CI to decide if changes are ok: If, e.g.,
somebody wants to use a new C++ feature, that’s ok as long as CI
passes. If it breaks CI, it won’t go in.
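
The rule of thumb from item 1 could be computed along these lines. This
is a minimal sketch in Python, assuming we track a release date and an
end-of-support date per version; the versions and dates are
placeholders and the function name is made up:

```python
from datetime import date
from typing import List, Tuple

# (version, release date, end of support); placeholder values only.
Release = Tuple[str, date, date]


def ci_versions(releases: List[Release], today: date, keep: int = 2) -> List[str]:
    """Newest `keep` releases that are out and still within support."""
    live = [r for r in releases if r[1] <= today < r[2]]
    live.sort(key=lambda r: r[1], reverse=True)
    return [r[0] for r in live[:keep]]


EXAMPLE_DISTRO = [
    ("1", date(2016, 1, 1), date(2019, 1, 1)),
    ("2", date(2018, 1, 1), date(2021, 1, 1)),
    ("3", date(2019, 6, 1), date(2022, 6, 1)),
    ("4", date(2020, 9, 1), date(2023, 9, 1)),
]

print(ci_versions(EXAMPLE_DISTRO, date(2020, 10, 5)))  # ['4', '3']
```

Per item 2, the output would be exactly the set of versions CI tests
for that distribution, with per-distribution exceptions layered on top
where the rule of thumb doesn't fit.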

The advantage of this is that we won’t rely on having to specify
specific compiler versions, which isn’t well-defined (because things
also depend on libstdc++ version, system libs, etc.), but instead on
whether people can actually compile Zeek on the platforms they are
using.

A disadvantage is that we’ll need to spend the time to ensure our CI
setup keeps matching the policies, and gets updated as distribution
updates come out.

Thoughts?

Dominik added:

> I think you’ve summarized it nicely! I’d only add that we’ll have to
> keep our CI up-to-date regardless. Jon is already doing a great job
> staying on top of things and adding EOL dates to dockerfiles as
> comments to make it obvious when things can get dropped from Cirrus.

> Also, I’m obviously in favor of this proposal. :wink:
> The “everything that passes CI is fair game” approach not only makes
> sure we naturally catch up to new C++ features eventually (which is
> the main motivation), but we can also steadily modernize our CMake
> scaffold, Python scripts, and so on.

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Re: Cluster Controller Framework Thoughts

2020-09-03 Thread Robin Sommer


Hi Vlad,

thanks for the feedback, that's quite helpful. I'll dig a bit into
some of your points below.

As a general point, there's nothing wrong with choosing a different
deployment model than whatever becomes the new default. Quite the
opposite: Part of the thinking here has been that there's no single
approach that'll work for everybody, hence we want to offer multiple
layers that people can hook into depending on their needs and
expertise. The Controller would be the highest level abstraction that
gives you an experience not too far from current ZeekControl (more on
that below). On the other end of the spectrum, skipping everything
altogether and going with a manual systemd config is the lowest-level
way of doing it. In between those two we have: using the Supervisor
API through a custom Zeek management script (i.e., no Cluster
Agent/Controller), and writing a custom controller to interface with
the Agent API while doing your own state management. 

> Ultimately, given the choice between systemd + supervisor versus just
> systemd, for our use case, just systemd gave us some distinct benefits and
> reduced complexity.

Ack, I can see that for you guys, especially with the current state of
things. I'll just add here that (1) not everybody can/wants to use
systemd, so we'll need to have ways to build clusters in other
settings; and (2) some of the technical advantages you mention should
be addressable with the Supervisor/Controller, too; that's just not
there yet (e.g., nicer process visualization).

> it feels like I'm left with a choice between one orchestration tool +
> the cluster controller framework, versus just using a single
> orchestration tool to keep these files in sync and handle cluster
> stop/start/restarts.

I think part of the question here is how much effort one is willing to
invest into installing and maintaining Zeek. If you (1) are very
familiar with Zeek, and (2) have a current orchestration tool in place
that's easy to extend with all the necessary pieces (incl. managing of
restarts, logging, health monitoring), I agree that one tool sounds
better than two. However, if we look at it from the perspective of a
new user who wants to get Zeek running on their network quickly,
figuring out all those pieces is probably quite a hurdle. That
trade-off seems similar to ZeekControl today: people already have the
option to go through systemd, but ZeekControl remains the standard way
to run Zeek, even with all its quirks.

Re/ putting files in place everywhere: Per the design doc, I
definitely see distribution of packages and site-specific scripts in
scope for future versions of the Controller. That would then leave
people with the task to just install the same Zeek version everywhere,
which seems a reasonable expectation to me.

> If I need to reboot a system in a cluster, and it's running the manager and
> logger, I'd like to see another system in the cluster get promoted to being
> the manager and logger, and all the nodes to start talking to that instead.

I would like to see that, too. :-) However, this seems to be quite a
different thing than the systemd approach you are describing. How
would such a dynamic scheme operate without some kind of control layer
in between doing the coordination? In some future version, the Cluster
Controller would be the management component that can initiate changes
like dynamic fail-over. We can argue about whether that control layer
should be a central component (as the Controller proposes) vs some
distributed consensus scheme; and also whether we should really
implement this ourselves or rather go with some 3rd party tool for
coordination. But either way, I think something needs to be there.

> it feels a bit like the Cluster Controller framework is trying to take
> the old zeekctl features and get them to fit into a new model.

The proposal for the Supervisor/Controller model has been out for a
while, and the main point of feedback so far has been from folks who
wanted to ensure that we don't lose functionality that ZeekControl
offers today. So yes, that has been a starting point for fleshing out
a bunch of this: Can we retain what people like about ZeekControl, but
move it over into a new architecture that removes what they don't like
(e.g., copying binaries around)--and all that while facilitating a
more dynamic future world that increases Zeek's flexibility and
resilience? I'm not saying that the current Controller design achieves
all that already, but it has indeed been designed as an incremental
path forward rather than a lets-redo-it-from-scratch approach. Happy
to discuss if that's the right trade-off.

Robin

--
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Re: Moving policy scripts into packages

2020-09-03 Thread Robin Sommer



On Mon, Aug 31, 2020 at 10:57 -0700, Christian Kreibich wrote:

> ... if we strengthen the notion of packages from the core distribution, we
> may want to ensure zkg can be available from the outset

Yeah, I'd be in favor of tying zkg more closely to Zeek itself so that
it's always available as a required dependency. I think that also
makes sense more generally as it has become a key part of our
ecosystem. We can add it to auxil/ as another submodule. The Zeek side
can then use the code to get packages in place directly, either
through the command-line client or through the zkg Python API.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Re: Moving policy scripts into packages

2020-08-31 Thread Robin Sommer
To summarize this a bit, below is what I think I've heard so far.
Feel free to respond further; I'll move this over into the ticket
later once we have consensus.

Robin


- General preference to keep packages in individual repositories
  hosted inside a new GitHub organization "zeek-packages".

- Management through a meta-package that lists all desired
  packages as dependencies. The meta-package can version content by
  pinning packages to blessed versions.

- Tie these meta-packages to the current Zeek releases. To take this a
  bit further:

    - I imagine this means that we'll have three meta-packages at any
      point in time: "zeek-packages-current", "zeek-packages-lts", and
      "zeek-package-devel". When a new release comes out, these rotate
      through.

    - People can install packages for older (now unsupported) Zeek
      versions by picking an older version of the corresponding
      meta-package.

    - The Zeek distribution can either download the current version of
      the meta-package on install; or even just include the full
      content somehow (also see "zkg" below).

- Testing

    - Need to make tests standalone and less dependent on Zeek versions.

    - Should make standard btest infrastructure available to tests
      (e.g., Zeek's btest helpers, pcaps).

    - Provide integration tests that execute across the full set of
      "zeek-packages".

- Development

    - Make it easy to work on multiple packages at once (e.g., to
      update baselines; get all dependencies in place)

- Documentation

    - Use Zeekygen to document the full content of a meta-package at
      once; can host either on docs.zeek.org or packages.zeek.org.

    - Make it easy to autogen docs for individual packages (ideas:
      GitHub Pages through Vlad's cookie-cutter; autogen on
      packages.zeek.org)

- zkg:

    - Distinguish standard/recommended packages from others.

    - Could we add a way to "prime" zkg's package cache so that a Zeek
      distribution could distribute a snapshot of "zeek-packages" for
      direct use; but zkg would still pull in updates if online access
      is available?


[Zeek-Dev] Re: Moving policy scripts into packages

2020-08-25 Thread Robin Sommer



On Mon, Aug 24, 2020 at 14:15 -0700, Jon Siwek wrote:

> * What's the LTS policy for packages?

Good question. I think I would tie it to the Zeek LTS policy, with a
"blessed" version of the meta-package that we recommend (and maintain)
for each currently maintained Zeek version.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Re: Moving policy scripts into packages

2020-08-25 Thread Robin Sommer



On Mon, Aug 24, 2020 at 11:49 -0700, Johanna Amann wrote:

>   This, from my point of view, it would be neat to have a way to still
>   easily install a rather large set of packages (potentially nearly
>   everything that is in policy at the moment) and run test on them.

While I agree that integration testing is useful, too, ideally the new
packages would primarily rely on tests that are standalone. Do you see
a problem with that for, e.g., the SSL functionality?

>   that change a lot of the test baselines - especially when we touch
>   something that affects connection-ID hashing, or the order of elements
>   in hashmaps.

Agree with Jon here: This might be an opportunity to make the tests
less fragile, more like what we'd recommend for external packages
anyways.

>   It would be nice if, afterwards, it would still be possible to install a
>   working set of a script for the running version of Zeek.

Yeah, if we worked with a meta-package, we could "bless" a specific
version of that for a given Zeek release. People could update further,
but with less of a guarantee, though we'd try hard to ensure packages
work with different versions; we can even CI them against a bunch of
recent releases.

Overall, our current policy/ scripts haven't required version-specific
changes very often, so I'm not too worried here. The most common use
case is probably some script starting to use a newly introduced
feature, and that's pretty easy to catch / guard against. 

> It would be neat to have a place that contains the combined
> documentation of these scripts.

Agree, and I'd extend that to packages in general; it could be a job
for an extended packages.zeek.org to provide autogen'ed documentation.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Re: Moving policy scripts into packages

2020-08-24 Thread Robin Sommer



On Mon, Aug 24, 2020 at 11:26 -0500, Michael Dopheide wrote:

> I like (2) for cleanliness.

Vote counted!

> there should be an easy way to distinguish them from other packages
> when doing a 'zkg list'.

Good point.

Also, one additional thought: Jon reminded me that zkg can manage
dependencies already. So the "collection" I mentioned could be a
meta-package that depends on all the ones we want. We might need to
make that a bit more explicit for this use case (like you say for
example in the output of "list"), but the basic functionality is
there.
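
To sketch what that could look like: a meta-package would be little
more than a `zkg.meta` whose `depends` field lists the packages we
want, one per line with a version spec. The Python snippet below reads
such a field with the standard `configparser` module. The package
names, versions, and the `==` pin syntax are assumptions for
illustration and should be checked against the zkg documentation:

```python
import configparser
from typing import Dict

# Hypothetical meta-package metadata; names and versions are invented.
META = """\
[package]
description = Hypothetical meta-package pinning a blessed set of packages.
depends =
  zeek >=4.0.0
  zeek-packages/example-one ==1.2.0
  zeek-packages/example-two ==0.9.1
"""


def parse_depends(meta_text: str) -> Dict[str, str]:
    """Map each dependency named in a zkg.meta to its version spec."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(meta_text)
    deps: Dict[str, str] = {}
    for line in parser.get("package", "depends", fallback="").splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, spec = line.partition(" ")
        deps[name] = spec.strip() or "*"
    return deps


print(parse_depends(META))
```

Installing the meta-package would then pull in everything it lists,
and a "zkg list" that wants to mark these packages specially could get
the set of names from the same field.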

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Moving policy scripts into packages

2020-08-24 Thread Robin Sommer
Looking for some thoughts here. One of the items on the roadmap for
4.0 is moving scripts that currently live in policy/ over into Zeek
packages. The goals here are to (1) facilitate maintaining & testing
them independently of Zeek releases; and (2) come to a more flexible
notion of "default scripts" that can incorporate community-maintained
packages as well. This is tracked by issue
https://github.com/zeek/zeek/issues/414, including a 1st pass over the
existing policy scripts to understand what should/can be moved.
(Thanks, Vlad!)

Before we can begin working on this, we need to figure out how to
organize this new world. One particular question is where the moved
packages will live. I see the following options so far:

1. Move each into a separate repository on the zeek/ GitHub
   account.

2. Similar, but to avoid cluttering zeek/, create a new GitHub
   organization "zeek-packages".

3. Put them all into a single mono-repository (e.g.,
   zeek/standard-packages), i.e., treat them as one package.

4. Do (1) or (2), and additionally create "zeek-standard-packages"
   that's full of submodules pointing to them (and also to
   community packages).

5. Do (1) or (2), and teach zkg to understand "collections" of
   packages that can be installed/managed as a group, defined
   through some meta data somewhere.

Along with all of this comes a question of how to make it easy for
people to install a set of default packages now that these won't come
with Zeek itself anymore. Some of the schemes above make that easier
than others.

Thoughts/opinions/more ideas?

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Re: Discussion: the guidance we want to give to package authors on the tags they assign

2020-08-18 Thread Robin Sommer
Sounds good to me, unifying tags makes sense. The one thing I'd add is
a selection of standardized tags for general categorization, along the
lines of the existing: 
https://docs.zeek.org/projects/package-manager/en/stable/package.html#suggested-tags

Probably best to start with extending that section with the guidelines
you propose before approaching package authors with individual PRs.
That way, there'll be something to point them to.
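
Several of the rules proposed in the quoted message below are
mechanical enough that they could also be checked automatically. Here
is a rough sketch in Python of what such a check might look like; the
function name, the messages, and the exact allow-list for capitalized
tags are my assumptions, not something zkg provides:

```python
import re
from typing import List

BANNED_SUFFIXES = ("analyzer", "protocol", "plugin")
BANAL = {"a", "piece", "of", "software"}
# Tags for which the proposal considers capitalization reasonable.
CAPS_OK = {"DoS", "PostgreSQL"}


def check_tag(tag: str) -> List[str]:
    """Return the guideline violations for one tag (empty if it's fine)."""
    problems = []
    lower = tag.lower()
    if "_" in tag:
        problems.append("use a hyphen instead of an underscore")
    for suffix in BANNED_SUFFIXES:
        if re.search(rf"[-_ ]{suffix}$", lower):
            problems.append(f"drop the '{suffix}' suffix")
    if re.search(r"(^|[-_ ])(bro|zeek)($|[-_ ])", lower):
        problems.append("don't mention bro or zeek")
    if lower in BANAL:
        problems.append("too generic to be useful")
    caps_allowed = tag in CAPS_OK or re.fullmatch(r"CVE-\d{4}-\d+", tag)
    if tag != lower and not caps_allowed:
        problems.append("prefer lowercase unless capitalization is essential")
    return problems


for tag in ["dns_analyzer", "zeek-ssh", "DoS", "file-extraction"]:
    print(tag, check_tag(tag))
```

The judgment calls (is a tag too unique, does it answer "Has a plugin
for X already been coded?") would still need a human, but a check like
this could catch the punctuation and capitalization cases before they
spread.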

Robin

On Mon, Aug 17, 2020 at 17:14 -, Duffy OCraven wrote:

> I want to start a discussion here of the guidance we want to give to package 
> authors on the tags they assign in zkg.meta, to ensure people have a chance 
> to chime in, and we start-out with the benefit of multi-perspective group 
> process, so we reach for the best result.
> 
> My proposal is just to articulate principles for good tag selection, to rein 
> in the existing scattershot we've seen so far, by giving the authors guidance 
> on what we want to see. I think we need to do this, to counteract that nearly 
> everyone takes their guidance from what they see the people before them have 
> done. If bad habits occurred and are allowed to persist, people will 
> dutifully adopt those bad habits.
> 
> I posit that: the ideal set of tags will provide matches with queries of the 
> form: "Has a plugin for X already been coded?" And also matches with some of 
> the relevant queries for: "What other plugins have been coded for aspect Y?" 
> Find the words by filling in the sentences: "I implemented X." and "I 
> implemented an instance of Y." For Y, use the plural (indicators, scanners, 
> scripts) except when only the singular makes sense.
> 
> Use the hyphen where punctuation is needed. Never use underscore.
> 
> Don't add "analyzer" nor "protocol" nor "plugin" as a suffix.
> 
> Don't mention bro or zeek. These are all Bro/zeek analyzers and plugins.
> 
> The ideal set of tags can also include one that is perhaps unique to this 
> package (but not four or five that are unique to this package). This is as a 
> moniker, so that saying "go look at fizzamajig" should lead, by following the 
> fizzamajig tag, to what you intended the listener to see. 
> 
> Conversely avoid banal tags. If you write a piece of software, nonetheless 
> "a", "piece", "of", and "software" are all bad tags.
> 
> Capital letters should be a rarity, i.e. in DoS because dos to many eyes, 
> immediately connotes a pre-Windows Microsoft operating system. att is fine 
> punctuated that way, and PostgreSQL and all the CVE are reasonable to 
> capitalize. SSL, TLS, TCP, PKI, UPnP, and EternalBlue are stalking-horses, to 
> consider, while we reach consensus, whether we are better off just 
> lowercasing where the capitalization is not essential. If in doubt, just use 
> alllowercase. Tags function quite well in alllowercase, and that is what most 
> people have done. 
> 
> If anyone uses the hyphen-form for a word, then everyone shall use the 
> hyphen-form for consistency. It does often increase readability, and is a 
> small price for the increase of understanding in the portion of our community 
> which it benefits.
> 
> Anyone who disagrees with any of these details, PLEASE do chime in as I only 
> seek that we reach for the best result, not that we reach for my idea 
> of what the best result is.
> 
> Anyone who has additional heuristics of goodness to add, also chime in with 
> them. We'll probably, after consensus, enact change by sending some PRs to a 
> few packages to unify them more. I did a sort of census last evening. Of 273 
> tags used, I would banish 51 of them, and revise the punctuation or 
> capitalization of 15 others.
>   - Duffy O'Craven
> ___
> zeek-dev mailing list -- zeek-dev@lists.zeek.org
> To unsubscribe send an email to zeek-dev-le...@lists.zeek.org


-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com
___
zeek-dev mailing list -- zeek-dev@lists.zeek.org
To unsubscribe send an email to zeek-dev-le...@lists.zeek.org


Re: [Zeek-Dev] Proposal: Make Zeek's debug logging thread-safe

2020-07-17 Thread Robin Sommer


On Wed, Jul 15, 2020 at 14:57 -0700, Bob Murphy wrote:

> use a single, common logging API, but let it send its output to
> different output mechanisms that support different use cases.

I get that in general. It's just that afaik this is the first time
this need comes up. Adding a full-featured, thread-safe logging
framework is a trade-off against complexity and maintenance costs.
Not saying it's impossible, but I'd like to hear more people thinking
this is a good idea before committing to such a route. 

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com
___
Zeek-Dev mailing list
Zeek-Dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] Proposal: Improve Zeek's log-writing system with batch support and better status reporting

2020-07-17 Thread Robin Sommer


On Thu, Jul 16, 2020 at 17:15 -0700, Bob Murphy wrote:

> Here’s how it would work:

It would be helpful to see a draft API for the full batch writing
functionality to see how the pieces would work together. Could you
mock that up?

That said, a couple of thoughts:

> 2. The failure_type value would still indicate generally what
> happened, with predefined values indicating things like “network
> failure”, “protocol error”, “unable to write to disk”, or
> “unspecified failure".

In my experience, such detailed numerical error codes are rarely
useful in practice. Different writers will implement them to different
degrees and associate different semantics with them, and callers will
never quite know what to expect and how to react.

Do you actually need to distinguish the semantics for all these
different cases? Seems an alternative would be having a small set of
possible "impact" values telling the caller what to do. To take a
stab:

- temporary error: failed, but should try again with same log data
- error: failed, and trying same log data again won't help; but ok to 
continue with new log data
- fatal error: Panic, shutdown writer.

Depending on who's going to log failures, we could also just include a
textual error message as well. Logging is where more context seems
most useful I'd say.
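
To make the three impact values a bit more concrete, here's a minimal Python
sketch of what I have in mind. All names (Impact, WriteResult, next_action)
are invented for illustration and don't correspond to any existing Zeek API;
the real thing would be C++.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    """Hypothetical impact values telling the caller what to do next."""
    OK = "ok"
    TEMPORARY_ERROR = "temporary"  # failed; try again with the same log data
    ERROR = "error"                # failed; retrying won't help, continue with new data
    FATAL = "fatal"                # panic, shut down the writer


@dataclass
class WriteResult:
    impact: Impact
    message: str = ""  # free-form text, intended for logging only


def next_action(result):
    """Map a writer's result to the caller's next step."""
    if result.impact is Impact.OK:
        return "continue"
    if result.impact is Impact.TEMPORARY_ERROR:
        return "retry"
    if result.impact is Impact.ERROR:
        return "skip"
    return "shutdown"
```

The caller branches only on the impact; the message is carried along for
whoever ends up logging the failure.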

> 3. first_index and index_count would specify a range. That way, if
> several successive log records aren’t sent for the same reason, that
> could be represented by a single struct, instead of a different struct
> for each one.

One reason I'm asking about the full API is because I'm not sure where
the ownership of logs resides that fail to write. Is the writer
keeping them? If so, it could handle the retry case internally. If the
writer discards after failure, and the caller needs to send the data
again, I'd wonder if there's a simpler return type here where we just
point to the first failed entry in the batch. The writer would simply
abort on first failure (how likely is it really that the next succeeds
immediately afterwards?)
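
A minimal sketch of that simpler return type, in Python for brevity. It
assumes the caller keeps ownership of the batch; write_batch() and
send_with_retry() are hypothetical names, and write_one stands in for
whatever a writer does with a single record.

```python
def write_batch(records, write_one):
    """Write records in order and abort on the first failure.

    Returns the index of the first failed record, or None if all succeeded.
    """
    for index, record in enumerate(records):
        if not write_one(record):
            return index
    return None


def send_with_retry(records, write_one, max_attempts=3):
    """Resend from the first failed entry, up to max_attempts batches.

    Returns the records that were still unwritten at the end.
    """
    pending = list(records)
    for _ in range(max_attempts):
        failed_at = write_batch(pending, write_one)
        if failed_at is None:
            return []
        # Everything before failed_at went out; keep the rest for the retry.
        pending = pending[failed_at:]
    return pending
```

The return value is a single integer (or nothing), so there's no range
bookkeeping on either side.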

And just to be clear why I'm making all these comments: I'm worried
about the difficulty of using this API, on both ends. The more complex
we make the things being passed around, the more difficult it gets to
implement the logic correctly and efficiently.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Proposal: Improve Zeek's log-writing system with batch support and better status reporting

2020-07-15 Thread Robin Sommer



On Thu, Jul 09, 2020 at 18:19 -0700, Bob Murphy wrote:

> Proposed Solution: Add a new optional API for writing a batch all at once,
> while still supporting older log writers that don't need to write batches.

That sounds good to me, a PR with the proposed API would be great.

> a. For non-batching log writers, change the "false" status to just mean
>    "There was an error writing a log record". The log writing system will then
>    report those failures to other Zeek components such as plug-ins, so they
>    can monitor a log writer's health, and make more sophisticated decisions
>    about whether a log writer can continue running or needs to be shut down.

Not quite sure what this would look like. Right now we just shut down
the thread on error, right? Can you elaborate on what "report those
failures to other Zeek components" and "make more sophisticated
decisions" would look like?

Could we just change the boolean result into a tri-state (1) all good;
(2) recoverable error, and (3) fatal error? Here, (2) would mean that
the writer failed with an individual write, but remains prepared to
receive further messages for output. We could then also implicitly
treat a current "false" as (3), so that existing writers wouldn't even
notice the difference (at the source code level at least).
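
Here's a small Python sketch of that compatibility mapping, just to show the
idea. WriteStatus and normalize() are hypothetical names; the point is only
that a legacy boolean can be folded into the tri-state without touching
existing writers.

```python
from enum import Enum


class WriteStatus(Enum):
    """Hypothetical tri-state result for a single write."""
    OK = 1           # (1) all good
    RECOVERABLE = 2  # (2) this write failed, writer remains usable
    FATAL = 3        # (3) writer cannot continue


def normalize(result):
    """Accept either a legacy bool or a WriteStatus.

    A legacy False keeps its current meaning and maps to FATAL.
    """
    if isinstance(result, WriteStatus):
        return result
    return WriteStatus.OK if result else WriteStatus.FATAL
```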

> b. Batching log writers will have a new API anyway, so that will let log
>    writers report more detail about write failures, including suggestions
>    about possible ways to recover.

Similar question here: what would these "suggestions" look like?

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Proposal: Make Zeek's debug logging thread-safe

2020-07-15 Thread Robin Sommer
Reading through this thread, I'm wondering if we should focus on
improving identification of log lines in terms of where they come from
and when they were generated, while continuing to go through the existing
mechanism of sending messages back to the main process for output (so that
we don't need the mutex). If we sent timestamps & thread IDs along
with the Debug() messages, one could later post-process debug.log to
get things sorted/split as desired.
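
A minimal sketch of such post-processing in Python. It assumes a made-up
line format of "<timestamp> <thread-id> <message>"; debug.log does not look
like this today, so treat the format and the helper names as assumptions.

```python
from collections import defaultdict


def parse_line(line):
    """Split an assumed '<timestamp> <thread-id> <message>' line."""
    timestamp, thread_id, message = line.rstrip("\n").split(" ", 2)
    return float(timestamp), thread_id, message


def sort_by_time(lines):
    """Return all entries ordered by their timestamps."""
    return sorted((parse_line(l) for l in lines), key=lambda e: e[0])


def split_by_thread(lines):
    """Group messages per thread, each group sorted by timestamp."""
    per_thread = defaultdict(list)
    for line in lines:
        ts, tid, msg = parse_line(line)
        per_thread[tid].append((ts, msg))
    return {tid: sorted(entries) for tid, entries in per_thread.items()}
```

Since the timestamps would be taken in the originating thread, the sorted
view reflects when messages were generated rather than when the main process
happened to write them out.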

This wouldn't support the use case of "millions of lines" very well,
but I'm not convinced that's what we should be designing this for. A
mutex becomes potentially problematic at that volume as well, and it
also seems like a rare use case to begin with. In cases where it's
really needed, a local patch to get logs into files directly (as you
have done already) might just do the trick, no?

Robin

On Tue, Jul 14, 2020 at 14:58 -0700, Bob Murphy wrote:

> 
> > On Jul 14, 2020, at 1:14 PM, Jon Siwek  wrote:
> > 
> > On Tue, Jul 14, 2020 at 11:56 AM Bob Murphy  
> > wrote:
> > 
> >> The code you show should give correct ordering on when Foo() and Bar() 
> >> finish.
> > 
> > Wondering what's meant by "correct ordering" here.  Bar() can finish
> > before Foo() and yet debug.log can report "I did Foo" before "I did
> > Bar" for whatever thread-scheduling reasons happened to make that the
> > case.  Or Foo() and Bar() can execute together in complete concurrency
> > and it's just the LockedDebugMsg() picking an arbitrary "winner".
> > 
> > - Jon
> 
> I see your point.
> 
> For example:
> a. Foo() in thread 1 finishes before Bar() in thread 2 finishes
> b. The scheduler deactivates thread 1 for a while between the return from 
> Foo() and the execution of LockedDebugMsg("I did Foo.”)
> c. Thread 2 proceeds from the return from Bar() without interruption
> 
> Then debug.log would contain the message “I did Bar” before “I did Foo”.
> 
> So the ordering in the log file really reflects how the kernel sees the 
> temporal order of mutex locking inside LockedDebugMsg. That’s an inexact 
> approximation of the temporal order of calls to LockedDebugMsg, and that’s an 
> even more inexact approximation of the temporal order of code executed before 
> LockedDebugMsg.
> 
> For what I was doing, though, that proved to be good enough. :-)
> 
> I’d be very interested in ideas about how to improve that, especially if 
> they’re simple. I can think of a way to improve it, but it would be 
> substantially more complicated than just a mutex.
> 
> 
> 


-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Log archival (Re: Zeek Supervisor: designing client and log archival) behavior

2020-07-02 Thread Robin Sommer


On Wed, Jul 01, 2020 at 14:03 -0700, Jon Siwek wrote:

> What if an open() rarely or never happens again for a given log?

Ah, right, forgot about that case. So yeah, agree, the shadow files
are useful for this and to retain whatever information we need.

> * Changed: running through a function of same-name, but it happened to
> get changed between restart is probably still going to be closer to
> what user expects than running it through the default post-processor
> which is completely different ?

I was thinking not the default post-processor, but whatever is
configured for the log file we are just opening (if we did it at
open() time). But yeah, won't work when the cleanup happens already
before the new open.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-07-01 Thread Robin Sommer
On Tue, Jun 30, 2020 at 14:29 -0700, Jon Siwek wrote:

> Maybe the important observation is that the logic can be performed
> anywhere that has access to the Zeek-Supervisor process.

Agree.

> So where we put the logic at this point may not be important.  If we
> can find a single-best-place for the logic to live, that's great

I believe that's what Seth is arguing for: have a Zeek-side script be
the single point of that logic, rather than implement it multiple
times and/or outside of Zeek.

I can see doing that in Zeek but I think there's a trade-off here: if
we want to do the single-place approach with a multi-system setup, we'd
need an authoritative place to run this logic and hence depend on
*that* Zeek supervisor being up and running for performing the
operation. That may be a reasonable assumption (say if we dedicated
the supervisor running the manager to also be the cluster
coordinator), but it's different from a world where the client can
execute higher-level operations on its own.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Supervisor client (Re: Zeek Supervisor: designing client and log archival behavior)

2020-07-01 Thread Robin Sommer


> * https://github.com/zeek/zeek/wiki/Zeek-Supervisor-Client

Some thoughts on the commands:

> $ zeekc status [all | ]

> Do we need to include any other metrics in the returned status?

That information is mostly static, would be nice to get some dynamic
information in there as well, like uptime, CPU/memory/traffic stats.
No need to have that right away, but worth keeping in mind.

> # Do we need more categories to filter by (e.g. node type) ?

I'd skip for now.

> # If there's downed nodes at this point, what do we expect users to do?
> # Check the standard services logs for stderr/stdout info?  Check reporter.log ?

Yeah, would be cool if zeekc had access to the stderr/stdout from the
nodes through their supervisors. The supervisors could buffer that for
a while and return on request. More generally, the supervisor could
get a "diagnostics buffer" that, over time, we could use for more
stuff like store backtraces etc.
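
For the buffering part, a bounded ring buffer would probably do. Here's a
minimal Python sketch; DiagnosticsBuffer is a hypothetical name and the
default of 1000 lines is an arbitrary choice for illustration.

```python
from collections import deque


class DiagnosticsBuffer:
    """Keep only the most recent lines of a node's stderr/stdout."""

    def __init__(self, max_lines=1000):
        # deque with maxlen silently drops the oldest entry when full.
        self._lines = deque(maxlen=max_lines)

    def append(self, line):
        self._lines.append(line)

    def tail(self, n):
        """Return up to the last n buffered lines, oldest first."""
        if n <= 0:
            return []
        return list(self._lines)[-n:]
```

A supervisor would keep one such buffer per node and answer a client request
with tail(N).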

"reporter.log" is out I'd say, that will go through the normal log
rotation & archival, and be accessible that way.

> # A `zeekc diag` command could help gather information, like ask Zeek supervisor
> # to find core dumps and extract stack trace.  Would it do more than that, like
> # show last N lines of downed nodes' stderr, or last N lines of reporter.log?

> $ zeekc check

I'm wondering which supervisor that would be talking to in a
multi-system setup? All?

> $ zeekc terminate
>  ...

> # Normally wouldn't terminate the supervisor if a service-manager is handling
> # the Zeek supervisor process itself and will just restart it, but `terminate`
> # would be helpful for anyone running a supervised Zeek cluster "manually".

Another use case: If for some reason one wants to restart the
supervisor itself, "terminate" would kill it and the service
manager would then restart it.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Log archival (Re: Zeek Supervisor: designing client and log archival) behavior

2020-07-01 Thread Robin Sommer



On Tue, Jun 30, 2020 at 01:39 -0700, Jon Siwek wrote:

> * https://github.com/zeek/zeek/wiki/Zeek-Supervisor-Log-Handling

This overall sounds good to me. Some notes & questions:

> Log Rotation

> To help bridge/replace Step (4) and (5), suggest adding a new option:
> Log::default_rotation_dir. The Log::rotation_format_func() will use
> this as part of its default return value.

Seems we should then set this to "." by default, and have the cluster
framework override it.

> The log_mgr will attempt to create necessary dirs just-in-time,
> failing to do so emits an error, but otherwise continues with rotation
> using working directory instead.

I'd extend this to any error case: if moving from current location to
Log::default_rotation_dir fails (e.g., because the latter is on a
different file system), continue with new name inside the current
working directory (and report the error).
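
Roughly like this, as a Python sketch of the logic only (the real
implementation would live in the C++ log manager). The rotate() helper is
hypothetical, and it assumes the log file sits in the working directory, so
the fallback renames it next to its current location.

```python
import os


def rotate(current_path, rotated_name, rotation_dir):
    """Move a log into rotation_dir; on any error, rename it in place.

    Returns (final_path, error), where error is None on success.
    """
    target = os.path.join(rotation_dir, rotated_name)
    try:
        # Create the directory just-in-time; a cross-filesystem rename
        # also raises OSError and ends up in the fallback below.
        os.makedirs(rotation_dir, exist_ok=True)
        os.rename(current_path, target)
        return target, None
    except OSError as err:
        fallback_dir = os.path.dirname(current_path) or "."
        fallback = os.path.join(fallback_dir, rotated_name)
        os.rename(current_path, fallback)
        return fallback, err
```

The caller would report the returned error and carry on with whatever path
came back.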

Once moved, I suppose we would continue to optionally run a
post-processor, right? For a supervised cluster, we wouldn't use that
and suggest that people go with "zeek-archiver" instead; but with
ZeekControl we'd keep the current gzipping behavior so
that we don't break any setups.

We can implement that distinction through the post-processor function:
the new default function would just do the rename according to the new
scheme, and a separate legacy function for ZeekControl spawns the
"archive-log" script.

> zeek-archiver

I like making this a standard tool, but seems like something we could
postpone doing right now and prioritize getting the Zeek-side
infrastructure in place.

> We can potentially have the Zeek Supervisor process configurable to
> auto-start and keep a zeek-archiver child alive. 

I'd say that's a job for systemd (or whatever service manager). I know
Seth disagrees. :-)

> Leftover Log Rotation

> The rotation for such a leftover log file uses the metadata in the
> shadowfile to help try to go through the exact rotation that it should
> have occurred, including running the postprocessor function.

Not sure it's worth retaining the information about the post-processor
function, and it could potentially lead to trouble if the function
changed somehow in between (or disappeared). We could instead just run
the leftovers through whatever the restarted config says to do with
files.

Do we even need any other meta data at all in the new scheme? I'm
wondering if we could simplify this all to: "If at open() time, X.log
exists, first rotate it away through the currently configured
postprocessor function". If we did that, we should probably have a
global boolean that allows choosing between that and just overwriting
existing files. The latter would be the default to retain current
command-line behavior, and the cluster framework would enable leftover
recovery.

Hmm, actually, there's a piece of metadata that we'll need: the opening
timestamp, so that one can incorporate that into the name of the
rotated file (assuming we want to retain that capability). Unless we
parsed that out of the X.log itself ...
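
To sketch the simplified scheme in Python: open_log() and its parameters are
hypothetical, with recover_leftovers playing the role of the global boolean.
How the rotated name gets its opening timestamp is left to the callback,
since that's exactly the open question.

```python
import os


def open_log(path, recover_leftovers=False, rotate=None):
    """Open a log file for writing.

    With recover_leftovers set, an existing file is first handed to the
    currently configured rotate callback; otherwise it is overwritten,
    which matches today's command-line behavior.
    """
    if recover_leftovers and rotate is not None and os.path.exists(path):
        rotate(path)
    return open(path, "w")
```

The cluster framework would flip recover_leftovers on; everything else keeps
the default.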

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-30 Thread Robin Sommer



On Tue, Jun 30, 2020 at 09:35 -0400, I wrote:

> I think that the script we ship with zeek that effectively implements the
> supervisor behavior should understand the business logic of shutting down a
> cluster in the correct order.

How would that then work across multiple systems?

Robin


-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-19 Thread Robin Sommer



On Thu, Jun 18, 2020 at 13:00 -0700, Jon Siwek wrote:

> > For (1), the above applies: we'll rely on standard sysadmin processes
> > for updating. That means you'd use "zeekcl" to shutdown the cluster
> > processes, then run "yum update" (or whatever), then use "zeekcl"
> > again to start things up again.

> I have a slightly different take: isn't it more common to expect
> "start" and "stop" operations here to be done by the service-manager
> rather than Zeek client?

I believe we're pretty close to saying the same thing. I'm making a
distinction between the supervisor Zeek process (which the service
manager starts & stops), and the cluster's node processes (manager,
workers, etc). The supervisor manages the latter and will by default
shut them down when it gets the "stop" from its service-manager. But I
think we also want their state controllable from the client as well,
so that one can have an orderly shutdown of a multi-system cluster
without loss of data (e.g., one probably wants to shutdown workers
first to collect remaining log data). This is what I meant above by
"shutdown the cluster processes": "zeek-client stop" would tell the
supervisors to shutdown their node processes (or rather: "zeek-client
stop workers", or maybe "zeek-client" would know the order in which to
stop nodes or systems). And I imagine one would do that before
starting a cluster-wide upgrade to the next Zeek version.
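
If the client were to know the order itself, it could be as simple as the
following Python sketch. The particular order in STOP_ORDER is my assumption
(only "workers first" is something I'd commit to), and stop_sequence() is a
hypothetical helper, not an existing command.

```python
# Assumed stop order: workers first, so remaining log data can drain
# through the other nodes before they go away.
STOP_ORDER = ["worker", "proxy", "logger", "manager"]


def stop_sequence(nodes):
    """Given a dict of node name -> node type, return names in stop order.

    Unknown node types are stopped last; ties are broken by name.
    """
    rank = {node_type: i for i, node_type in enumerate(STOP_ORDER)}
    return sorted(
        nodes,
        key=lambda name: (rank.get(nodes[name], len(STOP_ORDER)), name),
    )
```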

That said, your note on Slack sounds right: let's figure out the
single-system operation first and get that usable. I'm pretty
confident that we will then be able to build the multi-system model on
top of that without too much trouble, and it'll be easier to collect
requirements for administration/management of multi-system setups once
we got some experience with single-system setups.

Robin


-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-18 Thread Robin Sommer



On Thu, Jun 18, 2020 at 03:32 +, Vlad Grigorescu wrote:

> As a concrete example, what does a cluster upgrade look like?

The idea is to handle this more like other system services: you'll be
in charge of getting the new Zeek version onto all your systems
yourself, using whatever method you use for other software as well.
For example, if you're installing through a package manager, you'd
just run "update" on all systems. If you're installing from source,
you'll either need to compile on each system, or copy the installation
over manually.

The underlying assumption is that people will already have a mechanism
in place for administration of their systems, and we shouldn't be
trying to reinvent the wheel, as ZeekControl oddly does. From a
sysadmin perspective, ZeekControl is really doing a lot more right now
than it should be doing; other tools don't work that way. We don't
want it to look like an APT anymore (https://github.com/zeek/zeek/issues/259). :-)

> Today, that means install the new version on the manager, and then do
> `zeekctl deploy`, which copies the files to the nodes and restarts the
> cluster. All of that is done without Broker.

There are two parts here: (1) deploying the Zeek installation itself,
and (2) deploying any configuration changes (incl. new Zeek scripts).

For (1), the above applies: we'll rely on standard sysadmin processes
for updating. That means you'd use "zeekcl" to shutdown the cluster
processes, then run "yum update" (or whatever), then use "zeekcl"
again to start things up again. (The Zeek supervisor will be running
already at that point, managed through systemd or whatever you're
using).
 
(2) is still a bit up in the air. With 3.2, there won't be any support
for distributing configurations automatically, but we could add that
so that config files/scripts/packages do get copied around over
Broker. Feedback would be appreciated here: What's better, having
zeekcl manage that, or leave it to standard sysadmin process as well?

> Reading the script linked in [2], I notice that zeekcl would not support
> copying files from one node to another?

Correct right now, (2) may or may not change that.

> zeekctl print

"print" will be supported (roadmap says not in 3.2 yet, but it should
be easy to do, maybe we can get it in still).

> zeekctl exec.

"exec" will likely not be supported. We *could* support it, no
technical reason for not doing that over Broker. It just seems like
another thing that's better handled with different tools.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-18 Thread Robin Sommer
> Suggestion: `zeekcl`, Zeek (Command-Line) CLlient.

"zeekcl" is very close to "zeekctl", which could lead to confusion.
"zcl" maybe?

> Is use of Python still desirable for other reasons?  Otherwise, I lean
> towards `zeekcl` being C++.

No particular preference from my side, I can see either. Effort is
probably about the same in this model, and C++ does have the advantage
of less dependency issues.

> Zeek's scripting language (e.g. `ctl.zeek`), but I don't suggest that

Ack, agree.

> I plan to have `zeekcl` code/tests live inside the main Zeek repo.

Makes sense to me as well.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Next supervisor steps

2020-03-26 Thread Robin Sommer


Zeek 3.1 introduced a first, experimental version of the new
supervisor framework that we expect to eventually replace ZeekControl
as the primary mechanism to run Zeek clusters, both single- and
multi-system. See the earlier blog posting for more context:
https://blog.zeek.org/2019/03/beyond-brocontrol-new-process.html.
The Zeek docs summarize what's now available in 3.1:
https://docs.zeek.org/en/current/frameworks/supervisor.html. (Thanks
to Jon for all that great work!)

I'd like to start planning out the next steps here, and hope that we
can start deprecating (*not* removing) ZeekControl with Zeek 4.0 late
this year. To that end, I drafted a supervisor roadmap for the next
couple of Zeek versions, with a proposal on which parts of
ZeekControl's functionality to port over into the new supervisor
world, and when:


https://docs.google.com/spreadsheets/d/1oNYqHMU1b6FEfST8F86bdpv9Bpl6M58qq4AZULCIYlU/edit?usp=sharing

I'd be interested in thoughts on that, both in terms of timeline and
generally any questions or concerns you might have. Best places for
feedback are this mailing list and the "Supervisor Project" ticket
(https://github.com/zeek/zeek/issues/582).

Robin

(I'll keep this on zeek-dev for now; will post more broadly as well
once the timeline stabilizes).

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Wrapping up 3.1

2020-02-04 Thread Robin Sommer
We're planning to wrap up Zeek 3.1 this week with code freeze on Friday
and then a beta version out early next week. The 3.1.0 GitHub project
shows the current state, please make sure we have tickets on there for
everything that's critical to still go in.

Please all take a look at master's NEWS to see if it mentions
everything we should point out. I'm also planning to take that as the
starting point for a blog posting announcing 3.1.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Do we still need pysubnettree?

2019-10-15 Thread Robin Sommer
On Mon, Oct 14, 2019 at 15:24 +, Vlad Grigorescu wrote:

> Does it still make sense to maintain pysubnettree?

No strong opinion either way from my side.

It looks like pytricia does indeed offer a very similar interface, and
being able to stop maintaining a custom dependency would certainly be
a plus. On the other hand, this might be a case of "if it's not
broken, don't fix it". pysubnettree hasn't required a lot of work
recently, and users would need to install a new dependency if we
switched.

I don't know what constraints LGPL imposes when applied to Python
modules.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Zeek 3.0.0+ "master" versioning process change

2019-07-25 Thread Robin Sommer



On Thu, Jul 25, 2019 at 09:35 -0700, Jonathan Siwek wrote:

> My main reason for preferring alpha/beta is "it's less different than
> before", otherwise don't have much argument against dev/rc.

Let's just do dev/rc then, seems that's what more people prefer. And
then we'll go ahead with your scheme for 3.0.0, that should work well.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Zeek 3.0.0+ "master" versioning process change

2019-07-25 Thread Robin Sommer
Using "3.1.0-X" would also feel semantically a bit confusing I think
as we'd be changing the meaning of a scheme we're already using.

I like the idea of using "dev.X" and "rcX". I was originally feeling
similar about "alpha" but the sorting is a nice property to have.
Switching from "beta" to "rc" would address that.
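
To spell out the sorting point with a tiny Python check: the question is
whether plain alphabetical order of the stage labels matches their
chronological order. The SCHEMES table and the helper name are just for
illustration.

```python
# Release stages in chronological order under each naming scheme.
SCHEMES = {
    "alpha/beta": ["alpha", "beta"],
    "dev/beta": ["dev", "beta"],
    "dev/rc": ["dev", "rc"],
}


def sorts_chronologically(stages):
    """True if alphabetical order equals the given chronological order."""
    return sorted(stages) == stages
```

Both "alpha/beta" and "dev/rc" pass this check, while mixing "dev" with
"beta" does not, because "beta" sorts before "dev".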

In the end, either scheme works for me.

Robin

On Wed, Jul 24, 2019 at 20:36 -0700, Jonathan Siwek wrote:

> On Wed, Jul 24, 2019 at 6:02 PM Johanna Amann  wrote:
> >
> > Actually, thinking about it some more - could we just not have the
> > -alpha (or -dev) label, and go back to how it was before - with a
> > changed meaning?
> >
> > so - just 3.1.0-[commit-number] for the development builds.
> 
> Our versioning script uses the last-reachable tag in "master".  At the
> time we start the 3.1.0 development cycle, we don't have that 3.1.0
> tag, and also that tag won't ever be made along the "master" branch,
> it will be made sometime later within the "release/3.1" branch.
> 
> > > I generally like this - the only thing that I am not sure about is the
> > > alpha label.
> > >
> > > I get that it works great with alphabetic ordering - but for me alpha
> > > tends to signify some kind of test release.
> 
> What's meant by "test release" here ?
> 
> Could essentially consider any given commit in "master" to be a "test
> release" -- and if we decide to be more formal/vocal about providing
> builds of "master" (e.g. the OBS nightlies), then "alpha" may describe
> exactly what you think it signifies ?
> 
> - Jon




-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Compiling with --coverage flag

2019-06-11 Thread Robin Sommer
Hi Jim,

interesting, could you send some numbers on the kind of improvements
you saw, and on what traffic?

Robin

On Mon, Jun 10, 2019 at 11:30 -0700, Jim Mellander wrote:

> Greetings:
> 
> I've been tinkering with the --coverage flag to capture runtime statistics
> which can then be used to compile zeek with branch prediction hints.  My
> preliminary tests indicate a substantial performance increase, enough to
> justify engaging the zeek community.
> 
> I noticed that the configure script includes --enable-coverage, which
> doesn't quite do what I want, as it compiles with debug support, and I'm
> most interested in optimization for production use.
> 
> In brief, I've been testing:
> 
> ./configure --enable-coverage
> 
> for the initial compile, then run against pcaps and live traffic, and use
> that profiling data to recompile:
> 
> CFLAGS='-fprofile-use -fprofile-correction -flto' CXXFLAGS='-fprofile-use
> -fprofile-correction -flto' ./configure
> 
> with a substantial performance boost against a regular compile (can
> additionally do --build-type=Release for compiling with -O3 flag).
> 
> Has anyone else tinkered with this? - I would be happy to elaborate, and
> discuss with others.
> 
> Jim




-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


[Zeek-Dev] Help wanted: Remaining renaming tasks

2019-03-25 Thread Robin Sommer
We have about 10 tasks left for the renaming from Bro to Zeek before
the next release. Any help addressing those is appreciated, see this
board: https://github.com/zeek/zeek/projects/2

We're hoping to get these in place within the next 4 weeks. If you can
work on any of these, please assign the ticket to yourself. It's best
to start with a short proposal on what you plan to do. You can also
use the ticket discussion for any further clarification you might
need.

Thanks!

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Writing a Protocol Analyzer Plugin

2019-03-13 Thread Robin Sommer
See if this helps:
https://github.com/zeek/zeek/blob/master/testing/btest/plugins/protocol.bro

That may be the most compact tutorial on writing a protocol analyzer
plugin. :)

Robin

On Wed, Mar 13, 2019 at 09:16 -0600, anthony kasza wrote:

> Hello Zeek Devs,
> 
> I would like to write a protocol analyzer and need some direction. I would
> like to write something simple which works on TCP, similar to the ConnSize
> analyzer. I would like my analyzer to be distributed as a plugin, similar
> to MITRE's HTTP2 analyzer, so I am following the docs here:
> https://docs.zeek.org/en/stable/devel/plugins.html
> 
> However, the docs don't detail much beyond creating a built in function. A
> colleague pointed me at this quickstart script for binpac:
> https://github.com/grigorescu/binpac_quickstart
> 
> The quickstart script seems to be intended for writing a protocol analyzer
> which gets merged into the Zeek source. This is not how plugins operate.
> 
> I'm looking for some guidance on how to proceed. Thanks in advance.
> 
> -AK




-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Hi + LL Analyzer

2019-02-28 Thread Robin Sommer
On Thu, Feb 28, 2019 at 11:35 +0100, Jan Grashöfer wrote:

> The question here would be whether LL-analyzers have to be linked
> dynamically.

Well, the point of the plugin API is being able to add new
functionality externally through an independently compiled shared
library. Excluding link-layer analyzers from that would feel like a
gap to me. That said, we definitely need to benchmark performance to
make sure it's feasible. My hunch is that a lookup table should be
good enough, but we'll see.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Hi + LL Analyzer

2019-02-27 Thread Robin Sommer



On Wed, Feb 27, 2019 at 16:07 +0100, Jan Grashöfer wrote:

> At first glance it looks like IP-layer multiplexing is done in
> NetSessions::{NextPacket, DoNextPacket} and the Transport-layer is tackled
> in Manager::BuildInitialAnalyzerTree in context of initializing a
> connection.

Well, there, too. :) That's indeed doing the packet dispatching, while
DoNextPacket() sets up state mgmt. It's all not quite clear cut, which
is part of the problem.

> That is the central point. So a first step would be to rely on TCP/IP in the
> "middle" of the stack but allow pluggable Link-layer protocols. Those might
> feed their data to the TCP/IP pipeline or handle them on their own. The next
> step would be the IP-layer.

Yeah, that sounds good to me.

> One question here would be whether it makes sense to assume that the set of
> LL-analyzers that should be available is known at compile-time?

The built-in ones can be known, but any added through dynamic plugins
can't really. We'll know only at runtime what the final set is. But we
could precompute a lookup table in advance at startup that maps link
types to analyzers.
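
A minimal sketch of that lookup-table idea, written in Python rather than Zeek's C++. All names and link-type numbers here are made up for illustration; this is not Zeek's plugin API. Built-in analyzers are known up front, plugin analyzers are only known once everything is loaded, and the table is merged once at startup so per-packet dispatch is a single lookup:

```python
# Hypothetical sketch: none of these names exist in Zeek.

def ethernet(frame):
    return "ethernet"

def plugin_foo(frame):
    # Stands in for an analyzer provided by a dynamically loaded plugin.
    return "plugin"

# Known at compile time (the number 1 is illustrative).
BUILTIN = {1: ethernet}

def build_dispatch_table(builtin, plugin_registrations):
    """Merge built-in and plugin analyzers once, at startup."""
    table = dict(builtin)
    for link_type, analyzer in plugin_registrations:
        table[link_type] = analyzer  # assumption: a plugin may override a built-in
    return table

def dispatch(table, link_type, frame):
    """Per-packet path: one dictionary lookup, no plugin iteration."""
    analyzer = table.get(link_type)
    if analyzer is None:
        return "unknown link type"
    return analyzer(frame)

# Only known at runtime, after all plugins have been loaded.
table = build_dispatch_table(BUILTIN, [(147, plugin_foo)])
print(dispatch(table, 1, b""))
print(dispatch(table, 147, b""))
```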

> I think this would be part of the larger effort to re-think Zeek's notion of
> connections. This could be addressed together with implementing a flexible
> mechanism to make meta data like LL-addresses available in context of a
> connection.

Yep.

> In case we allow to plug in new transport protocols, they might need
> their own PIA to support the analysis of known protocols like HTTP
> etc.

Yeah, or a more generic PIA that provides its own hook for plugins.
The main difference between TCP/UDP PIAs is packet vs stream
semantics, iirc. That might generalize sufficiently, but not sure.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-19 Thread Robin Sommer



On Mon, Feb 18, 2019 at 16:32 -0800, Vern Paxson wrote:

>   event foo%(a: string%);
>   event foo%(a: string, b: string%);

I like this. It addresses my main concern of making it clear what's
happening. If I see an implementation of the event and look up the
prototype, I'll find both and can understand what's going on.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-05 Thread Robin Sommer



On Fri, Feb 01, 2019 at 11:17 -0600, Jonathan Siwek wrote:

> > > > global my_event: event(a: count, b: string);
> > > > event my_event(b: string)
> > > > { print "my_event", b; }

> Did it look like an error in the sense of the user making a mistake or
> in the sense of traditional way functions in other languages like
> C/C++ require matching signatures?

It's probably the latter, but I'm not sure that helps: when I see the
above code, I wonder: "Heck, why is that working ... Or, wait, maybe
it isn't?". It makes it confusing to me to read the code.

If at least the prototype told me somehow "it's ok if parameters are
left off", then I'd have a clue that everything is fine.

The following would be even worse in terms of confusion:

global my_event: event(a: string, b: string);
event my_event(b: string)

Now I need to know if the language goes by order of parameters or by
parameter name.
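
To make the ambiguity concrete, here is a small Python sketch (not Zeek code; the helper names are hypothetical) of the two binding rules applied to the prototype and handler above. With both parameters being strings, the type checker cannot tell the two rules apart, but the handler sees different values:

```python
PROTOTYPE = ["a", "b"]    # global my_event: event(a: string, b: string);
HANDLER_PARAMS = ["b"]    # event my_event(b: string)

def bind_by_position(handler_params, args):
    # The handler's i-th parameter receives the i-th argument.
    return {name: args[i] for i, name in enumerate(handler_params)}

def bind_by_name(prototype, handler_params, args):
    # Each handler parameter receives the argument of the same name.
    by_name = dict(zip(prototype, args))
    return {name: by_name[name] for name in handler_params}

args = ["first", "second"]
print(bind_by_position(HANDLER_PARAMS, args))         # {'b': 'first'}
print(bind_by_name(PROTOTYPE, HANDLER_PARAMS, args))  # {'b': 'second'}
```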

I do see the appeal of making things just work when event handlers
change, but is there really no different way to support that?

(I don't have an opinion on the event overloading discussion yet, need
to digest that more)

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-01 Thread Robin Sommer



On Thu, Jan 31, 2019 at 16:29 -0800, Vern Paxson wrote:

> > global my_event: event(a: count, b: string);
> > event my_event(b: string)
> > { print "my_event", b; }

> Is there a compelling use-case that's motivating this change?

I'm sure the main use case is changing an existing event's parameters
without breaking existing scripts -- something we've been increasingly
running into as a major challenge.

It's a nice idea to relax parameter passing to work by name, and
allow subsets. However, I can't quite get myself to really like it in
this form, because it *looks* like an error to not have matching
argument lists. Is there some syntax that would make it more clear 
what's going on?

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] docs.zeek.org

2019-01-11 Thread Robin Sommer



On Fri, Jan 11, 2019 at 10:20 -0600, Jonathan Siwek wrote:

> Yes, those should already be set up, but let me know if I missed
> anything (note that the "release" version of the manual still lives on
> zeek.org as there's not yet any release that can be built on RTD).

Would it be worth aiming to do that update with the next 2.6.x patch
release? Would be nice to get the modern look for the release version,
too.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Building bro 2.6 with static broker/caf libraries

2018-12-06 Thread Robin Sommer



On Wed, Dec 05, 2018 at 19:03 -0800, Craig Leres wrote:

> (I'm working on updating the FreeBSD port to 2.6 and can't install 
> things like libcaf_io.so in /usr/local/lib because they conflict with 
> libraries potentially installed by the devel/caf port.)

What's the version of the CAF port? If it's recent, Bro should be able
to link against that.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Consistent error handling for scripting mistakes

2018-11-12 Thread Robin Sommer



On Mon, Nov 12, 2018 at 12:27 -0600, Jonathan Siwek wrote:

> I recently noticed there's a range of behaviors in how various
> scripting mistakes are treated.

There's a 4th: InterpreterException.

> 1st question: should these be made more consistent? I'd say yes.

Yes, definitely.

> that it's only the *current function body* (yes, *function*, not
> event) that exits early -- hard to reason about what sort of arbitrary
> code was depending on that function to be fully evaluated and what
> other sort of inconsistent state is caused by exiting early.

... and what happens if the function is supposed to return a value,
but now doesn't?

> I propose, for 2.7, to aim for consistent error handling for scripting
> mistakes and that the expected behavior is to unwind all the way to
> exiting the current event handler (all its function bodies).

Agree with that. Can we do that cleanly though? The problem with
InterpreterException is that it may leak memory. We'd need to do the
unwinding manually throughout the interpreter code, but that means
touching a number of places to pass the error information back.
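
For illustration only, the proposed behavior (unwind all the way out of the current event handler) looks like this in a Python sketch. The names are hypothetical, and whether the remaining handlers of the same event keep running is an assumption of the sketch, not something settled in this thread. Python's exceptions clean up after themselves; the leak concern above is specific to unwinding through the C++ interpreter:

```python
class ScriptError(Exception):
    """Stands in for a scripting mistake, e.g. indexing a missing table entry."""

def lookup(table, key):
    if key not in table:
        raise ScriptError(f"no such index: {key}")
    return table[key]

def handler_one(log):
    log.append("one: start")
    value = lookup({}, "missing")     # raises; nothing below runs
    log.append(f"one: got {value}")

def handler_two(log):
    log.append("two: ran")

def dispatch_event(handlers, log):
    # The error is caught at the event-handler boundary, not inside the
    # function that triggered it, so no caller continues with a missing
    # return value.
    for handler in handlers:
        try:
            handler(log)
        except ScriptError as err:
            log.append(f"error: {err}")

log = []
dispatch_event([handler_one, handler_two], log)
print(log)
```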

> One exception may be within bro_init(), if an error happens at that
> time, I'd say it's fine to completely abort -- it's unlikely or hard
> to say whether Bro would operate well if it proceeded after an error
> that early in initialization.

Yeah, that makes sense.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] attributes & named types

2018-11-05 Thread Robin Sommer



On Sat, Nov 03, 2018 at 21:58 +, Vlad Grigorescu wrote:

> In my mind, if the keyword is applied to a record, I would expect any new
> fields added to that record to also be logged.

I believe the reason for not doing that is that then one couldn't add
a field that's *not* being logged (because currently we don't have
remove-an-attribute support).

I like the "=T|F" syntax to control this more directly, as long as
"" remains being equivalent to "=T".

Generally we need to be very careful if we want to change any
current semantics here, as it will impact custom log files that people
create in their own scripts.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Config Framework Feedback

2018-11-01 Thread Robin Sommer
The observations / thoughts in this thread seem worth a ticket I'd
say. We can refine this over time if the current semantics aren't
quite ideal yet.

Robin

On Tue, Oct 30, 2018 at 13:17 -0700, Christian Kreibich wrote:

> Hi folks,
> 
> I would agree that it takes a bit of experimentation to figure out 
> exactly when a change handler fires and how to reliably initialize or 
> update things based on an option's value.
> 
> Consider this:
> 
>     module Foo;
> 
>     export { option foo = F; }
> 
>     function foo_handler(ID: string, foo_new: bool): bool
>     {
>         print fmt("New foo: %s", foo_new);
> 
>         # Update stuff here based on foo's value
>         # ...
> 
>         return foo_new;
>     }
> 
>     event bro_init() {
>         Option::set_change_handler("Foo::foo", foo_handler);
>     }
> 
> ... foo_handler doesn't get called when you simply run the script 
> without redefing Config::config_files. When you do redef it, the handler 
> fires both when the config file sets foo to T, and when it sets it to F.
> 
> So you have to make sure that your initialization happens even when the 
> handler doesn't get called, and you cannot write your handler assuming 
> that the new value is actually different from the old one.
> 
> These arguably aren't bugs, but imo they do take getting used to.
> 
> Best,
> -C.


-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] consistency checking for attributes

2018-10-31 Thread Robin Sommer
On Mon, Oct 29, 2018 at 11:49 -0700, Vern Paxson wrote:

> I'm planning to add basic consistency checking, which will look for
> (1) attributes that are repeated (which doesn't appear to be meaningful for
> any of them) and (2) attributes that don't make sense in a given context,
> like the ones listed above.

Sounds good to me.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] JIRA to GitHub ticket migration plan

2018-09-14 Thread Robin Sommer



On Fri, Sep 14, 2018 at 13:45 -0500, Jonathan Siwek wrote:

> Anything else to worry about?

Are Jenkins and Coverity already pulling from GitHub?

I don't know if there's anything we can do on the old server to make
existing clones deal with the relocation more gracefully. I don't
think there's a way to just redirect a git client, but maybe we could get
some error message into the output or something? Not sure.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] JIRA to GitHub ticket migration plan

2018-09-14 Thread Robin Sommer



On Fri, Sep 14, 2018 at 11:36 -0500, Jonathan Siwek wrote:

> I did some label tweaking and reduced some prefix names: "Component"
> -> "Area" and "Difficulty" -> "Pain".

Ok, thanks, makes sense.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] JIRA to GitHub ticket migration plan

2018-09-14 Thread Robin Sommer
On Thu, Sep 13, 2018 at 19:39 -0500, Jon Siwek wrote:

> A preview of what migrated issues will look like along with new labeling 
> scheme:

Looks great, nice job. The only thing I noticed is that the labels are
quite long, making the list of tickets appear somewhat crowded. Could
we skip the prefixes ("Type:", "Component:") and instead use colors to
encode them? So, say, all types would be green, all components yellow
(which they already are), etc.

> Remaining tasks:

We are leaving switching to github as authoritative source for the
repositories to later, right? Doing it all at the same time could
avoid confusion ("everything is on github now" is an easier
statement), but would also make the process more complex. Maybe the
real question here is if we want to switch repositories before or
after 2.6?

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker data layouts

2018-08-28 Thread Robin Sommer


On Tue, Aug 28, 2018 at 17:12 +0200, Dominik Charousset wrote:

> 1) Matthias threw in memory-mapping, but I’m not so sure if this is
> actually feasible for you.

Yeah, our normal use case is different, memory-mapping won't help much
with that.

> 2) CAF already does batching. Ideally, Broker should not need to do
> any additional batching on top of that.

Yep, but (3) was the problem with that:

> Do you still remember what showed up during your investigation that
> triggered you to go with the blob?

Looking back through emails, at some point Jon replaced CAF
serialization with these blobs and got substantially better
performance. He also had a patch that reproduced the effect with the
benchmark tool you wrote. I'm pasting that in below, I'm assuming it
still applies. Looks like the conclusion at that time was that it is
indeed an issue with the serialization and/or copying the data.

> An in-depth performance analysis of Broker’s streaming layer is on my
> todo list for months at this point. I hope I get something done before
> the Bro Workshop in Europe.

That would be great. :)

Robin

```
diff --git a/tests/benchmark/broker-stream-benchmark.cc
b/tests/benchmark/broker-stream-benchmark.cc
index 821ac39..26b0778 100644
--- a/tests/benchmark/broker-stream-benchmark.cc
+++ b/tests/benchmark/broker-stream-benchmark.cc
@@ -1,6 +1,7 @@
 #include 

 #include 
+#include 

 using std::cout;
 using std::cerr;
@@ -55,8 +56,11 @@ void publish_mode(broker::endpoint& ep, const std::string&
topic_str) {
   // nop
 },
 [=](caf::unit_t&, downstream>& out, size_t num) {
-  for (size_t i = 0; i < num; ++i)
-out.push(std::make_pair(topic_str, "Lorem ipsum dolor sit amet."));
+  for (size_t i = 0; i < num; ++i) {
+auto ev = broker::bro::Event(std::string("event_1"),
+ std::vector{42, "test"});
+out.push(std::make_pair(topic_str, std::move(ev)));
+  }
   global_count += num;
 },
 [=](const caf::unit_t&) {
```

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker data layouts

2018-08-27 Thread Robin Sommer



On Sat, Aug 25, 2018 at 17:42 +0200, Matthias Vallentin wrote:

> Okay. In the future, we probably need some form of
> "serialization-free" batching mechanism to ship data more efficiently.

Do you guys have a sense of how load splits up between serialization
and batching/communication? My hope has been that batching itself can
take care of the performance issues, so that we'll be able to send
logs as standard CAF messages, each one representing a batch of N log
lines. The benchmark I had created a little while ago to examine that
wasn't able to get the necessary performance out of Broker/CAF to do
that (hence the fall-back to Bro's old serialization of log messages
for now, sent over CAF). But iirc, the conclusion was that there's
still room for improvement in CAF that should make this feasible
eventually. However, if you guys believe it's really CAF's
serialization that's the bottle-neck, then we'll need to come up with
something else indeed.
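
A rough Python sketch of the batching idea, with one message per N log lines. The class and its `send` callback are hypothetical stand-ins, not Broker or CAF API. It only shows the mechanism and says nothing about where the actual cost sits (serialization vs. communication):

```python
class LogBatcher:
    """Collects log lines and hands them to `send` in batches of `batch_size`."""

    def __init__(self, send, batch_size):
        self.send = send
        self.batch_size = batch_size
        self.pending = []

    def write(self, line):
        self.pending.append(line)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Also called on shutdown or on a timer so a partial batch is not lost.
        if self.pending:
            self.send(list(self.pending))
            self.pending.clear()

sent = []                          # each element represents one message on the wire
batcher = LogBatcher(sent.append, batch_size=3)
for i in range(7):
    batcher.write(f"line {i}")
batcher.flush()
print(len(sent))                   # 3 messages for 7 log lines
```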

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker data layouts

2018-08-24 Thread Robin Sommer



On Fri, Aug 24, 2018 at 16:32 +0200, Matthias Vallentin wrote:

> It sounds like this is critical also for regular operation:

Agree. Right now a newly connecting peer gets a round of explicit
LogCreates, but that's probably not the best way forward for larger
topologies.

> is it currently impossible to parse Bro logs with Broker, because all
> logs come in the LogWrite message, which is a binary blob?

Correct. (This was different at first, but the switch was necessary
for performance. It's waiting for a better solution at this point.)

> In other words, can Broker currently be used if one writes a Bro
> script that publishes plain events (message type 1 in bro.hh)?

Yes to that. Non-Bros can exchange events (assuming they know the
schema), but not logs.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker data layouts

2018-08-23 Thread Robin Sommer



On Thu, Aug 23, 2018 at 10:01 -0500, Jonathan Siwek wrote:

> Yeah, that's one problem, but a bigger issue is you can't parse
> LogWrite because the content is a serial blob whose format is another
> thing not intended for public consumption.

I guess my earlier comment might have been misleading: there's
certainly work that needs to be done to open this up. Right now, it's
probably not even realistic at all because we still have a workaround
in place in there that uses the old (non-Broker) serialization code
for creating that blob. That was to get around a performance issue,
and still needs to be addressed. As part of upgrading that, I think it
can make sense to think about documenting the format we end up
choosing.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker data layouts

2018-08-23 Thread Robin Sommer



On Thu, Aug 23, 2018 at 15:32 +0200, Dominik Charousset wrote:

> Does that mean I need to receive the LogCreate event first to
> understand successive LogWrite events?

I don't really see a way around that without substantially increasing
volume. We could send LogCreate updates regularly, so that it's easier
to synchronize with an ongoing stream.
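
Seen from a consumer, the dependency looks roughly like the Python sketch below. The message shapes are invented for the example and do not reflect Broker's actual wire format, where the LogWrite payload is currently an opaque blob. A consumer that joins mid-stream cannot interpret writes until it has seen the matching create, which is what periodic re-announcements would shorten:

```python
class LogConsumer:
    """Hypothetical consumer: needs a stream's schema before it can read rows."""

    def __init__(self):
        self.schemas = {}   # stream name -> list of column names
        self.rows = []
        self.dropped = 0

    def on_create(self, stream, fields):
        # Receiving the same announcement again is harmless.
        self.schemas[stream] = fields

    def on_write(self, stream, values):
        fields = self.schemas.get(stream)
        if fields is None:
            self.dropped += 1   # joined mid-stream, schema not seen yet
            return
        self.rows.append(dict(zip(fields, values)))

consumer = LogConsumer()
consumer.on_write("conn", ["10.0.0.1", 80])       # arrives before any create
consumer.on_create("conn", ["orig_h", "resp_p"])  # periodic re-announcement
consumer.on_write("conn", ["10.0.0.2", 443])
print(consumer.dropped, consumer.rows)
```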

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker data layouts

2018-08-22 Thread Robin Sommer



On Tue, Aug 21, 2018 at 14:05 -0500, Jonathan Siwek wrote:

> Though the Broker data corresponding to log entry content is also
> opaque at the moment (I recall that was maybe for performance or
> message volume optimization),

Yeah, but generally this is something I could see opening up. The log
structure is pretty straight-forward and self-describing, it'd be
mostly a matter of clean up and documentation to make that directly
accessible to external consumers I think. Events, on the other hand,
are semantically tied very closely to the scripts generating them, and
also much more diverse so that self-description doesn't really seem
feasible/useful. Republishing a relevant subset certainly sounds
better for that; or, if it's really a bulk feed that's desired, some
out-of-band mechanism to convey the schema information somehow.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker data layouts

2018-08-21 Thread Robin Sommer



On Tue, Aug 21, 2018 at 12:34 -0500, Jonathan Siwek wrote:

> Maybe there's a more standardized approach that could be worked
> towards, but likely we just need more experience in understanding and
> defining common use-cases for external Bro data consumption.

Dominik, wasn't the original idea for VAST to provide an event
description language that would create the link between the values
coming over the wire and their interpretation? Such a specification
could be auto-generated from Bro's knowledge about the events it
generates.

Also, this question is about events, not logs, right? Logs have a
different wire format and they actually come with meta data describing
their columns.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Bro-Dev] [Administrativa] Mailing list archives

2018-08-15 Thread Robin Sommer
Quick reminder: Please keep in mind that mails to the Bro mailing
lists are archived publicly. We had a couple of cases recently where
internal information went to a list, ending up in the archive, where
it's difficult to remove from. Thanks,

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-14 Thread Robin Sommer



On Tue, Aug 14, 2018 at 10:51 -0500, Jonathan Siwek wrote:

> Not sure, is Broker::auto_publish() currently able to do the same thing?

Hmm .. Good point. Scope is different between the two (event vs topic)
but the effect is similar in the end.

> I can also see the opposite being intuitive: If I told
> Broker::subscribe() to raise locally, then I can just always use
> Broker::publish() and not think about the difference between using
> "event" versus "publish".  Would Broker::auto_publish() be removable
> then?

I would like to say "yes" (because I like the subscribe() approach
better than auto_publish() :-), but would that work well with our
cluster topics? If we didn't have the event-specific auto_publish(),
we would have to turn on local raise for *all* events going to, e.g.,
bro/cluster/worker. And thinking about it, maybe that's in fact also
an argument against my original thinking how this could help unify
scripts --- well, unless we'd go with Jan's thought of subtopics
(e.g., subscribe("bro/cluster/worker/intel", local_raise=T)).

Robin


-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-14 Thread Robin Sommer



On Mon, Aug 13, 2018 at 13:55 -0500, Jonathan Siwek wrote:

> associating node IDs with subscription state and also message state
> (push node IDs into messages upon receipt before forwarding),

Yeah, that sounds like the right direction. Some reading might be
worthwhile doing here, there are quite a few papers out there on
routing in overlay networks.

> (1) Remove relay(...) functions
> (2) Reduce unique topic names (use pre-existing cluster topics where possible)
> (3) Add Broker::forward(topic_prefix) function + enable Broker forwarding

Yes, that sounds good to me, plus whatever that means for "publish()"
itself. I like what we have arrived at here.

One more question: what about raising published events locally as well
if the sending node is subscribed to the topic? I'm kind of torn on
that. I don't think we want that as a default, but perhaps as an
option, either with the publish() call or, likely better, with the
subscribe() call? I can see that being helpful in cases like unifying
standalone vs cluster operation; and more generally, for running
multiple node types inside the same Bro instance.

> An alternative to (3) would be implementing "real" routing in Broker
> right from the start.

In an ideal world, yes, that would certainly be nice to have. But it's
a larger task that I don't think we would be able to finish for 2.6
anymore. So, I'd put that on the list for later.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Broker::publish API

2018-08-10 Thread Robin Sommer



On Fri, Aug 10, 2018 at 10:24 -0500, Jonathan Siwek wrote:

> Or is it a matter of "if a user needed it for something, then it's
> available" ?

Yeah, including matching expectations: if there's a
"bro/cluster/worker" topic, I'd expect I can publish there to reach
all the workers (from anywhere). However, I think I'm with you now
that maybe we just shouldn't do/support any forwarding in the
cluster right now. Pools and manual relaying are a (currently better)
alternative, and we can change things later. And at least it's a clear
message: no forwarding across cluster nodes.

> However, I can see Broker::forward() could make it a bit easier for a
> user wanting to manually set up a forwarding route between clusters or
> other external applications.  Is that a clear use-case we need to
> cater to now?

Well, if it were easy to add the forward() function, that could indeed
be quite useful for external integrations still. With that, one could
selectively forward custom topics (at one's own risk), without causing
a mess for the cluster. I'm thinking osquery integration for example,
where messages might go through an intermediary Bro. One advantage
that Broker-internal forwarding has compared to manual relaying is
that messages won't be propagated back to the sender.
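
The difference can be sketched in Python as follows. The function names and the topology are hypothetical, and this is not how Broker implements forwarding. An intermediary that forwards internally knows which peer a message came from and can skip it, whereas a script that re-publishes the event addresses every subscribed peer again:

```python
def subscribed(subscriptions, node, topic):
    return any(topic.startswith(prefix) for prefix in subscriptions.get(node, []))

def forward_targets(peers, subscriptions, topic, came_from):
    """Internal forwarding: pass the message on, but never back to the sender."""
    return [p for p in peers
            if p != came_from and subscribed(subscriptions, p, topic)]

def relay_targets(peers, subscriptions, topic):
    """Manual relaying: a re-published event reaches every subscribed peer."""
    return [p for p in peers if subscribed(subscriptions, p, topic)]

# Peers of the intermediary node and what each one subscribes to.
peers = ["osquery-host", "manager"]
subscriptions = {"osquery-host": ["osquery/"], "manager": ["osquery/"]}

print(forward_targets(peers, subscriptions, "osquery/hosts", "osquery-host"))
print(relay_targets(peers, subscriptions, "osquery/hosts"))
```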

But it's a matter of effort at this point I'd say.

> RR via proxy is not just load-balancing either, but fault-tolerance as
> well.

Yeah, that's right.

> But here you're talking more about removing the relay() functions and
> doing the RR-via-proxy "manually", right?  That seems ok to me -- once
> "real" routing is available, you then have the option to simplify your
> script and get a minor optimization by not having to manually
> handle+forward the event on proxies.

Ok, let's make that change then, I think removing relay() will help
for sure making the API easier.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-10 Thread Robin Sommer



On Fri, Aug 10, 2018 at 15:22 +0200, Jan Grashöfer wrote:

> different purposes. If that is still a design goal, it feels like the
> structure of a cluster could be more volatile than it used to be.

It is, and we have some of that, and I think it fits in with the
discussion here too. In my mind, I see two separate things in this
discussion: one is a general Broker API that facilitates some very
different applications; and the 2nd is our cluster framework that uses
that API for a specific use-case. The latter is much easier to tune
for us in terms of how it uses Broker, as we can hide much of it
internally and adjust later, i.e., by adding a new node type. The
question for the cluster framework, then, is what API *it* provides
for scripts to share state in a cluster. And a part of the answer to
that could be "standardized topics" that are guaranteed to get the
information to where it needs to go.

> Maybe a silly question: Would that work using further "specialized" topics
> like bro/cluster/worker/intel? From my understanding one feature of topics
> is that one would be able to subscribe only to the things that one is
> interested in. Having a bunch of events just published to bro/cluster/worker
> seems counterproductive.

I hear you, but I think I haven't quite understood the concern yet.
Can you give me an example where the difference matters? What's
different between publishing intel events to bro/cluster/worker/intel
vs bro/cluster/worker if both go to all workers? Or is it so that some
workers can decide not to receive the intel events?

(And technically, subscriptions are prefix-based, so anybody
subscribing to bro/cluster/worker automatically gets
bro/cluster/worker/intel as well; not sure if that helps or hurts
here?)
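
A small Python sketch of what prefix-based matching implies for the two topics, assuming a plain string-prefix comparison. The node names are made up, and this is not Broker code:

```python
def matching_subscribers(subscriptions, topic):
    """Return the nodes holding at least one subscription that prefixes `topic`."""
    return sorted(node for node, prefixes in subscriptions.items()
                  if any(topic.startswith(prefix) for prefix in prefixes))

subscriptions = {
    "worker-1": ["bro/cluster/worker"],        # gets everything below, too
    "worker-2": ["bro/cluster/worker/intel"],  # only the intel subtopic
    "logger":   ["bro/cluster/logger"],
}

print(matching_subscribers(subscriptions, "bro/cluster/worker/intel"))
print(matching_subscribers(subscriptions, "bro/cluster/worker"))
```

So a subtopic lets a node opt in to less, but a node subscribed to the parent topic cannot opt out of the subtopic.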

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-09 Thread Robin Sommer
Yeah, and let me add one thing: What if as a starting point for
modeling things, we assumed that we have global topic-based routing
available. Meaning if node A publishes to topic X, the message will
show up at all nodes that are subscribed to topic X anywhere, no
matter what the topology --- Broker will somehow take care of that. I
believe that's where we want to get eventually, through whatever
mechanism; it's not trivial, but also not rocket science.

Then we (A) design the API from that perspective and adapt our
standard scripts accordingly, and (B) see how we can get an
approximation of that assumption for today's Broker and our simple
clusters, by having the cluster framework hardcode what we need.

> (1) enable the "explicit/manual" forwarding by default?

Coming from that assumption above, I'd say yes here, doing it like you
suggest: differentiate between forwarding and locally raising an event
by topic. Maybe instead of adding it to Broker::subscribe() as a
boolean, we add a separate "Broker::forward(topic_prefix)" function,
and use that to essentially hardcode forwarding on each node just like
we want/need for the cluster. Behind the scenes Broker could still
just store the information as a boolean, but API-wise it means we can
later (once we have real routing) just rip out the forward() calls and
let Magic take its role. :)
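
The split between raising and forwarding could be modeled roughly like this in Python. It's only a sketch: the `Node` class and its `subscribe`/`forward` methods are hypothetical stand-ins for Broker::subscribe() and the proposed Broker::forward(topic_prefix), and neither loops nor hop limits are modeled.

```python
class Node:
    """Toy model of a cluster node (hypothetical, not Broker's API)."""

    def __init__(self, name):
        self.name = name
        self.peers = []          # directly connected nodes
        self.subscribed = set()  # prefixes whose events are raised locally
        self.forwarding = set()  # prefixes that are passed on to other peers
        self.raised = []         # events raised on this node

    def subscribe(self, prefix):
        self.subscribed.add(prefix)

    def forward(self, prefix):
        self.forwarding.add(prefix)

    def publish(self, topic, event):
        for peer in self.peers:
            peer.receive(topic, event, sender=self)

    def receive(self, topic, event, sender):
        # Raising and forwarding are decided independently of each other.
        if any(topic.startswith(p) for p in self.subscribed):
            self.raised.append(event)
        if any(topic.startswith(p) for p in self.forwarding):
            for peer in self.peers:
                if peer is not sender:
                    peer.receive(topic, event, sender=self)


def connect(a, b):
    a.peers.append(b)
    b.peers.append(a)


manager, w1, w2 = Node("manager"), Node("worker-1"), Node("worker-2")
connect(manager, w1)
connect(manager, w2)
w1.subscribe("bro/cluster/worker")
w2.subscribe("bro/cluster/worker")
manager.forward("bro/cluster/worker")  # forward only, never raise locally

w1.publish("bro/cluster/worker", "intel_update")
```

Here worker-2 raises the event while the manager only passes it along. If real routing arrives later, dropping the `forward()` line is the only change the setup would need.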

As you say, we don't get load-balancing that way (today), but we still
have pools for distributing analyses (like the known-* scripts do).
And if distributing message load (like the Intel scripts do) is
necessary, I think pools can solve that as well: we could use a RR
proxy pool and funnel it through script-land there: send to one proxy
and have an event handler there that triggers a new event to publish
it back out to the workers. For proxies, that kind of additional load
should be fine (if load-balancing is even necessary at all; just going
through a single forwarding node might just as well be fine).
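
As a rough illustration of funneling through a RR proxy pool, here is a small Python sketch. The class `RoundRobinPool`, the function `relay_via_pool`, and all node names are hypothetical; it only counts messages to show how the fan-out load moves from the originating node to the proxies.

```python
from collections import Counter
from itertools import cycle


class RoundRobinPool:
    """Hands out pool members in turn (hypothetical helper, not a Bro API)."""

    def __init__(self, members):
        self._members = cycle(list(members))

    def pick(self):
        return next(self._members)


def relay_via_pool(pool, workers, sent):
    """Send one event to a single proxy, which republishes it to all workers.

    `sent` counts outgoing messages per node so the load split is visible.
    Returns the proxy that handled the event.
    """
    proxy = pool.pick()
    sent["origin"] += 1          # the originating node sends exactly one message
    sent[proxy] += len(workers)  # the proxy's event handler fans it out
    return proxy


workers = ["worker-1", "worker-2", "worker-3"]
pool = RoundRobinPool(["proxy-1", "proxy-2"])
sent = Counter()
handled_by = [relay_via_pool(pool, workers, sent) for _ in range(4)]
```

After four events the origin has sent four messages in total, and the twelve worker-bound messages are split evenly between the two proxies.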

> (2) re-implement any existing subscription cycles?

Now, here I'm starting to change my mind a bit. Maybe in the end, in
large topologies, it would be futile to insist on not having cycles
after all. The assumption above doesn't care about it, putting Broker
in charge of figuring it out. So with that, if we can set up
forwarding through (1) in a way that cycles in subscriptions don't
matter, it may be fine to just leave them in. But I guess in the end
it doesn't matter, removing them can only make things better/easier.

> Also maybe begs the question for later regarding the "real" routing
> mechanism: I suppose that would need to be smart enough to do
> automatic load-balancing in the case of there being more than one
> route to a subscriber.

Yeah, I'm becoming more and more convinced that in the end we won't
get around adding a "real" routing layer that takes care of such things
under the hood.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-08 Thread Robin Sommer



On Wed, Aug 08, 2018 at 12:36 -0500, Jonathan Siwek wrote:

> * publish() API simplifications/compressions (pending decision on
> exactly what those should be)

Yeah, with an eye on the semantics for forwarding (now and later),
and whether to raise published events locally as well if the host is
subscribed itself.

And maybe the 2nd eye on: can we define these semantics so that we can
get rid of some of the "what node type am I?" checks? I'm not sure what
that would look like, but generally it would be nice if one could just
publish stuff liberally without worrying too much and the
subscriptions and forwarding semantics do the right thing (not always,
but often).

> * enable message forwarding by default (meaning re-implement the one
> or two subscription patterns that might create a cycle)

Haven't quite made up my mind on this one. In principle yes, but
right now a host needs to be subscribed to a topic to forward it if I
remember that right. That may limit how we use topics, not sure (e.g.,
if a worker wanted to talk to other workers, with "real"
forwarding/routing they'd just publish to the worker topic and that
message would get routed there, but not be processed at the
intermediary hops as well. With our current forwarding, the hops would
need to subscribe to the worker topic as well and hence the event would
get raised there, too.)

> * see if any script-specific topics can instead use a pre-existing
> "cluster" topic

Yep.

> difficult due to having to hunt down things in various scripts and
> whether a more centralized config could be something to do?

Yeah, that sounds useful for the cluster case: it could be part of the
cluster framework to define all the relevant node types with their
characteristics. That would also centralize how topics and connections
are set up, making later changes easier.

For other use cases, it should still be possible to configure things
independently, too, though (say, for talking to external Broker
applications).

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-08 Thread Robin Sommer



On Wed, Aug 08, 2018 at 14:20 +, Justin Azoff wrote:

> There's also a bunch of places that I think were written standalone first and 
> then updated to work on a cluster in
> place resulting in some awkwardness..

Yeah, indeed, that's another source of complexity with these
scripts.

> But if this was written in a more 'cluster by default' way, it would just 
> look like:

Nice example. That's the kind of thing I hope we can do during the
next cycle: streamline the scripts to unify these kinds of logic.

> Broker::publish could possibly be optimized for standalone to raise the event
> directly if not being run in a cluster.

Or we generally raise published events locally as well if the node is
subscribed to the destination topic. There are pros and cons for that
I think.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-08 Thread Robin Sommer
Yeah, I realize that. A direct port of the old logic was of course the
goal so far, with the drawbacks of that approach accepted &
understood. That's what's in place now; that's great and exactly as
planned. We can get 2.6 out this way, and it'll be fine.

My point is that now also seems like a good time to take stock of what
we got that way. That direct porting is finally getting us some sense
of where things aren't an ideal match between API and use cases yet.
And if there's something easy we can do about that before people start
relying on the new API, it seems that would be beneficial to do. But
we can see.

Robin

On Tue, Aug 07, 2018 at 13:39 -0500, Jonathan Siwek wrote:

> How much is due to new API usage and how much is due to things mainly
> being a direct port of old communication patterns (which I guess are
> written by various people over extended lengths of time and so there's
> inconsistencies to be expected) ?  Or due to being a mishmash of both
> old and new?


-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-08 Thread Robin Sommer


On Tue, Aug 07, 2018 at 12:05 +0200, Jan Grashöfer wrote:

> What I can recall, it's about simplifying the API in the light of
> multi-hop routing, which is not fully functional yet.

To level up a bit, what I'm hoping for is that we can find some easy
ways to simplify the API a bit more now, with an eye towards dynamic
multi-hop coming later. I don't know if it'll work out before 2.6
still, but changing the API later is more painful.

We don't need to (and can't even) solve multi-hop topologies right now; I
think nobody really has the use cases clear in their heads yet. But if
we could simplify the API a bit more for our current use cases in a
way that may extend to multihop naturally later, that would probably
save us some headaches at that point.

> How does forwarding work if I add another node type?

That's actually something I realized yesterday: we don't have direct
worker-to-worker communication right now, correct? A worker cannot
just publish to "bro/cluster/workers".

> Do we assume a certain cluster structure here? If yes: Is that a valid
> assumption?

I think it's safe to assume we have the cluster structure under our
own control; it's whatever we configure it to be. That's something
that's easier to change later than the API itself. Said differently:
we can always adjust the connections and topics that we set up by
default; it's much harder to change how the publish() function works.
 
> From my understanding this would mean going back to the old 
> communication patterns. What's the point of having topics if we don't 
> use them?

Let me try to phrase it differently: If there's already a topic for a
use case, it's better to use it. That's easier and less error-prone.
So if, e.g., I want to send my script's data to all workers,
publishing to bro/cluster/worker will do the job. And that will even
automatically adapt if things get more complex later. For example, I
can see having multiple otherwise independent clusters sharing a
communication channel. In that case, we could internally change the
topic to "bro/cluster//workers", and everybody using the
predefined worker topic would still reach "their" workers without any
further changes.

> That's something I would have expected. I don't think this is 
> necessarily an indicator of bad design.

Maybe it's a *necessary* design, but that doesn't make it nice. ;-) It
makes it very hard to follow the logic; when reading through the
scripts I got lost multiple times because some "@if I-am-a-manager"
was somewhere half a page earlier, disabling the code I was currently
looking at for most nodes. We probably can't totally avoid that, but
the less the better.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-06 Thread Robin Sommer


On Mon, Jul 30, 2018 at 09:01 -0700, I wrote:

> Is there a summary somewhere of what events & topics the cluster nodes
> are currently exchanging?

So I went through the exercise of collecting this information: what
connections do we have between nodes, who's subscribing to what, and
who's publishing what; see the attached PDF. This is based on all the
standard scripts, with some special cases ignored (like the control
framework).

I'm not fully sure yet what to conclude from this, but a few quick
observations:

- The main topics are bro/cluster/ and
  bro/cluster/node/. For these we wouldn't have a problem
  with loops if we enabled automatic, topic-driven forwarding as
  far as I can see.

- bro/cluster/broadcast seems to be the main case with a looping
  problem, because everybody subscribes to it. It's hardly used
  though. (bro/config/change is used similarly though).

- Relaying is hardly used.

- There are a couple of script-specific topics where I'm wondering
  if these could switch to using bro/cluster/ instead
  (bro/intel/*, bro/irc/dcc_transfer_update). In other words: when
  clusterizing scripts, prefer not to introduce new topics.

- There are a lot of checks in publishing code of the type "if I am
  (not) of node type X".

- Pools are used for two different things: 1. the known-* scripts
  pick a proxy to process and log the information; whereas 2. the
  Intel scripts pick a proxy just as a relay to broadcast stuff
  out, reducing load. That 1st application is a good one, but the
  2nd feels like it should be handled differently.
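
For the first kind of pool use, a pick that always maps the same key to the same proxy is what makes the "process and log" case work. Below is a minimal Python sketch of such a pick using highest-random-weight (rendezvous) hashing. The function `hrw_pick`, the SHA-256 weighting, and the proxy names are assumptions for illustration, not what the Bro scripts actually do.

```python
from __future__ import annotations

import hashlib


def hrw_pick(key: str, nodes: list[str]) -> str:
    """Pick the node with the highest hash weight for `key`.

    The same key always lands on the same node, and removing some other
    node from the pool does not move the key.
    """

    def weight(node: str) -> int:
        digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(nodes, key=weight)


proxies = ["proxy-1", "proxy-2", "proxy-3"]
owner = hrw_pick("10.0.0.1", proxies)
```

A plain round-robin pick, in contrast, spreads messages evenly but gives no such stickiness, which is fine when the proxy is only a relay.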

Need to mull over this more, thoughts welcome.

Overall I have to say I found it pretty hard to follow this all
because we don't have much consistency right now in how scripts
structure their communication. That's not surprising, given that we're
just starting to use all this, but it suggests that we have room for
improvement in our abstractions. :)

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Broker Communication.pdf
Description: Adobe PDF document


Re: [Bro-Dev] Broker::publish API

2018-08-06 Thread Robin Sommer



On Fri, Aug 03, 2018 at 15:57 -0500, Jonathan Siwek wrote:

> Another use is hidden within Cluster::relay_rr():

Yeah, though at least from an API perspective this is different: The
caller gives relay_rr() only one topic to send to (indicator_topic).
It's then using a different topic internally to get it over to the
proxy first, but that feels more like an implementation detail. So in
that sense I would argue that this is not a use-case for the Broker
API letting users change the topic on relay. (I'm not saying that that
capability can't be useful, I'm just still looking for actual use
cases.)

I have another question about this specific case: we use relay_rr()
only for sending Intel::insert_indicator. Intel::remove_indicator gets
published normally through auto_publish(). Why the difference?

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-08-03 Thread Robin Sommer



On Fri, Jul 27, 2018 at 10:39 -0700, I wrote:

> Broker::relay(change_topic, change_topic, Config::cluster_set_option, ID, val, location);

Can somebody remind me what the use-case is for changing the topic on
relay? Grepping over our standard scripts, I see only one use of
relay(), and that's the one above.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-07-30 Thread Robin Sommer



On Mon, Jul 30, 2018 at 13:30 -0500, Jonathan Siwek wrote:

> Seems clunky and could get dicey

Agreed. :) It'd just be a heuristic to catch some obvious errors. I
don't think there's more we can do, we can't really catch loops
statically by looking at the code.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-07-30 Thread Robin Sommer



On Mon, Jul 30, 2018 at 11:15 -0500, Jonathan Siwek wrote:

> I don't see why not, but it takes planning and prudence on everyone's
> part (including users) to not break that rule.

Yeah, question is whether we can pre-configure the cluster so that
users don't need to worry about it most of the time.

> I'd be more comfortable if one could automate answering the question:
> "if I add a subscription to a given node in the network, will I create
> a cycle?".

Hmm ... What about a test mode where we'd spin up a dummy cluster
(similar to what the btests do), have each node send a message to all
subscribed topics, and watch for TTL violations?
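
The idea can be tried out offline with a small simulation. The Python sketch below is not Broker code; it assumes that a node forwards a message to all peers except the one it came from whenever it is itself subscribed to the topic, and it reports every (origin, topic) pair whose message runs out of TTL. The function name and the example topologies are made up.

```python
from collections import deque


def find_ttl_violations(peers, subscriptions, ttl=8):
    """Return the set of (origin, topic) pairs whose message exhausts its TTL.

    peers:         node -> list of directly connected nodes
    subscriptions: node -> set of subscribed topic prefixes
    """
    violations = set()
    for origin, topics in subscriptions.items():
        for topic in topics:
            queue = deque((origin, peer, ttl) for peer in peers[origin])
            while queue:
                prev, node, hops = queue.popleft()
                subscribed = any(
                    topic.startswith(p) for p in subscriptions.get(node, ())
                )
                if not subscribed:
                    continue  # assumption: unsubscribed nodes drop the message
                if hops == 0:
                    violations.add((origin, topic))
                    break
                queue.extend(
                    (node, nxt, hops - 1) for nxt in peers[node] if nxt != prev
                )
    return violations


topic = "bro/cluster/broadcast"

# worker -> manager -> proxy -> worker, everybody subscribed: a cycle.
triangle_peers = {
    "worker": ["manager", "proxy"],
    "manager": ["worker", "proxy"],
    "proxy": ["manager", "worker"],
}
triangle_subs = {node: {topic} for node in triangle_peers}

# Workers only talk to the manager: no cycle.
star_peers = {
    "manager": ["worker-1", "worker-2"],
    "worker-1": ["manager"],
    "worker-2": ["manager"],
}
star_subs = {node: {topic} for node in star_peers}
```

The triangle reports a violation for every origin, while the star reports none, even though both subscribe every node to the same topic.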

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Broker::publish API

2018-07-30 Thread Robin Sommer
On Fri, Jul 27, 2018 at 14:47 -0500, Jonathan Siwek wrote:

> Broker does not yet have automatic multihop where subscriptions are
> globally flooded automatically.

Yep, that's what I meant: dynamic multihop where each node tracks what
its peers are subscribing to, and forwards messages independent of its
own subscriptions.

> Possibly a downside is now you need to store original hop limit in
> addition to current TTL in each message if you want to detect the "is
> 1st hop" condition for the "relay_topic" option below.

Yeah, that's right. Actually I think ideally the 1st hop wouldn't have
any special role anyways if we didn't need that "relay_topic".

> It's maybe both a concern and a reality -- Bro clusters currently
> contain cycles (e.g. worker -> manager -> proxy -> worker)

True, although it's not cycles in the connection topology that matter,
it's cycles in topic subscriptions. I need to think about this a bit
more (and I need to remind myself what our topics currently look like)
but could we set up topics so that even in a cluster, messages don't
go into a cycle?

Is there a summary somewhere of what events & topics the cluster nodes
are currently exchanging?

> > - Add a second function publish_pool() that has all the same
> >   options, but receives a pool type instead of a topic (just an
> >   enum: RR, HRW).
> 
> What's the goal of the enums instead of just publish_hrw() and publish_rr() ?

Similar to what Justin wrote, it would more directly express the
intent, with less emphasis on the mechanism; we could set a
default to whatever we recommend people normally use; and it'd be more
extensible.

> At the moment, one could write their own wrapper function around that
> if they find it too verbose and always want to use certain defaults?

They could, but my general point is that it'd be nice to have a simple
API that covers the most common use cases directly and intuitively.
Then let people change defaults if they have to and know what they are
doing.

Robin



-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Bro-Dev] Broker::publish API

2018-07-27 Thread Robin Sommer
The other day when merging Johanna's code to clusterize the
configuration framework, I noticed this code in there:

 # [Send id=val to everyone else]

 Broker::publish(change_topic, Config::cluster_set_option, ID, val, location);

 if ( Cluster::local_node_type() != Cluster::MANAGER )
     Broker::relay(change_topic, change_topic, Config::cluster_set_option, ID, val, location);

It took me a bit to understand that ... The goal here is that a change
in a configuration value gets propagated out to all nodes in the
cluster. The Broker::publish() sends it to a node's immediate
neighbors, but not further. That means that for workers it goes (only)
to their manager; for the manager it means, it goes to all workers. If
we're not a manager, we then separately (through Broker::relay()) ask
our neighbors (that's the manager) to forward the change to *their*
neighbors (that's the other workers), without reraising it locally.

I remember we have discussed this API before, but I wanted to bring it
up again as I keep finding it confusing. I believe the code above
could be simplified by using the newer Broker::publish_and_relay(),
which was added to combine the two operations. Still, I'm realizing
now that I don't like thinking about this in terms of separate
publishing and relaying operations.

It all won't become easier once we add multi-hop routing to the mix
(which is in the works). And on top of all that, we also have
Cluster::publish_rr, Cluster::publish_hrw, Cluster::relay_rr, and
Cluster::relay_hrw -- another set of separate publishing & relay
options.

I'm wondering if we should give it another try to simplify this API
while we still can (i.e., before 2.6 goes out). To me, the most
intuitive publish operation is "send to topic T and propagate to
everybody subscribed to that topic". I'd structure the API around
that, simply making that the main publish function:

Broker::publish(topic, args);

That would send to all neighbors, which then process locally and relay
to their neighbors. Right now, that would propagate just across one
hop, but once we have multihop it'd start being broadcast out more
broadly.

To support the other use cases, we can then add modifiers & functions
to tweak this default, e.g.:

- Give publish() another argument "relay: bool =T" to prevent
  it from going beyond the immediate receiver. Or maybe instead:
  "relay_hops: int =-1" to specify the max number of hops
  to relay across, with -1 meaning no limit. (I recall concerns
  about loops being too easy to create; we could set the default
  here to F/0 to default to no forwarding, although conceptually I
  don't really like that :-)

- Give publish() another argument "relay_topic: string = """
  to change the topic when relaying on the 1st hop.

- Give publish() another argument "process_on_relays: bool =T"
  to change whether a relaying hop also sees the event locally.

- Add a second function publish_pool() that has all the same
  options, but receives a pool type instead of a topic (just an
  enum: RR, HRW).
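
To get a feeling for how the "relay_hops" and "process_on_relays" modifiers would interact, here is a toy Python model. It is one possible reading of the semantics, not an implementation: the `publish` function below, its `seen` loop guard, and the A-B-C-D chain are all assumptions made for the sketch.

```python
from collections import deque


def publish(peers, subscribers, origin, relay_hops=-1, process_on_relays=True):
    """Return the nodes that raise the event locally, in delivery order.

    peers:       node -> list of directly connected nodes
    subscribers: nodes subscribed to the topic being published to
    relay_hops:  max number of relays beyond the immediate receivers;
                 -1 means no limit, 0 means no relaying at all
    """
    raised = []
    seen = {origin}  # loop guard, a sketch-only assumption
    queue = deque((peer, 0) for peer in peers[origin])
    while queue:
        node, relays_so_far = queue.popleft()
        if node in seen or node not in subscribers:
            continue
        seen.add(node)
        may_relay = relay_hops < 0 or relays_so_far < relay_hops
        targets = (
            [p for p in peers[node] if p not in seen and p in subscribers]
            if may_relay
            else []
        )
        # A node that relays further only raises the event if asked to.
        if process_on_relays or not targets:
            raised.append(node)
        queue.extend((p, relays_so_far + 1) for p in targets)
    return raised


# Simple chain A - B - C - D, with A publishing.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
subscribed = {"B", "C", "D"}
```

In this model the default reaches B, C and D; `relay_hops=0` stops at B; and `process_on_relays=False` raises the event only on D, the last node in the chain.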

What I'm not quite sure about is if some of these modifiers are better
to leave for the receiver to specify (e.g., whether to raise events
received on a given topic locally, or just forward). I think I can see
that either way.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-12 Thread Robin Sommer



On Tue, Jun 12, 2018 at 14:16 -0500, you wrote:

> This has led to the fix/workaround at [1], now in master, which

Cool, that indeed solved it! It also helps significantly when data
stores *are* being used; that still takes about 2x the time, but
that's much less than before. Now I'm wondering if we can get that
back down to normal, too ...

One question about Broker's endpoint::advance_time(): that's locking
on each call when in pcap mode, i.e., once per packet. Could we limit
that to cases where something actually needs to be done? Quick idea:
use an atomic<> for current_time plus another atomic<> counter
tracking if there are any pending messages. And go into the locked case
only if the latter is non-zero?

> General changes/improvements in CAF itself may be warranted here

Yeah, sounds like it.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-11 Thread Robin Sommer


On Fri, Jun 08, 2018 at 10:47 -0700, I wrote:

> Ok, I'll dig around a bit more and also double-check that I didn't
> change any settings in the meantime.

Confirmed that I'm still seeing that difference even when using the
options to turn the stores off. Tried it on two different Fedora 28
systems, with similar results.

I haven't been able to nail down what's going on though. The
AdvanceTime() method does seem to do a lot of locking in pcap mode,
independent of whether there's actually any data store operations.
However, I tried a quick hack to get that down and that didn't change
anything.

I then ran it through oprofile. Output is attached. There's indeed
some locking showing up high in there, but I can't tell if that's just
expected idling in CAF. I am a bit surprised to see a number of
std::chrono() methods showing rather prominently that I would expect
to be cheap.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin
Using /home/robin/bro/master/tmp/oprofile_data/samples/ for samples directory.

WARNING! Some of the events were throttled. Throttling occurs when
the initial sample rate is too high, causing an excessive number of
interrupts.  Decrease the sampling frequency. Check the directory
/home/robin/bro/master/tmp/oprofile_data/samples/current/stats/throttled
for the throttled event names.

WARNING: Lost samples detected! See /home/robin/bro/master/tmp/oprofile_data/samples/operf.log for details.
warning: /hpsa could not be found.
warning: /kvm could not be found.
warning: /nf_conntrack could not be found.
warning: /nf_defrag_ipv4 could not be found.
warning: /nf_nat could not be found.
warning: /nf_nat_ipv4 could not be found.
warning: /tg3 could not be found.
CPU: Intel Sandy Bridge microarchitecture, speed 2000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 10

samples  %       image name             symbol name
48605    3.3738  kallsyms               find_busiest_group
31108    2.1593  libtcmalloc.so.4.5.1   /usr/lib64/libtcmalloc.so.4.5.1
24088    1.6720  bro                    RE_Match_State::Match(unsigned char const*, int, bool, bool, bool)
22833    1.5849  kallsyms               native_write_msr
20525    1.4247  kallsyms               finish_task_switch
20314    1.4100  kallsyms               syscall_return_via_sysret
16822    1.1677  kallsyms               __softirqentry_text_start
16520    1.1467  libcaf_core.so.0.15.7  caf::detail::double_ended_queue::lock_guard::lock_guard(std::atomic_flag&)
15112    1.0490  kallsyms               update_blocked_averages
12897    0.8952  kallsyms               run_timer_softirq
12890    0.8947  kallsyms               pick_next_task_fair
12729    0.8836  libpthread-2.27.so     nanosleep
12495    0.8673  kallsyms               update_curr
12209    0.8475  kallsyms               _raw_spin_lock
12186    0.8459  libcaf_core.so.0.15.7  caf::resumable* caf::policy::work_stealing::dequeue >(caf::scheduler::worker*)
11979    0.8315  kallsyms               __schedule
11886    0.8250  kallsyms               __update_load_avg_cfs_rq.isra.34
11463    0.7957  kallsyms               idle_cpu
11239    0.7801  kallsyms               __update_load_avg_se.isra.33
11178    0.7759  kallsyms               native_sched_clock
11010    0.7642  kallsyms               update_load_avg
10854    0.7534  libcaf_core.so.0.15.7  std::atomic::node*>::load(std::memory_order) const
10770    0.7476  kallsyms               load_balance
10737    0.7453  libcaf_core.so.0.15.7  decltype (({parm#1}->data)()) caf::policy::unprofiled::d >(caf::scheduler::worker*)
10582    0.7345  bro                    DFA_State::Xtion(int, DFA_Machine*)
10554    0.7326  libcaf_core.so.0.15.7  caf::detail::double_ended_queue::take_head()
10185    0.7070  bro                    RandTest::add(void const*, int)
10133    0.7034  libcaf_core.so.0.15.7  std::__uniq_ptr_impl::node, std::default_delete::node> >::_M_ptr()
9920     0.6886  libstdc++.so.6.0.25    /usr/lib64/libstdc++.so.6.0.25
9813     0.6811  kallsyms               swapgs_restore_regs_and_return_to_usermode
9685     0.6723  kallsyms               trigger_load_balance
9431     0.6546  libcaf_core.so.0.15.7  std::tuple_element<0ul, std::tuple::node*, std::default_delete::node> > >::type& std::get<0ul, caf::detail::double_ended_queue::node*, std::default_delete::node> >(std::tuple::node*, std::default_delete::node> >&)
9404     0.6528  libcaf_core.so.0.15.7  caf::scheduler::worker::data()
9350     0.6490  kallsyms               do_syscall_64
9311     0.6463  libcaf_core.so.0.15.7  std::enable_if::node*> >, std::is_move_constructible::node*>, std::is_move_assignable::node*> >::value, void>::type std::swap::node*>(caf::detail::do

Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-08 Thread Robin Sommer


Ok, I'll dig around a bit more and also double-check that I didn't
change any settings in the meantime.

Robin

On Fri, Jun 08, 2018 at 12:40 -0500, you wrote:

> On Fri, Jun 8, 2018 at 12:16 PM Jon Siwek  wrote:
> 
> > Only thing I'm thinking is that your test system maybe
> > does a poorer job of scheduling the right thread to run in order for
> > the FlushPendingQueries() spin-loop to actually finish.
> 
> Actually, realized you still had bad timing with data stores off, so
> it would more likely be a problem with the AdvanceTime() code path.
> There's some mutex locking going on there, but w/o data stores
> involved there shouldn't be anyone competing for them.
> 
> - Jon
> 


-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-08 Thread Robin Sommer



On Thu, Jun 07, 2018 at 17:05 -0500, you wrote:

> Thanks, if you pull master changes [1] again there's likely some improvement.

Yeah, a little bit, not much though.

> # zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
> Known::use_host_store=F Known::use_service_store=F
> Known::use_cert_store=F

That indeed gets it way down, though still not back to the same level
on my box:

170.49user 58.14system 1:55.94elapsed 197%CPU

(pre-master: 73.72user 7.90system 1:06.53elapsed 122%CPU)

Are there more places where Bro's waiting for Broker in pcap mode?

> With that, I get the same timings as from before pre-Broker.  At least
> a good chunk of the difference when using data stores is that, for
> every query, Bro will immediately block until getting a response back
> from the data store thread/actor.

Yeah, I remember that discussion. It's the trade-off between
performance and consistency. Where's the code that's doing this
blocking? Would it be possible to not block right away, but sync up
with Broker when events are flushed the next time? (Like we had
discussed before as a general mechanism for making async operations
consistent)

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-07 Thread Robin Sommer
Hmm, I'm still seeing much larger runtimes on that M57 trace using our
test configuration, even with yesterday's change:

; Master, pre-Broker
# zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
73.72user 7.90system 1:06.53elapsed 122%CPU (0avgtext+0avgdata 199092maxresident)

; Current master
# zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
2191.59user 1606.69system 12:39.92elapsed 499%CPU (0avgtext+0avgdata 228400maxresident)

Bro must still be blocking somewhere when reading from trace files.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] Broker has landed in master, please test

2018-05-24 Thread Robin Sommer


On Wed, May 23, 2018 at 20:16 +, Adam wrote:

> I think those question belong on the main list which is for using Bro
> and its language. This list is really more about development of Bro
> itself.

Just to give context here, the reason I sent the original mail about
Broker here, including the request for feedback, was to limit the
initial round of testing to folks quite familiar with Bro and its
internals. That gives us a chance to spot any obvious issues quickly
before annoying everybody else. :-) But discussing it at either place
is fine of course, whatever works best for folks. If things seem to
work, we should definitely also announce the merge more broadly.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] Moving to GitHub?

2018-05-22 Thread Robin Sommer
> * Someone is likely to report the same problem again
> * There's clear directions on how to reproduce an undesired behavior
> * There's been a proposed plan of action recently
> 
> And many tickets can be ruled out:
> 
> * Vague feature requests
> * Not enough details  / difficult to reproduce
> * Exceptionally old proposals / plans

Yeah, I'm on board with these. I'd probably interpret them more
conservatively than you towards closing more tickets, but that's fine.
As you have volunteered to take this one, I'd say you get to make the
call: just go through and port over what you think makes sense along
those lines. As one additional piece, let's also think about some tags
to use for classifying tickets, including one for what's good tasks
for newcomers who want to get into the code.

(In principle I also like Alan's suggestion of moving everything over
and then just close them out so that the history remains. But I'm
afraid that couldn't be automated easily and would then just be too
much work.)

> starting with a clean slate on GitHub now only means it's likely to
> eventually end up in the same situation as JIRA later.  What then?
> Move to another tracker again?

Doesn't need to be as drastic: as some people here can confirm, I
have no problem doing extensive sweeps if things get too overwhelming. :-)

But yes, point taken, my hope is that we can stay on top of things on
the new tracker and make an effort to get stuff addressed and
resolved. We'll see. :)

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


[Bro-Dev] Broker has landed in master, please test

2018-05-22 Thread Robin Sommer
We merged the new Broker version into Bro master yesterday. As this is a
major change to one of Bro's core components, I wanted to send a quick
heads-up here, along with a couple of notes.

With this merge we are completely replacing usage of Bro's classic
communication system with Broker. All the standard scripts have been
ported over, and BroControl has been adapted as well. The old
communication system remains available for the time being, but is now
deprecated and scheduled to be removed in Bro 2.7 (not 2.6). Broccoli
is now turned off by default.

With such a large change, I'm sure there'll be some more kinks to iron
out still; that's where we need everybody's help. If you have an
environment where you can test drive new Bro versions, please give
this a try. We're interested in any feedback you have, both specific
issues you encounter (best to file tickets) and general experiences
with the new version, including in particular any observations about
performance (best to send to this list).

From a user's perspective, not much should even be changing; most of
the new stuff is under the hood. The exception is custom scripts
doing communication themselves: they need to be ported over to Broker.
Documentation for that is here:
https://www.bro.org/sphinx-git/frameworks/broker.html, including a
porting guide for existing scripts. Let us know if there's anything
missing there that would be helpful. The Broker library itself comes
with a new user manual as well, we'll get that online shortly.

One specific note on upgrading existing Bro clusters: the meaning of
"proxy" has changed. They still exist, but play a quite different role
now. If you're currently using more than one proxy, we recommend going
back to one; that'll most likely be fine with the standard scripts
(and if not, please let us know!)

Many thanks to Jon Siwek for the recent integration work tying up all
the loose ends and getting Broker mergeable. Also thanks to those who
have tested it already from the actor-system branch.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] Moving to GitHub?

2018-05-18 Thread Robin Sommer


On Fri, May 18, 2018 at 08:27 -0500, you wrote:

> Doing a half-hearted effort to migrate tickets from JIRA undermines the goal
> of having an authoritative/central location for all code + tickets.  Can we
> instead try to deal with it once and for all?

What I was envisioning is more or less a clean slate: we'd migrate
over a few tickets, but essentially we'd start with an empty list. I
realize that sounds pretty harsh. However, I hardly ever see any
activity on older tickets in JIRA, and I generally believe that the
fewer open tickets a tracker has, the easier it is for people to
understand what's actually relevant and worth spending cycles on.
Tagging tickets may help, but in the end if everybody just filters 95%
out all the time anyways, I'm not sure what the value is.

That said, I'm open to a real porting effort if people do believe it's
helpful to get all the JIRA tickets into GitHub. What do others think?

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] Moving to GitHub?

2018-05-17 Thread Robin Sommer


On Thu, May 17, 2018 at 11:21 -0500, you wrote:

> * For porting over JIRA tickets to GitHub, "most recent" doesn't seem like a
> good metric to use.

Agree. :)

> they may as well just port all the older ones that are still valid
> over to GitHub.

That may be a bit too broad though. How about "still valid and either
(1) quite important or (2) something we expect will be addressed
reasonably soon"? We have many old tickets that are technically still
valid but unlikely to see any work anytime soon (otherwise they would
have been addressed already), and I'm worried that they would just add
noise without value. The old tickets won't go away, the JIRA will
remain. If something becomes relevant/active, we can always bring it
over at that time.

> I find myself in that situation quite often, actually, so
> transitioning to GitHub PRs, I wonder if we'd want a PR to be created
> against each individual repo?

Good point. Creating just one root PR that mentions the others sounds
good to me for such cases. 

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] Moving to GitHub?

2018-05-16 Thread Robin Sommer


On Wed, May 16, 2018 at 15:23 +, you wrote:

> I too would miss the commit / change notifications, however, I think
> that this can be set up in GitHub in some way.

We can still get the same email notifications as today (which have a
bit more information than GitHub's standard ones), they will just come
with a little bit of a delay (within 10-15 minutes should be
reasonable). And if we want we can trigger that through webhooks,
too, for immediate notification, would just need a bit of work to get
it set up.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] Writing analyzer for Siemens PLC

2018-05-03 Thread Robin Sommer


On Wed, May 02, 2018 at 22:22 +0200, you wrote:

> 1) Reassembling packets: Some S7CommPlus packets which payload is over a 
> certain amount of bytes will be split and need to be reassembled.

As a couple quick pointers, the DNP3 and DTLS analyzers face a similar
task, you might find some ideas there.

>  If I want to generate a Bro events which contains the payload as a
>  parameter, how do I do that?

If with "payload" you mean the raw bytes, you would pass that as a
string into the event. But it's hard to do much with such raw data in
script-land. The common way would be instead creating one event per
type of payload and then raising the corresponding event as you parse
packets and find out what's in there.
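
As a rough sketch of that pattern (in Python rather than actual analyzer
code, so purely illustrative): the type tags, event names, and the
raise_event() helper below are all made up for the example and do not
reflect the real S7CommPlus layout or any Bro API. The idea is just that
the parser dispatches on the message type and raises one specific event
per type, falling back to a raw-bytes event only for what it doesn't
understand.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical one-byte type tags; NOT the real S7CommPlus layout.
MSG_REQUEST = 0x01
MSG_RESPONSE = 0x02

# Events "raised" so far, recorded as (event name, arguments).
raised: List[Tuple[str, tuple]] = []


def raise_event(name: str, *args) -> None:
    """Stand-in for handing a typed event over to script-land."""
    raised.append((name, args))


def parse_request(body: bytes) -> None:
    # Assumption for the example: the first body byte is a function code.
    raise_event("example_request", body[0])


def parse_response(body: bytes) -> None:
    # Assumption for the example: the first body byte is a status code.
    raise_event("example_response", body[0])


# One parser (and hence one event type) per kind of payload.
PARSERS: Dict[int, Callable[[bytes], None]] = {
    MSG_REQUEST: parse_request,
    MSG_RESPONSE: parse_response,
}


def deliver(pdu: bytes) -> None:
    """Dispatch a reassembled PDU to the parser for its message type."""
    if not pdu:
        return
    msg_type, body = pdu[0], pdu[1:]
    parser = PARSERS.get(msg_type)
    if parser is None or not body:
        # Fallback: pass the raw bytes along unparsed.
        raise_event("example_unknown", pdu)
        return
    parser(body)


deliver(bytes([MSG_REQUEST, 5]))
deliver(bytes([MSG_RESPONSE, 0]))
deliver(bytes([9, 1]))
```

Scripts can then handle example_request and example_response with
already-parsed fields instead of picking apart a byte string themselves.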

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] set and vector operators

2018-04-30 Thread Robin Sommer


On Mon, Apr 30, 2018 at 07:10 -0700, you wrote:

> Okay, I can live with this as long as '|' and '-' support add-to-set and
> remove-from-set.   But I think those have to work, given we'll enable them
> for operations on two sets.

Well, my vote then remains not adding new set operators for
add/delete, so that we don't have multiple ways to do the same thing.
Just looked at Python again, as a data point: That's what they do,
too. There are '|'/'&'/'-' for set/set operations, but no versions of
those for individual elements (they do that through methods instead;
add/delete are kind of our version of methods). Same for Ruby. I
looked around for a few more minutes for other languages, but didn't
immediately find any that even have any set operators at all (only
methods/functions for union/intersection/etc.).
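
For reference, a small sketch of the Python behavior described above,
using only built-in set operations (the variable names are just for
illustration):

```python
a = {1, 2, 3}
b = {3, 4}

# Set/set operations use operators.
union = a | b
intersection = a & b
difference = a - b

# Individual elements go through methods, not operators.
s = set(a)
s.add(4)
s.discard(1)  # like remove(), but no error if the element is absent

# The operators do not accept a bare element.
try:
    a | 4
except TypeError:
    element_operator_rejected = True
else:
    element_operator_rejected = False
```

So there is exactly one way to add or remove a single element, and the
operators are reserved for combining whole sets.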

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] set and vector operators

2018-04-30 Thread Robin Sommer


On Mon, Apr 30, 2018 at 07:10 -0700, you wrote:

> "vector(v op e)".  Wrapped in "vector(...)", the operation becomes the
> current semantics (apply "op e" separately to each element of v).

One additional piece of context here: That vector(...) syntax could
then be used more broadly in the sense of creating a different
semantic context for the operations inside. That kind of opens up a
whole new set of type-specific operator meanings, without affecting
current/standard ones (it's like introducing R inside parentheses :-).
It's not super-great, but at least it's explicit and we couldn't
come up with anything better if we want to have such operations as
operators. Might work for some other types as well.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] Final Broker branch testing

2018-04-27 Thread Robin Sommer


On Thu, Apr 26, 2018 at 16:54 -0500, you wrote:

> (1) Users whose OS has insufficient CMake will need to compile/obtain a  
> newer one.

> (2) We go back to CMake 2.8.12 and have people compile CAF themselves. 
> (Or maybe we could conditionally require only 2.8.12 users to compile 
> CAF and others get the embedded CAF).
 
> (3) I need to try to hack our CMake system more to try to get back down 
> to 2.8.12 while still being able to embed CAF.

If there's something quick that ends up making (3) work, that'd be
ideal of course, but I don't think it's worth spending much time on,
given that there are reasonable ways to get a more recent cmake.

I wouldn't want to go back to not shipping CAF at all, but if we can
tell cmake that 2.8.12 is fine if users build CAF themselves, that
would be the 2nd best option I think. (1) is the worst case, which still
isn't too bad IMO, unless it does actually prevent us from building
binary packages for RH, that would be a problem.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] How to deal with stale branches?

2018-04-27 Thread Robin Sommer
Personally I don't really mind such branches sticking around for
reference purposes. We have plenty stale branches anyways all over, it
would probably be more to clean up those (looking at myself there, too :)

Robin

On Fri, Apr 27, 2018 at 03:01 +, you wrote:

> Yeah, that's certainly one option, but I think it'd be hard for people to
> find.
> 
> On Thu, Apr 26, 2018 at 8:15 PM, Jon Siwek <jsi...@corelight.com> wrote:
> 
> >
> >
> > On 4/26/18 11:06 AM, Vlad Grigorescu wrote:
> >
> > I'm torn between deleting the branches, in an effort to not clog up git
> >> with unneeded branches, and leaving them around or perhaps archiving them
> >> somewhere, in order to not completely lose the work in case it's of value
> >> to someone down the road.
> >>
> >> I'm curious if anyone has thoughts on the best way to proceed.
> >>
> >
> > Maybe delete the branch from the official git repo and push it to your own
> > github fork.
> >
> > - Jon
> >




-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] Final Broker branch testing

2018-04-26 Thread Robin Sommer


On Thu, Apr 26, 2018 at 14:30 -0700, you wrote:

> It might be. I am honestly not sure - I suspect that this still will 
> mean that some places might not be able to easily use Bro 
> anymore--adding external package sources does not seem to be a viable 
> option everywhere.

Is it a feasible compromise to allow cmake 2.8 if we don't need to
build CAF? So either people have cmake 3.0 or they need to build CAF
themselves and say --with-caf=...?

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] set and vector operators

2018-04-26 Thread Robin Sommer


On Wed, Apr 25, 2018 at 22:19 -0700, you wrote:

> Now there's no problem, since the lexer only recognizes ""
> as a unit, with no whitespace allowed.

Good idea, sounds right. And in case it did turn out to be
problematic, we could still go the way of adding all as keywords
later.

> How does that sound?

Sounds good to me, the bitwise operations will be great to have, too.

Just one more thing still: I'm actually feeling pretty strongly
against having multiple different operators for the same operation
(set union, set addition/removal). I just see that as leading to
confusion: scripts will inconsistently use one or the other, people
will have different preferences which version to prefer; they may not
even remember the other one. And we'd end up having to explain why
there are two versions, without having much of a great explanation
("one is ugly" doesn't sound great to me :-). Is it just me feeling
that way?

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] set and vector operators

2018-04-25 Thread Robin Sommer


On Wed, Apr 25, 2018 at 10:40 -0700, you wrote:

>   s1 + s2 Set union (for sets of the same type, of course)
>   s1 || s2Set union

(What's the difference between the two? Or do you mean either one or
the other?)

Like Justin, I was also thinking "|" and "&" might be more intuitive.
"||"/"&&" is really typically associated with boolean contexts, and
other languages might also coerce set operands into booleans in such a
context, so that, e.g., "s1 || s2" evaluates to true if either is
non-empty. 
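
Python is one example of that coercion, with "or"/"and" playing the role
of "||"/"&&". A minimal sketch using only built-ins:

```python
s1 = set()
s2 = {"a", "b"}

# An empty set is falsy, a non-empty set is truthy.
either_non_empty = bool(s1 or s2)
both_non_empty = bool(s1 and s2)

# "or" returns one of its operands, it does not build a union.
picked = {1} or {2}
merged = {1} | {2}
```

Here either_non_empty is True and both_non_empty is False, while picked
is just {1} and only merged is the union {1, 2}. Somebody coming from
such a language could easily misread "s1 || s2" as a boolean test.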

I see the problem with the parser but maybe adding keywords is the way
to go.

>   s += e  Add the element 'e' to set 's'
>   (same as the current "add s[e]")
>   s -= e  Remove the 'e' element from 's', if present
>   (same as the current "delete s[e]")

I'd skip these. I don't think we should add an additional set of
operators for things that Bro already supports, that seems confusing
to me (like Perl :)

>   s1 += s2Same as "s1 = s1 + s2"

(Or s1 |= s2 if we pick "|" for union.)

>   v += e  Append the element 'e' to the vector 'v'

That's probably the most requested Bro operator ever! :)

>   v += s  Append the elements of 's' to the vector 'v',
>   with the order not being defined

This one I'm unsure about. The point about the order being undefined
seems odd. If I don't care about order, wouldn't I stay with a set?

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] Overload Bro Events

2018-04-12 Thread Robin Sommer


On Thu, Apr 12, 2018 at 14:44 -0500, you wrote:

> > event overload%(c: connection%);
> > event overload%(c: connection, h: header%);
> > event overload%(c: connection, h: header, d: data%);
> 
> Overloading is not supported for functions in general (function/event/hook).

This has an interesting implication for BIT-1431: if overloading worked,
that could take the place of the  attribute suggested
there.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


Re: [Bro-Dev] Broker port status

2018-03-08 Thread Robin Sommer


On Thu, Mar 08, 2018 at 13:28 -0600, you wrote:

> * Rename "proxy" nodes?

"compute" maybe?

> (1) , ,   Seems fine to deprecate these 
> now.

Agree.

> (2) Communication framework scripts.  It's awkward to deprecate stuff
> here since they internally will be using what is deprecated.  I think
> it makes sense to just remove it and let users manually make a copy if
> they still need it.

Likewise agree.

> (3) Old C++ comm. code, e.g. RemoteSerializer.  I think we'll leave
> this untouched for the 2.6 release?

Yep, I'm looking forward to ripping that out for 2.7. :)

> (4) BIFs associated with old comm. system.  Depends on (3) (and also
> actually (2)), though if the internal core code remains, I think it
> makes sense to add  to these.

Deprecating them sounds good.

> It makes more sense to me to assume the user wants to insert the key
> with a sane default value (e.g. zero/empty) in those cases, otherwise,
> it's awkward/race-prone to require they do it themselves.

Agree with this as well, has always felt a bit awkward to me too.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin


[Bro-Dev] Offline Broker usage (Re: [Bro-Commits] [git/bro] topic/actor-system: Fix Known scripts to be able to use alternate implemenation (50e1498))

2018-03-08 Thread Robin Sommer
Jon, I noticed your commit message on data store expiration:

> commit 50e1498d2b39d6af1f70dbc042ab544506a67e43
> Author: Jon Siwek <jsi...@corelight.com>
> Date:   Wed Mar 7 21:24:46 2018 -0600
> 
> Fix Known scripts to be able to use alternate implemenation
> 
> And run the external test suite using the alternate implementation
> due to data stores behaving differently when running on offline pcaps.
> E.g. expirations are based on wall time, not packet time, and timeouts
> (which *are* based on packet time) may occur when the store is still
> initializing due to a large interval of packet time passing.

That brings up an interesting question on data store semantics in
offline vs online mode. Ideally, there wouldn't be any difference
between the two operation modes, so that running on a trace gives
exactly the same results as online. That would match how Bro generally
operates. Could we make data store expiration driven by network time?
That'd need an API for Bro to drive Broker time forward. And for the
initialization, maybe Bro could wait for the initialization to finish?
Although I'm not quite sure here which initialization that refers to,
may not be feasible.
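
To make the network-time idea a bit more concrete, here is a toy sketch
in Python. It is not Broker's API: the TimedStore class and its
advance_time() method are hypothetical, and only illustrate a store that
never looks at the wall clock and instead expires entries against a
clock that the caller (here standing in for Bro) moves forward.

```python
from typing import Any, Dict, Optional, Tuple


class TimedStore:
    """Toy key-value store whose expiration follows a caller-driven clock."""

    def __init__(self) -> None:
        self._now = 0.0
        # Maps key -> (value, absolute expiration time).
        self._data: Dict[str, Tuple[Any, float]] = {}

    def advance_time(self, now: float) -> None:
        """Move the store's clock forward and drop expired entries."""
        if now > self._now:  # the clock never moves backwards
            self._now = now
        expired = [k for k, (_, deadline) in self._data.items()
                   if deadline <= self._now]
        for key in expired:
            del self._data[key]

    def put(self, key: str, value: Any, ttl: float) -> None:
        """Insert a value that expires ttl seconds of store time from now."""
        self._data[key] = (value, self._now + ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        return entry[0] if entry is not None else None


store = TimedStore()
store.advance_time(100.0)          # e.g. timestamp of the first packet
store.put("host", 1, ttl=10.0)
store.advance_time(105.0)
before_expiry = store.get("host")  # still present at t=105
store.advance_time(111.0)
after_expiry = store.get("host")   # expired at t=110
```

With a setup like this, replaying a trace would give the same expiration
behavior no matter how fast the packets are read, since only packet
timestamps advance the clock.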

Are there other differences with stores between online and offline
operation?

Robin


-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com

