[DISCUSS] MPack components that don't support Kerberos

2017-04-13 Thread Ryan Merriman
There is a PR up for review (
https://github.com/apache/incubator-metron/pull/518) that updates our MPack
to support a Kerberized environment.  There is also a PR up for review that
adds the REST service to the MPack (
https://github.com/apache/incubator-metron/pull/500).

However, the REST application currently does not work in a Kerberized
environment.  That work has already started, so it won't be an issue for
long, but how should we handle situations like this in the future, where we
want to add a service but it's not quite ready for Kerberos?  Should
Kerberos support be a prerequisite before it's added to the MPack?  Should
we look at ways to make these services optional?  Any other thoughts or
ideas?

Ryan


Re: metron UI

2017-04-11 Thread Ryan Merriman
That PR was merged this morning.  Do a pull and you should see it in master.

On Tue, Apr 11, 2017 at 11:44 AM, moshe jarusalem  wrote:

> Hi All,
> Am I missing something basic, or is something else wrong? Could you point
> me to a doc?
>
> On Tue, Apr 11, 2017 at 4:08 PM, moshe jarusalem  wrote:
>
> > Hi All,
> > I am trying to install metron UI mentioned at the following link
> >
> >   https://github.com/apache/incubator-metron/pull/489#discussion_r109166246
> >
> > but the metron-config project is not available at the repository.
> >
> > Regards,
> >
>


Re: [DISCUSS] next release proposal

2017-04-05 Thread Ryan Merriman
We just finished responding to the first round of feedback so I don't think
we're that far away on METRON-623.

On Wed, Apr 5, 2017 at 3:30 PM, Matt Foley  wrote:

> Totally agree would be good to have MPack support.  Let’s see how it
> goes.  Wouldn’t want to cut it out for the sake of a day or two.
>
> On 4/5/17, 1:14 PM, "Justin Leet"  wrote:
>
> I've made fairly good progress on
> https://issues.apache.org/jira/browse/METRON-799 (The MPack should function
> in a kerberized cluster).  The PR itself might cut close to the deadline,
> and in particular might be tough to get reviewed in time.
>
> I'll do a best effort attempt to get it in to make our Kerberos story more
> complete, but I'd say the release can go on without this (and we use manual
> Kerberos in its absence).
>
> Justin
>
> On Wed, Apr 5, 2017 at 4:07 PM, Matt Foley  wrote:
>
> > Sure.  To be clear, I wasn’t proposing an exclusive list, just making the
> > argument that there seemed to be enough to proceed with.  Any duly
> > committed content in the master branch, at the time we create the first RC
> > (ie, some time after METRON-623 goes in, but not before Monday) will surely
> > be included in the RC, unless something has a bug that can’t be readily
> > resolved.
> >
> > Thanks,
> > --Matt
> >
> > On 4/5/17, 12:56 PM, "David Lyle"  wrote:
> >
> > I'm working on METRON-826 right now. I'll have a PR up today or
> > tomorrow at the latest. I'd like to see it go as well.
> >
> > https://issues.apache.org/jira/browse/METRON-826
> >
> > -D...
> >
> >
> > On Wed, Apr 5, 2017 at 3:52 PM, Nick Allen wrote:
> >
> > > I would like to include #509 with the Fastcapa improvements.  Already
> > > have a +1.  I'm just letting it soak, giving others some time to review
> > > if they feel so inclined.
> > >
> > > https://github.com/apache/incubator-metron/pull/509
> > >
> > >
> > > On Wed, Apr 5, 2017 at 3:50 PM, James Sirota <jsir...@apache.org> wrote:
> > >
> > > > I second this.  I want to see 623 go in in addition to the kerberos work.
> > > > When both are in I think it makes sense to do the release
> > > >
> > > > 04.04.2017, 11:33, "Simon Elliston Ball" <si...@simonellistonball.com>:
> > > > > I'd really like to see METRON-623 (the ui) get into the release.  It
> > > > > feels like the current PR review is getting close, and that getting it in
> > > > > then focussing on follow on tasks in a separate release would work well.
> > > > >
> > > > > I would be all for getting a release out if only for the Kerberos work.
> > > > >
> > > > > Simon
> > > > >
> > > > >>  On 4 Apr 2017, at 20:15, zeo...@gmail.com <zeo...@gmail.com> wrote:
> > > > >>
> > > > >>  How far out is the management UI?
> > > > >>
> > > > >>  Jon
> > > > >>
> > > > >>>  On Tue, Apr 4, 2017, 2:09 PM Matt Foley <ma...@apache.org> wrote:
> > > > >>>
> > > > >>>  Hi all,
> > > > >>>  Although it’s only been a few weeks since the last release was finally
> > > > >>>  published, that process started in January :-)
> > > > >>>  Also, the last commit in 0.3.1 was Feb 23, and there’s been a ton of
> > > > >>>  really cool new stuff added since then:
> > > > >>>
> > > > >>>  Biggest items:
> > > > >>>  - Multiple commits for REST API (base Jira: METRON-503)
> > > > >>>  - Multiple commits to work with Kerberized (secure) clusters (mult. Jiras)
> > > > >>>
> > > > >>>  Other major new features:
> > > > >>>  - METRON-690: DSL-based sparse time window specification for Profiler
> > > > >>>  - METRON-733: Remove Geo db from ParserBolt
> > > > >>>  - METRON-686: Record rule set that fired during Threat Triage
> > > > >>>  - METRON-743: Sort files when reading results from Pcap
> > > > >>>  - METRON-701: Triage metrics produced by Profiler
> > > > >>>  - METRON-744: Stellar external functions loaded from HDFS (and huge
> > > > >>>  speed-up for function resolution)
> > > > >>>  - METRON-694: Index errors from Topologies, and
> > > > >>>  - METRON-745: Create Error dashboards
> > > > >>>  - METRON-712: Separate eval from parse in Stellar
> > > > >>>  - METRON-765: Add GUID to messages
> > > > >>>  - METRON-793: Updated to storm-kafka-client spout
>

Re: [MENTORS] Minified javascript license

2017-04-05 Thread Ryan Merriman
Very helpful, thanks Taylor.

On Wed, Apr 5, 2017 at 4:34 PM, P. Taylor Goetz <ptgo...@gmail.com> wrote:

> Sorry to reply from a phone...
>
> If this is code generated from Metron, just add the header to the
> minified code. If it came from an external source, the L & E files should
> be updated appropriately.
>
> -Taylor
>
> > On Apr 5, 2017, at 10:25 PM, James Sirota <jsir...@apache.org> wrote:
> >
> > Wanted to bump this up to the top.  Did we ever get a resolution on this?
> >
> > 05.04.2017, 05:51, "Ryan Merriman" <merrim...@gmail.com>:
> >> Mentors,
> >>
> >> We package javascript code in a tar.gz archive for one of our modules.
> >> Before it is added to the archive it is compiled down from Typescript, then
> >> minified. Would that code still be considered source at that point and
> >> still require a license header?
> >>
> >> I found this section on the Apache website that seems relevant (
> >> https://www.apache.org/legal/src-headers.html#is-a-short-form-of-the-source-header-available)
> >> because it mentions minified javascript as an example. But I'm not clear
> >> on what exactly must be done to the minified javascript file. How and
> >> where should the shorter form be applied? If it should be applied as a
> >> comment then why not just apply the normal header?
> >>
> >> Thanks in advance for any help or advice you can give me. I am not able to
> >> find a clear example or explanation and want to make sure this gets done
> >> correctly.
> >>
> >> Ryan Merriman
> >
> > ---
> > Thank you,
> >
> > James Sirota
> > PPMC- Apache Metron (Incubating)
> > jsirota AT apache DOT org
>


[MENTORS] Minified javascript license

2017-04-05 Thread Ryan Merriman
Mentors,

We package javascript code in a tar.gz archive for one of our modules.
Before it is added to the archive it is compiled down from Typescript, then
minified.  Would that code still be considered source at that point and
still require a license header?

I found this section on the Apache website that seems relevant (
https://www.apache.org/legal/src-headers.html#is-a-short-form-of-the-source-header-available)
because it mentions minified javascript as an example.  But I'm not clear
on what exactly must be done to the minified javascript file.  How and
where should the shorter form be applied?  If it should be applied as a
comment then why not just apply the normal header?

Thanks in advance for any help or advice you can give me.  I am not able to
find a clear example or explanation and want to make sure this gets done
correctly.

Ryan Merriman
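
Taylor's reply in this thread suggests simply adding the header to the minified code when that code is generated from Metron. A minimal sketch of that step in Python follows, assuming it runs after minification (a minifier may strip comments) and that the standard ASF source header is the one wanted. The function names are made up for illustration and are not part of the Metron build.

```python
# Standard ASF source header, wrapped in a JavaScript block comment.
ASF_HEADER = """/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
"""


def add_license_header(minified_js: str, header: str = ASF_HEADER) -> str:
    """Return the minified source with the header prepended.

    The input is returned unchanged if it already starts with the header,
    so running the step twice does not duplicate the comment.
    """
    if minified_js.startswith(header):
        return minified_js
    return header + minified_js


def add_license_header_to_file(path: str) -> None:
    """Hypothetical helper: rewrite the file at `path` with the header added."""
    with open(path, "r", encoding="utf-8") as f:
        source = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(add_license_header(source))
```

Code that came from an external source is a different case; per the same reply, that is handled in the license files rather than by adding this header.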


Re: [DISCUSS][PROPOSAL] Acceptance Tests

2017-03-22 Thread Ryan Merriman
I hear ya on Ansible.

On Wed, Mar 22, 2017 at 2:07 PM, David Lyle <dlyle65...@gmail.com> wrote:

> I'm -1 on Ansible for testing for the same reasons I recommended reducing
> reliance on it for deployment. It simply isn't well suited to be a general
> purpose installer (or testing framework) for the variety of OS/Ansible
> versions that we find in the wild.
>
> I'd figure out how/what we want to test and then see what we need for a
> suitable framework. I'm certain we can find one that generalizes better
> than Ansible has for us so far.
>
> -D...
>
>
> On Wed, Mar 22, 2017 at 3:00 PM, Ryan Merriman <merrim...@gmail.com>
> wrote:
>
> > I don't think a cluster installed by ansible is a prerequisite to using
> > ansible to integration test.  They would be completely separate modules
> > except maybe sharing some property or inventory files.  Just need to run
> > scripts and hit rest endpoints right?  Just an idea, maybe it's overkill.
> > I'm cool with rolling our own.
> >
> > On Wed, Mar 22, 2017 at 1:49 PM, Casey Stella <ceste...@gmail.com>
> wrote:
> >
> > > Maybe, but I'd argue that we would want this to be run against a
> > > non-ansible installed cluster.  For a first pass, I'd recommend just a
> > set
> > > of shell scripts utilizing the REPL and the REST API along with shell
> > > commands.  Most of our capabilities are quite scriptable.
> > >
> > > On Wed, Mar 22, 2017 at 2:47 PM, Ryan Merriman <merrim...@gmail.com>
> > > wrote:
> > >
> > > > Bumping this thread.  Looks like we have several +1s so I propose we
> > move
> > > > to the next step.  I'm anxious to get this done because these tests
> > would
> > > > have saved me time over the last couple weeks.  The management UI in
> > > > https://github.com/apache/incubator-metron/pull/484 has a set of e2e
> > > tests
> > > > being maintained in another branch so those could also be included in
> > > this
> > > > test suite when the UI makes it into master.
> > > >
> > > > Ideas for an "Acceptance Testing Framework"?  Could Ansible be a good
> > > > fit for this since we already have it in our stack?
> > > >
> > > > On Mon, Mar 6, 2017 at 1:01 PM, Michael Miklavcic <
> > > > michael.miklav...@gmail.com> wrote:
> > > >
> > > > > Ok, yes I agree. In my experience with e2e/acceptance tests,
> they're
> > > best
> > > > > kept general with an emphasis on verifying that all the plumbing
> > works
> > > > > together. So yes, there are definite edge cases I think we'll want
> to
> > > > test
> > > > > here, but I say that with the caveat that I think we should ideally
> > > cover
> > > > > as many non-happy-path cases in unit and integration tests as
> > possible.
> > > > As
> > > > > an example, I don't think it makes sense to cover most of the
> > profiler
> > > > > windowing DSL language edge cases in acceptance tests instead of or
> > in
> > > > > addition to unit/integration tests unless there is something
> specific
> > > to
> > > > > > the integration with a given environment that we think could be
> > > > > problematic.
> > > > >
> > > > > M
> > > > >
> > > > > On Mon, Mar 6, 2017 at 11:32 AM, Casey Stella <ceste...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > No, I'm saying that they shouldn't be restricted to real-world
> > > > use-cases.
> > > > > > The E2E tests I laid out weren't real-world, but they did
> exercise
> > > the
> > > > > > > components similar to real-world use-cases.  They should also be
> > > > > > > able to tread outside of the happy-path for those use-cases.
> > > > > >
> > > > > > On Mon, Mar 6, 2017 at 6:30 PM, Michael Miklavcic <
> > > > > > michael.miklav...@gmail.com> wrote:
> > > > > >
> > > > > > > "I don't think acceptance tests should loosely associate with
> > real
> > > > > uses,
> > > > > > > but they should
> > > > > > > be free to delve into weird non-happy-pathways."
> > > > > > >
> > > 

Re: [DISCUSS][PROPOSAL] Acceptance Tests

2017-03-22 Thread Ryan Merriman
I think we'll have non-manual REST/web deployment soon regardless of this
discussion.

On Wed, Mar 22, 2017 at 2:00 PM, Ryan Merriman <merrim...@gmail.com> wrote:

> I don't think a cluster installed by ansible is a prerequisite to using
> ansible to integration test.  They would be completely separate modules
> except maybe sharing some property or inventory files.  Just need to run
> scripts and hit rest endpoints right?  Just an idea, maybe it's overkill.
> I'm cool with rolling our own.
>
> On Wed, Mar 22, 2017 at 1:49 PM, Casey Stella <ceste...@gmail.com> wrote:
>
>> Maybe, but I'd argue that we would want this to be run against a
>> non-ansible installed cluster.  For a first pass, I'd recommend just a set
>> of shell scripts utilizing the REPL and the REST API along with shell
>> commands.  Most of our capabilities are quite scriptable.
>>
>> On Wed, Mar 22, 2017 at 2:47 PM, Ryan Merriman <merrim...@gmail.com>
>> wrote:
>>
>> > Bumping this thread.  Looks like we have several +1s so I propose we
>> move
>> > to the next step.  I'm anxious to get this done because these tests
>> would
>> > have saved me time over the last couple weeks.  The management UI in
>> > https://github.com/apache/incubator-metron/pull/484 has a set of e2e
>> tests
>> > being maintained in another branch so those could also be included in
>> this
>> > test suite when the UI makes it into master.
>> >
>> > Ideas for an "Acceptance Testing Framework"?  Could Ansible be a good fit
>> > for this since we already have it in our stack?
>> >
>> > On Mon, Mar 6, 2017 at 1:01 PM, Michael Miklavcic <
>> > michael.miklav...@gmail.com> wrote:
>> >
>> > > Ok, yes I agree. In my experience with e2e/acceptance tests, they're
>> best
>> > > kept general with an emphasis on verifying that all the plumbing works
>> > > together. So yes, there are definite edge cases I think we'll want to
>> > test
>> > > here, but I say that with the caveat that I think we should ideally
>> cover
>> > > as many non-happy-path cases in unit and integration tests as
>> possible.
>> > As
>> > > an example, I don't think it makes sense to cover most of the profiler
>> > > windowing DSL language edge cases in acceptance tests instead of or in
>> > > addition to unit/integration tests unless there is something specific
>> to
>> > > the integration with a given environment that we think could be
>> > > problematic.
>> > >
>> > > M
>> > >
>> > > On Mon, Mar 6, 2017 at 11:32 AM, Casey Stella <ceste...@gmail.com>
>> > wrote:
>> > >
>> > > > No, I'm saying that they shouldn't be restricted to real-world
>> > use-cases.
>> > > > The E2E tests I laid out weren't real-world, but they did exercise
>> the
>> > > > components similar to real-world use-cases.  They should also be
>> > > > able to tread outside of the happy-path for those use-cases.
>> > > >
>> > > > On Mon, Mar 6, 2017 at 6:30 PM, Michael Miklavcic <
>> > > > michael.miklav...@gmail.com> wrote:
>> > > >
>> > > > > "I don't think acceptance tests should loosely associate with real
>> > > uses,
>> > > > > but they should
>> > > > > be free to delve into weird non-happy-pathways."
>> > > > >
>> > > > > Not following - are you saying they should *tightly* associate
>> with
>> > > real
>> > > > > uses and additionally include non-happy-path?
>> > > > >
>> > > > > On Fri, Mar 3, 2017 at 12:57 PM, Casey Stella <ceste...@gmail.com
>> >
>> > > > wrote:
>> > > > >
>> > > > > > It is absolutely not a naive question, Matt.  We don't have a
>> lot
>> > (or
>> > > > > any)
>> > > > > > docs about our integration tests; it's more of a "follow the
>> lead"
>> > > type
>> > > > > of
>> > > > > > thing at the moment, but that should be rectified.
>> > > > > >
>> > > > > > The integration tests spin up and down infrastructure
>> in-process,
>> > > some
>> > > > of
>> > > > > > which are real and s

Re: [DISCUSS][PROPOSAL] Acceptance Tests

2017-03-22 Thread Ryan Merriman
I don't think a cluster installed by Ansible is a prerequisite to using
Ansible to integration test.  They would be completely separate modules
except maybe sharing some property or inventory files.  We just need to run
scripts and hit REST endpoints, right?  Just an idea, maybe it's overkill.
I'm cool with rolling our own.
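
For what "rolling our own" could look like, here is a rough Python sketch of a tiny check runner that hits REST endpoints on a running cluster and reports which checks failed. Everything in it is an assumption for illustration: the function names, the idea that a 200 response means "healthy", and any base URL or path passed in. It is not an existing Metron module.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Tuple

# A fetch function takes a URL and returns (HTTP status, response body).
Fetch = Callable[[str], Tuple[int, str]]


def http_fetch(url: str) -> Tuple[int, str]:
    """Fetch a URL over HTTP; error responses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")


def endpoint_returns_ok(base_url: str, path: str, fetch: Fetch = http_fetch) -> bool:
    """Check that an endpoint answers with HTTP 200 (assumed to mean healthy)."""
    status, _body = fetch(base_url + path)
    return status == 200


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every named check and return the names of the ones that failed.

    A check that raises is counted as a failure so one broken endpoint
    does not stop the rest of the suite.
    """
    failures = []
    for name, check in checks.items():
        try:
            passed = check()
        except Exception:
            passed = False
        if not passed:
            failures.append(name)
    return failures
```

Passing the fetch function in as a parameter keeps the runner independent of how the cluster was installed, and lets the runner itself be exercised with a fake fetch before pointing it at real nodes.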

On Wed, Mar 22, 2017 at 1:49 PM, Casey Stella <ceste...@gmail.com> wrote:

> Maybe, but I'd argue that we would want this to be run against a
> non-ansible installed cluster.  For a first pass, I'd recommend just a set
> of shell scripts utilizing the REPL and the REST API along with shell
> commands.  Most of our capabilities are quite scriptable.
>
> On Wed, Mar 22, 2017 at 2:47 PM, Ryan Merriman <merrim...@gmail.com>
> wrote:
>
> > Bumping this thread.  Looks like we have several +1s so I propose we move
> > to the next step.  I'm anxious to get this done because these tests would
> > have saved me time over the last couple weeks.  The management UI in
> > https://github.com/apache/incubator-metron/pull/484 has a set of e2e
> tests
> > being maintained in another branch so those could also be included in
> this
> > test suite when the UI makes it into master.
> >
> > Ideas for an "Acceptance Testing Framework"?  Could Ansible be a good fit
> > for this since we already have it in our stack?
> >
> > On Mon, Mar 6, 2017 at 1:01 PM, Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > Ok, yes I agree. In my experience with e2e/acceptance tests, they're
> best
> > > kept general with an emphasis on verifying that all the plumbing works
> > > together. So yes, there are definite edge cases I think we'll want to
> > test
> > > here, but I say that with the caveat that I think we should ideally
> cover
> > > as many non-happy-path cases in unit and integration tests as possible.
> > As
> > > an example, I don't think it makes sense to cover most of the profiler
> > > windowing DSL language edge cases in acceptance tests instead of or in
> > > addition to unit/integration tests unless there is something specific
> to
> > > > the integration with a given environment that we think could be
> > > problematic.
> > >
> > > M
> > >
> > > On Mon, Mar 6, 2017 at 11:32 AM, Casey Stella <ceste...@gmail.com>
> > wrote:
> > >
> > > > No, I'm saying that they shouldn't be restricted to real-world
> > use-cases.
> > > > The E2E tests I laid out weren't real-world, but they did exercise
> the
> > > > > components similar to real-world use-cases.  They should also be able
> > > > > to tread outside of the happy-path for those use-cases.
> > > >
> > > > On Mon, Mar 6, 2017 at 6:30 PM, Michael Miklavcic <
> > > > michael.miklav...@gmail.com> wrote:
> > > >
> > > > > "I don't think acceptance tests should loosely associate with real
> > > uses,
> > > > > but they should
> > > > > be free to delve into weird non-happy-pathways."
> > > > >
> > > > > Not following - are you saying they should *tightly* associate with
> > > real
> > > > > > uses and additionally include non-happy-path?
> > > > >
> > > > > On Fri, Mar 3, 2017 at 12:57 PM, Casey Stella <ceste...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > It is absolutely not a naive question, Matt.  We don't have a lot
> > (or
> > > > > any)
> > > > > > docs about our integration tests; it's more of a "follow the
> lead"
> > > type
> > > > > of
> > > > > > thing at the moment, but that should be rectified.
> > > > > >
> > > > > > The integration tests spin up and down infrastructure in-process,
> > > some
> > > > of
> > > > > > which are real and some of which are mock versions of the
> services.
> > > > > These
> > > > > > are good for catching some types of bugs, but often things sneak
> > > > through,
> > > > > > like:
> > > > > >
> > > > > >- Hbase and storm can't exist in the same JVM, so HBase is
> > mocked
> > > in
> > > > > >those cases.
> > > > > >- The FileSystem that we get for Hadoop is the
> > LocalRawFileSystem,
> > > > not
> > > > > &g

Re: [DISCUSS][PROPOSAL] Acceptance Tests

2017-03-22 Thread Ryan Merriman
Bumping this thread.  Looks like we have several +1s so I propose we move
to the next step.  I'm anxious to get this done because these tests would
have saved me time over the last couple weeks.  The management UI in
https://github.com/apache/incubator-metron/pull/484 has a set of e2e tests
being maintained in another branch so those could also be included in this
test suite when the UI makes it into master.

Ideas for an "Acceptance Testing Framework"?  Could Ansible be a good fit for
this since we already have it in our stack?

On Mon, Mar 6, 2017 at 1:01 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Ok, yes I agree. In my experience with e2e/acceptance tests, they're best
> kept general with an emphasis on verifying that all the plumbing works
> together. So yes, there are definite edge cases I think we'll want to test
> here, but I say that with the caveat that I think we should ideally cover
> as many non-happy-path cases in unit and integration tests as possible. As
> an example, I don't think it makes sense to cover most of the profiler
> windowing DSL language edge cases in acceptance tests instead of or in
> addition to unit/integration tests unless there is something specific to
> the integration with a given environment that we think could be
> problematic.
>
> M
>
> On Mon, Mar 6, 2017 at 11:32 AM, Casey Stella  wrote:
>
> > No, I'm saying that they shouldn't be restricted to real-world use-cases.
> > The E2E tests I laid out weren't real-world, but they did exercise the
> > components similar to real-world use-cases.  They should also be able to
> > tread outside of the happy-path for those use-cases.
> >
> > On Mon, Mar 6, 2017 at 6:30 PM, Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > "I don't think acceptance tests should loosely associate with real
> uses,
> > > but they should
> > > be free to delve into weird non-happy-pathways."
> > >
> > > Not following - are you saying they should *tightly* associate with
> real
> > > > uses and additionally include non-happy-path?
> > >
> > > On Fri, Mar 3, 2017 at 12:57 PM, Casey Stella 
> > wrote:
> > >
> > > > It is absolutely not a naive question, Matt.  We don't have a lot (or
> > > any)
> > > > docs about our integration tests; it's more of a "follow the lead"
> type
> > > of
> > > > thing at the moment, but that should be rectified.
> > > >
> > > > The integration tests spin up and down infrastructure in-process,
> some
> > of
> > > > which are real and some of which are mock versions of the services.
> > > These
> > > > are good for catching some types of bugs, but often things sneak
> > through,
> > > > like:
> > > >
> > > > >    - HBase and Storm can't exist in the same JVM, so HBase is mocked in
> > > > >      those cases.
> > > > >    - The FileSystem that we get for Hadoop is the LocalRawFileSystem, not
> > > > >      truly HDFS.  There are differences and we've run into them...
> > > > >      hilariously at times. ;)
> > > > >    - Things done statically in a bolt are shared across all bolts because
> > > > >      they all are threads in the same process
> > > >
> > > > It's good, it catches bugs, it lets us debug things easily, it runs
> > with
> > > > every single build automatically via travis.
> > > > It's bad because it's awkward to get the dependencies isolated
> > > sufficiently
> > > > for all of these components to get them to play nice in the same JVM.
> > > >
> > > > Acceptance tests would be run against a real cluster, so they would:
> > > >
> > > > >    - run against real components, not testing or mock components
> > > > >    - run against multiple nodes
> > > >
> > > > I can imagine a world where we can unify the two to a certain degree
> in
> > > > many cases if we could spin up a docker version of Metron to run as
> > part
> > > of
> > > > the build, but I think in the meantime, we should focus on providing
> > > both.
> > > >
> > > > I suspect the reference application is possibly inspiring my
> > suggestions
> > > > here, but I think the main difference here is that the reference
> > > > application is intended to be informational from a end-user
> > perspective:
> > > > it's detailing a use-case that users will understand.  I don't think
> > > > acceptance tests should loosely associate with real uses, but they
> > should
> > > > be free to delve into weird non-happy-pathways.
> > > >
> > > > On Fri, Mar 3, 2017 at 2:16 PM, Matt Foley  wrote:
> > > >
> > > > > Automating stuff that now has to be done manually gets a big +1.
> > > > >
> > > > > But, Casey, could you please clarify the relationship between what
> > you
> > > > > plan to do and the current “integration test” framework?  Will this
> > be
> > > in
> > > > > the form of additional integration tests? Or a different test
> > > framework?
> > > > > Can it be done in the integration test framework, rather than
> > creating
> > > > new
> > > > > mechanism?

Re: [DISCUSS] Stepping down as release manager

2017-03-21 Thread Ryan Merriman
+1 for Matt

On Tue, Mar 21, 2017 at 9:44 AM, Matt Foley  wrote:

> Casey, you’ve been a great release manager.  I know how much detailed effort
> goes into this role.
>
> I am willing to serve as RM for the next while, if the community would
> like.  I was the RM for Hadoop for about a year, and in fact was RM for its
> 1.0 release. Granted that was a while ago, but overall process doesn’t seem
> to have changed much :-)
>
> Cheers,
> --Matt
>
> On 3/21/17, 7:32 AM, "Casey Stella"  wrote:
>
> Right, Billie is exactly right.  Working with the community to construct
> releases that conform to apache standards and policies is the main duty.
> This will (hopefully) be our first set of releases outside of the
> incubator, so if I'm allowed to be biased, I'm hoping that someone with
> previous release management experience in other projects will volunteer.
> We're leaving the nest a bit and having an experienced hand at the tiller
> would be advantageous.
>
>
> On Tue, Mar 21, 2017 at 10:21 AM, Billie Rinaldi 
> wrote:
>
> > See http://www.apache.org/dev/release-publishing#release_manager and
> > http://www.apache.org/legal/release-policy.html for information on
> the
> > tasks that a release manager performs.
> >
> > On Tue, Mar 21, 2017 at 7:10 AM, Khurram Ahmed <
> khurramah...@gmail.com>
> > wrote:
> >
> > > Casey it would be helpful if you could outline the
> responsibilities of a
> > > release manager for the Metron project.
> > >
> > > On Mar 21, 2017 6:57 PM, "Casey Stella" 
> wrote:
> > >
> > > > I've been extremely honored to spend the last few months as the
> Metron
> > > > Release Manager.  That being said, my watch is ended and it's
> time for
> > > > another release manager to step into my place.
> > > >
> > > > Who would like to volunteer to be release manager for the next
> release
> > of
> > > > Metron?
> > > >
> > > > Best,
> > > >
> > > > Casey
> > > >
> > >
> >
>
>
>
>
>


Re: [GitHub] incubator-metron issue #477: METRON-766: Release 0.3.1

2017-03-16 Thread Ryan Merriman
The delete topic function in Kafka has had some issues in the past.  I
don't think it's critical we expose that through the REST API so I propose
we just take it out.  Any objections?

On Thu, Mar 16, 2017 at 3:52 PM, cestella  wrote:

> Github user cestella commented on the issue:
>
> https://github.com/apache/incubator-metron/pull/477
>
> I'm kicking travis, but it appears that there are some intermittent
> test failures in the REST API tests.  In particular around deleting topics
> for the KafkaControllerIntegrationTest:
> ```
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.185
> sec <<< FAILURE! - in org.apache.metron.rest.controller.
> KafkaControllerIntegrationTest
> test(org.apache.metron.rest.controller.KafkaControllerIntegrationTest)
> Time elapsed: 9.13 sec  <<< ERROR!
> org.springframework.web.util.NestedServletException: Request
> processing failed; nested exception is kafka.common.
> TopicAlreadyMarkedForDeletionException: topic bro is already marked for
> deletion
> at org.springframework.web.servlet.FrameworkServlet.
> processRequest(FrameworkServlet.java:982)
> at org.springframework.web.servlet.FrameworkServlet.
> doDelete(FrameworkServlet.java:894)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:654)
> at org.springframework.web.servlet.FrameworkServlet.
> service(FrameworkServlet.java:846)
> at org.springframework.test.web.servlet.TestDispatcherServlet.
> service(TestDispatcherServlet.java:65)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
> at org.springframework.mock.web.MockFilterChain$
> ServletFilterProxy.doFilter(MockFilterChain.java:167)
> at org.springframework.mock.web.MockFilterChain.doFilter(
> MockFilterChain.java:134)
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:317)
> at org.springframework.security.web.access.intercept.
> FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:127)
> at org.springframework.security.web.access.intercept.
> FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:91)
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.access.
> ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:115)
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.session.
> SessionManagementFilter.doFilter(SessionManagementFilter.java:137)
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.authentication.
> AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.
> java:111)
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.servletapi.
> SecurityContextHolderAwareRequestFilter.doFilter(
> SecurityContextHolderAwareRequestFilter.java:169)
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.savedrequest.
> RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:63)
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.authentication.www.
> BasicAuthenticationFilter.doFilterInternal(BasicAuthenticationFilter.
> java:215)
> at org.springframework.web.filter.OncePerRequestFilter.
> doFilter(OncePerRequestFilter.java:107)
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.authentication.logout.
> LogoutFilter.doFilter(LogoutFilter.java:121)
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.header.HeaderWriterFilter.
> doFilterInternal(HeaderWriterFilter.java:66)
> at org.springframework.web.filter.OncePerRequestFilter.
> doFilter(OncePerRequestFilter.java:107)
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.context.
> SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilt
> er.java:105)
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.context.request.async.
> WebAsyncManagerIntegrationFilter.doFilterInternal(

Re: [VOTE] Metron to graduate to TLP

2017-03-13 Thread Ryan Merriman
+1 (binding)

> On Mar 13, 2017, at 6:05 PM, Casey Stella  wrote:
> 
> +1 (binding)
> 
>> On Mon, Mar 13, 2017 at 6:37 PM, James Sirota  wrote:
>> 
>> +1 (binding)
>> 
>> 13.03.2017, 15:37, "James Sirota" :
>>> Do we feel it's time for us to exit the Apache incubator and petition to
>> make Metron a TLP?
>>> 
>>> Please vote 1 for yes, -1 for no, 0 for neutral.
>>> 
>>> The vote will be open for 72 hours
>>> 
>>> ---
>>> Thank you,
>>> 
>>> James Sirota
>>> PPMC- Apache Metron (Incubating)
>>> jsirota AT apache DOT org
>> 
>> ---
>> Thank you,
>> 
>> James Sirota
>> PPMC- Apache Metron (Incubating)
>> jsirota AT apache DOT org
>> 


Re: [VOTE] Casey Stella for Metron VP

2017-03-13 Thread Ryan Merriman
+1 (binding)

> On Mar 13, 2017, at 6:04 PM, Justin Leet  wrote:
> 
> +1 (non-binding)
> 
>> On Mon, Mar 13, 2017 at 6:35 PM, zeo...@gmail.com  wrote:
>> 
>> +1 (non-binding)
>> 
>>> On Mon, Mar 13, 2017 at 6:34 PM James Sirota  wrote:
>>> 
>>> +1 (binding)
>>> 
>>> 13.03.2017, 15:34, "James Sirota" :
>>>> This vote is to make Casey Stella our VP after graduation
>>>>
>>>> ---
>>>> Thank you,
>>>>
>>>> James Sirota
>>>> PPMC- Apache Metron (Incubating)
>>>> jsirota AT apache DOT org
>>> 
>>> ---
>>> Thank you,
>>> 
>>> James Sirota
>>> PPMC- Apache Metron (Incubating)
>>> jsirota AT apache DOT org
>> --
>> 
>> Jon
>> 


Re: metron-rest - what is the default user and password

2017-03-08 Thread Ryan Merriman
I will get that added to the README.

On Wed, Mar 8, 2017 at 1:22 PM, Otto Fowler  wrote:

> Never mind.  found it in the tests.
>
>
>
> On March 8, 2017 at 14:18:22, Otto Fowler (ottobackwa...@gmail.com) wrote:
>
> I’m trying to run the metron-interface/metron-rest against quickdev.
>
> I have started the rest server using mvn spring-boot:run
> -Drun.profiles=vagrant,dev -Dserver.port=8082
>
> but I have no idea what the user names to use are to log in.
>


Re: Metron Rest - where is reflections coming from?

2017-03-07 Thread Ryan Merriman
I have a good understanding of this since I spent a good amount of time
troubleshooting it.  Let me attempt to explain it.  Hopefully if everyone
clearly understands the core issue we can put our heads together for a
solution.

The metron-common module shades guava and is installed as an uber jar (all
dependencies included) instead of a regular jar (only metron-common code).
Now consider that metron-rest has a dependency on org.reflections.
Normally you would add a dependency in the metron-rest pom stating the
version you need and Maven would figure it out.  But in this case Maven
includes both the org.reflections jar and metron-common uber jar (because
that's what's installed) in the classpath when building metron-rest.  The
problem is now we have 2 different versions of org.reflections in the
classpath:  a normal one and one whose guava dependencies have been
shaded.  The easiest solution (and the one that is currently being used) is
to just not directly include the normal version and let metron-rest use the
version that comes with metron-common.  The metron-rest module simply calls
the org.reflections classes and doesn't care which version of guava (shaded
or not) org.reflections depends on.
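
To make the duplicate-classes problem concrete, here is a minimal Python sketch that scans a set of jars and reports every class under a package prefix that shows up in more than one of them. It is only an illustration: the function name and the throwaway jars in the demo are assumptions, and it is not part of the Metron build.

```python
import os
import tempfile
import zipfile
from collections import defaultdict


def find_duplicate_classes(jar_paths, package_prefix):
    """Map each class under package_prefix to the jars that contain it,
    keeping only classes that appear in more than one jar."""
    owners = defaultdict(list)
    for jar_path in jar_paths:
        # A jar is a zip archive, so the standard zipfile module can list it.
        with zipfile.ZipFile(jar_path) as jar:
            for entry in jar.namelist():
                if entry.startswith(package_prefix) and entry.endswith(".class"):
                    owners[entry].append(jar_path)
    return {cls: jars for cls, jars in owners.items() if len(jars) > 1}


if __name__ == "__main__":
    # Build two throwaway jars so the sketch runs without a local Maven repo.
    with tempfile.TemporaryDirectory() as tmp:
        uber = os.path.join(tmp, "uber.jar")
        plain = os.path.join(tmp, "plain.jar")
        for path in (uber, plain):
            with zipfile.ZipFile(path, "w") as jar:
                jar.writestr("org/reflections/Reflections.class", b"")
        duplicates = find_duplicate_classes([uber, plain], "org/reflections/")
        for cls, jars in duplicates.items():
            print(cls, "->", [os.path.basename(j) for j in jars])
```

Pointing it at the metron-common uber jar and a standalone org.reflections jar from the local Maven repo should list the classes that both jars provide.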

The reason Intellij has a problem with this is that it doesn't include the
metron-common jar installed in the local Maven repo (like mvn package
would), instead it includes the Intellij metron-common module.  This is how
Intellij works and reimporting the project will not help.  The workaround
for Intellij is this:

   1. Navigate to Project Structure > metron-rest > Dependencies
   2. Add a new dependency of type Library
   3. Create a new Library of type Java
   4. Select
   "~/.m2/repository/org/apache/metron/metron-common/0.3.1/metron-common-0.3.1.jar"
   5. Apply

I can think of a couple solutions for this with varying degrees of
complexity:

   - We could replace org.reflections in metron-rest with something else.
   Probably the option with the lowest effort.
   - We could not use guava in metron-common, removing the need to shade
   it.  I don't think this is even an option because guava is all over the
   place and would take a lot of effort to remove.
   - Move hbase-client dependency out of metron-common.  This is the cause
   of our guava problems right?  HBase uses an old version that conflicts with
   newer versions.  This would likely take some experimenting but the end
   result could make things a lot easier in general.  Right now depending on
   metron-common is difficult and it shouldn't be.  It's common code.

Hopefully this helps.  I will continue brainstorming.  Keep in mind that
this issue is not confined to metron-rest, any module depending on
metron-common could be impacted by this.

On Tue, Mar 7, 2017 at 7:39 AM, Casey Stella  wrote:

> It should get it from Metron common and IntelliJ is showing me the same
> issue. I'm baffled by it, slightly. Maven builds just fine, so who knows.
> It's on my list of oddities to look at post-vacation.
> On Tue, Mar 7, 2017 at 12:57 Otto Fowler  wrote:
>
> >
> > https://github.com/apache/incubator-metron/blob/master/
> metron-interface/metron-rest/pom.xml
> >
> > I don’t see where the dependency for reflections is being set, and my
> build
> > is failing.  Am I missing something post merge?
> > Is it an intellij thing?
> >
>


Re: Metron Rest - where is reflections coming from?

2017-03-07 Thread Ryan Merriman
Here is an explanation:
https://github.com/apache/incubator-metron/pull/316#issuecomment-282791185

What exactly do you mean by "build is failing"?  Are you trying to spin it
up in Intellij?  If so you'll need to add the metron-common jar that is
installed in your local Maven repo (the uber jar that includes the shaded
reflections library) to your dependencies.  Intellij is not aware of the
shade plugin when resolving Maven dependencies so you have to add it
manually.

If that doesn't do it for you let me know how I can help.

Ryan

On Tue, Mar 7, 2017 at 6:57 AM, Otto Fowler  wrote:

> https://github.com/apache/incubator-metron/blob/master/
> metron-interface/metron-rest/pom.xml
>
> I don’t see where the dependency for reflections is being set, and my build
> is failing.  Am I missing something post merge?
> Is it an intellij thing?
>


Re: [DISCUSS][PROPOSAL] Acceptance Tests

2017-03-03 Thread Ryan Merriman
+1, great idea.  At some point our manual testing checklist is going to
grow large enough that we'll need to move to something more automated.
We're probably already there.

I very much agree with Justin's concern.  Building and running
unit/integration tests takes a long time right now and this will increase
that time.  My request would be for these tests to be organized and
granular enough for me to focus on tests I believe exercise the feature I'm
working on.  The full suite would then be used to find regressions that may
not be obvious.  We also should be careful about when we add this to our
travis job.  I think optimizing our build and current tests would be a
prerequisite for that.

Do we want to enforce a passing rate of 100% for every PR?  I think
everyone would agree this is ideal but are there cases where this might not
be practical?
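
As a rough illustration of the "idempotent and independent" requirement from the proposal, here is a small Python sketch of a test that tears down whatever it sets up and can be run on its own. Everything in it is hypothetical: `FakeCluster` is an in-memory stand-in for a real Metron cluster and `run_suite` is just a helper for running one test class at a time.

```python
import unittest


class FakeCluster:
    """Hypothetical stand-in for a Metron cluster; keeps sensor configs in memory."""

    def __init__(self):
        self.sensor_configs = {}

    def put_config(self, sensor, config):
        self.sensor_configs[sensor] = config

    def delete_config(self, sensor):
        # pop with a default keeps the teardown safe to call more than once.
        self.sensor_configs.pop(sensor, None)


CLUSTER = FakeCluster()


class SquidBaselineTest(unittest.TestCase):
    def setUp(self):
        CLUSTER.put_config("squid", {"parser": "squid"})
        # Registered cleanups run even when the test body fails,
        # so the cluster is left the way the test found it.
        self.addCleanup(CLUSTER.delete_config, "squid")

    def test_config_is_visible(self):
        self.assertIn("squid", CLUSTER.sensor_configs)


def run_suite(test_case):
    """Run a single test class, which keeps feature-focused runs cheap."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite(SquidBaselineTest)
```

Because the test cleans up after itself, running it twice in a row against the same cluster gives the same result, and a developer can run only the classes that exercise the feature they are working on.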

On Fri, Mar 3, 2017 at 7:44 AM, Casey Stella  wrote:

> That's a very good point.  I'm hoping that this can take the place of some
> of the more rigorous manual testing scripts that we have, so it's less time
> at the keyboard for reviewers.
>
> On Fri, Mar 3, 2017 at 8:41 AM, Justin Leet  wrote:
>
> > +1 to both.  Having this would especially ease a lot of testing that hits
> > multiple areas (which there is a fair amount of, given that we're
> building
> > pretty quickly).
> >
> > I do want to point out that adding this type of thing makes the speed of
> > our builds and tests more important, because they already take up a good
> > amount of time.  There are obviously tickets to optimize these things,
> but
> > I would like to make sure we don't pile too much on to every testing
> cycle
> > before a PR.  Having said that, I think the testing proposed is
> absolutely
> > valuable enough to go forward with.
> >
> > Justin
> >
> > On Fri, Mar 3, 2017 at 8:33 AM, Casey Stella  wrote:
> >
> > > I also propose, once this is done, that we modify the developer bylaws
> > and
> > > the github PR script to ensure that PR authors:
> > >
> > >- Update the acceptance tests where appropriate
> > >- Run the tests as a smoketest
> > >
> > >
> > >
> > > On Fri, Mar 3, 2017 at 8:21 AM, Casey Stella 
> wrote:
> > >
> > > > Hi All,
> > > >
> > > > After doing METRON-744, where I had to walk through a manual test of
> > > every
> > > > place that Stellar touched, it occurred to me that we should script
> > this.
> > > > It also occurred to me that some scripts that are run by the PR
> author
> > to
> > > > ensure no regressions and, eventually maybe, even run on an INFRA
> > > instance
> > > > of Jenkins would give all of us some peace of mind.
> > > >
> > > > I am certain that this, along with a couple other manual tests from
> > other
> > > > PRs, could form the basis of a really great regression
> acceptance-test
> > > > suite and I'd like to propose that we do that, as a community.
> > > >
> > > > What I'd like to see from such a suite has the following
> > characteristics:
> > > >
> > > >- Can be run on any Metron cluster, including but not limited to
> > > >   - Vagrant
> > > >   - AWS
> > > >   - An existing deployment
> > > >- Can be *deployed* from ansible, but must be able to be deployed
> > > >manually
> > > >   - With instructions in the readme
> > > >- Tests should be idempotent and independent
> > > >   - Tear down what you set up
> > > >
> > > > I think between the Stellar REPL and the fundamental scriptability of
> > the
> > > > Hadoop services, we can accomplish these tests with a combination of
> > > shell
> > > > scripts and python.
> > > >
> > > > I propose we break this into the following parts:
> > > >
> > > >- Acceptance Testing Framework with a small smoketest
> > > >- Baseline Metron Test
> > > >   - Send squid data through the squid topology
> > > > >   - Add a threat triage alert
> > > >   - Ensure it gets through to the other side with alerts
> preserved
> > > >- + Enrichment
> > > >   - Add an enrichment in the enrichment pipeline to the above
> > > >- + Profiler
> > > >   - Add a profile with a tick of 1 minute to count per
> destination
> > > >   address
> > > >- Base PCap test
> > > > >   - Something like the manual test for METRON-743 (
> > > > >   https://github.com/apache/incubator-metron/pull/467#issue-210285324
> > > > >   )
> > > >
> > > > Thoughts?
> > > >
> > > >
> > > > Best,
> > > >
> > > > Casey
> > > >
> > >
> >
>


Re: Travis CI Changes have broken my heart..... or my builds

2017-02-27 Thread Ryan Merriman
I run into this error every now and then.  If I free up space on my hard drive 
it goes away.
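
For anyone who wants to check this before kicking off a build, here is a small Python sketch that compares the utilization of a directory's disk against a threshold. The 90% default reflects my understanding of the YARN NodeManager disk health checker's default limit, which is an assumption worth verifying against your Hadoop version, and the calculation is only an approximation of what YARN does. The function names are made up for this example.

```python
import os
import shutil


def disk_utilization_percent(path):
    """Return how full the disk holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_dir_healthy(path, max_utilization_percent=90.0):
    """True if the disk is at or below the threshold.

    The 90.0 default is an assumed value, not read from any YARN config.
    """
    return disk_utilization_percent(path) <= max_utilization_percent


if __name__ == "__main__":
    target = os.getcwd()
    print(target, "utilization:", round(disk_utilization_percent(target), 1), "%")
    print("healthy:", is_dir_healthy(target))
```

If the directory used for the test's local-dirs and log-dirs reports as unhealthy here, freeing up space is the first thing to try.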

> On Feb 27, 2017, at 8:31 PM, Otto Fowler  wrote:
> 
> I took out the parallel processing and now I see the error:
> 
> ---
> 
> T E S T S
> 
> ---
> 
> Running org.apache.metron.maas.service.MaasIntegrationTest
> 
> 2017-02-28 00:15:17,056 ERROR [main]
> nodemanager.LocalDirsHandlerService
> (LocalDirsHandlerService.java:updateDirsAfterTest(356)) - Most of the
> disks failed. 1/1 local-dirs are bad:
> /home/travis/build/ottobackwards/incubator-metron/metron-analytics/metron-maas-service/target/MaasIntegrationTest/MaasIntegrationTest-localDir-nm-0_0;
> 1/1 log-dirs are bad:
> /home/travis/build/ottobackwards/incubator-metron/metron-analytics/metron-maas-service/target/MaasIntegrationTest/MaasIntegrationTest-logDir-nm-0_0
> 
> No output has been received in the last 10m0s, this potentially
> indicates a stalled build or something wrong with the build itself.
> 
> Check the details on how to adjust your build configuration on:
> https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
> 
> The build has been terminated
> 
> 
> 
> 
> Ring any bells?
> 
> On February 26, 2017 at 10:42:30, Otto Fowler (ottobackwa...@gmail.com)
> wrote:
> 
> I think the problem is with overlapping jar analysis and shading
> 
> 
> On February 26, 2017 at 08:11:30, Otto Fowler (ottobackwa...@gmail.com)
> wrote:
> 
> So, master builds fine, there is just something with my branch
> 
> 
> On February 25, 2017 at 09:31:22, Otto Fowler (ottobackwa...@gmail.com)
> wrote:
> 
> Of course the same commands build locally
> 
> 
> 
> On February 25, 2017 at 09:17:50, Otto Fowler (ottobackwa...@gmail.com)
> wrote:
> 
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/205276680/log.txt
> 
> 
> On February 25, 2017 at 08:20:54, Casey Stella (ceste...@gmail.com) wrote:
> 
> I have not seen those "no output received for 10m" errors before. Can you
> change the Travis command to not have -q for maven so we can see more
> context?
>> On Sat, Feb 25, 2017 at 07:56 Otto Fowler  wrote:
>> 
>> I have not had a build work on Travis CI ( linked to my fork ) in 4 days.
>> 
>> https://travis-ci.org/ottobackwards/incubator-metron/builds
>> 
>> This pretty much lines up with the last set of changes made to the travis
>> build. Is anyone else having this issue?
>> 


Re: [VOTE] Releasing Apache Metron (incubating) 0.3.1-RC5

2017-02-25 Thread Ryan Merriman
When I go to Ambari to ensure the services are all up, HDFS is down.  I tried 
it 4 or 5 times and got the same result each time.  I've seen others validate 
with quick dev so I assumed full dev was not used anymore.  I'll spin it up 
this morning and get a more detailed error.

Is anyone else able to validate it in full dev?

> On Feb 25, 2017, at 7:21 AM, Casey Stella <ceste...@gmail.com> wrote:
> 
> What exactly are the errors that you saw, Ryan?
>> On Sat, Feb 25, 2017 at 07:31 David Lyle <dlyle65...@gmail.com> wrote:
>> 
>> Is there any reason full dev shouldn't be working?
>> 
>>> On Fri, Feb 24, 2017 at 9:19 PM, Casey Stella <ceste...@gmail.com> wrote:
>>> 
>>> Sounds like a good idea to me; thanks Ryan!
>>>> On Fri, Feb 24, 2017 at 21:11 Ryan Merriman <merrim...@gmail.com> wrote:
>>>> 
>>>> +1 binding
>>>> 
>>>> Verified the signature
>>>> Passed maven tests
>>>> Started quick-dev, verified data in ES, kibana, and checked the
>>> topologies
>>>> for errors (bro topology has parsing errors but I think a couple bad
>>>> messages in bro data set is normal)
>>>> Tested REPL
>>>> RPMs built fine
>>>> 
>>>> The recommended build validation wiki page (https://cwiki.apache.org/
>>>> confluence/display/METRON/Verifying+Builds
>>>> <https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds>)
>>>> has some mistakes.  This did
>>>> not run successfully in full-dev-platform and the HDFS paths look like
>>> they
>>>> are old.  I am happy to update the wiki page if everyone agrees these
>> are
>>>> legitimate mistakes.
>>>> 
>>>> On Fri, Feb 24, 2017 at 9:22 AM, Justin Leet <justinjl...@gmail.com>
>>>> wrote:
>>>> 
>>>>> +1 (non-binding)
>>>>> 
>>>>> Verified signature
>>>>> Ran build and tests in maven
>>>>> Ran up in quick-dev and saw data flow through topologies into the UI
>>>>> Ensured the REPL spun up and performed some basic tasks
>>>>> Built rpms
>>>>> 
>>>>> Justin
>>>>> 
>>>>>> On Thu, Feb 23, 2017 at 11:18 AM, Casey Stella <ceste...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> This is a call to vote on releasing Apache Metron 0.3.1-RC5
>>> incubating
>>>>>> 
>>>>>> Full list of changes in this release:
>>>>>> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.
>>>>>> 1-RC5-incubating/CHANGES
>>>>>> 
>>>>>> The tag/commit to be voted upon is apache-metron-0.3.1-rc5-
>>> incubating:
>>>>>> https://git-wip-us.apache.org/repos/asf?p=incubator-metron.
>>>>>> git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc5-incubating
>>>>>> 
>>>>>> The source archive being voted upon can be found here:
>>>>>> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.
>>>>>> 1-RC5-incubating/apache-metron-0.3.1-rc5-incubating.tar.gz
>>>>>> 
>>>>>> Other release files, signatures and digests can be found here:
>>>>>> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.
>>>>>> 1-RC5-incubating/
>>>>>> 
>>>>>> The release artifacts are signed with the following key:
>>>>>> https://git-wip-us.apache.org/repos/asf?p=incubator-metron.
>>>>>> git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d
>>>>> 9c260ba55e;hb=refs/tags/
>>>>>> apache-metron-0.3.1-rc5-incubating
>>>>>> 
>>>>>> The book associated with this RC is located at
>>>>>> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.
>>>>>> 1-RC5-incubating/book-site/index.html
>>>>>> 
>>>>>> Please vote on releasing this package as Apache Metron 0.3.1-RC5
>>>>> incubating
>>>>>> 
>>>>>> When voting, please list the actions taken to verify the release.
>>>>>> 
>>>>>> Recommended build validation and verification instructions are
>> posted
>>>>> here:
>> https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds
>>>>>> 
>>>>>> 
>>>>>> This vote will be open for at least 72 hours.
>>>>>> 
>>>>>> [ ] +1 Release this package as Apache Metron 0.3.1-RC5 incubating
>>>>>> 
>>>>>> [ ]  0 No opinion
>>>>>> 
>>>>>> [ ] -1 Do not release this package because...
>> 


Re: [VOTE] Releasing Apache Metron (incubating) 0.3.1-RC5

2017-02-24 Thread Ryan Merriman
+1 binding

Verified the signature
Passed maven tests
Started quick-dev, verified data in ES, kibana, and checked the topologies
for errors (bro topology has parsing errors but I think a couple bad
messages in bro data set is normal)
Tested REPL
RPMs built fine

The recommended build validation wiki page (https://cwiki.apache.org/
confluence/display/METRON/Verifying+Builds) has some mistakes.  This did
not run successfully in full-dev-platform and the HDFS paths look like they
are old.  I am happy to update the wiki page if everyone agrees these are
legitimate mistakes.

On Fri, Feb 24, 2017 at 9:22 AM, Justin Leet  wrote:

> +1 (non-binding)
>
> Verified signature
> Ran build and tests in maven
> Ran up in quick-dev and saw data flow through topologies into the UI
> Ensured the REPL spun up and performed some basic tasks
> Built rpms
>
> Justin
>
> On Thu, Feb 23, 2017 at 11:18 AM, Casey Stella  wrote:
>
> > This is a call to vote on releasing Apache Metron 0.3.1-RC5 incubating
> >
> > Full list of changes in this release:
> > https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.
> > 1-RC5-incubating/CHANGES
> >
> > The tag/commit to be voted upon is apache-metron-0.3.1-rc5-incubating:
> > https://git-wip-us.apache.org/repos/asf?p=incubator-metron.
> > git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc5-incubating
> >
> > The source archive being voted upon can be found here:
> > https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.
> > 1-RC5-incubating/apache-metron-0.3.1-rc5-incubating.tar.gz
> >
> > Other release files, signatures and digests can be found here:
> > https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.
> > 1-RC5-incubating/
> >
> > The release artifacts are signed with the following key:
> > https://git-wip-us.apache.org/repos/asf?p=incubator-metron.
> > git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d
> 9c260ba55e;hb=refs/tags/
> > apache-metron-0.3.1-rc5-incubating
> >
> > The book associated with this RC is located at
> > https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.
> > 1-RC5-incubating/book-site/index.html
> >
> > Please vote on releasing this package as Apache Metron 0.3.1-RC5
> incubating
> >
> > When voting, please list the actions taken to verify the release.
> >
> > Recommended build validation and verification instructions are posted
> here:
> > https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds
> >
> >
> > This vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this package as Apache Metron 0.3.1-RC5 incubating
> >
> > [ ]  0 No opinion
> >
> > [ ] -1 Do not release this package because...
> >
>


Re: [DISCUSS] Top domains enrichment config/extractor management

2017-02-24 Thread Ryan Merriman
+1 to an Ambari view over the management UI.  If we're going to go to the
trouble of exposing this feature through a UI it should be intuitive and
easy to use.  Simply exposing a json editor in Ambari gets a -1 from me.

Are we keeping track of which enrichments have been loaded?  I believe the
enrichment loader currently does this by adding a new enrichment type to
the various enrichment configs.  It's been a while since I've been in that
part of the code so please correct me if it has evolved since then.  If my
previous statement is true, then that's not ideal because a user should
have a list of available enrichments to pick from.  If we use separate
HBase tables for enrichment types then this problem goes away but if we
continue to use one HBase table then there needs to be some kind of
registry that is maintained by the enrichment loader.
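
To sketch what such a registry could look like, here is a minimal in-memory version in Python. It is purely illustrative: the class, its methods and the table and column family names are all hypothetical, and a real registry would need to persist somewhere shared (ZooKeeper or HBase would be the obvious candidates, but that is an assumption).

```python
class EnrichmentRegistry:
    """Hypothetical record of which enrichment types have been loaded."""

    def __init__(self):
        self._types = {}

    def register(self, enrichment_type, table, column_family):
        # Called by the loader after a successful load; re-registering
        # the same type simply overwrites the previous entry.
        self._types[enrichment_type] = {
            "table": table,
            "column_family": column_family,
        }

    def unregister(self, enrichment_type):
        # Called when the enrichment data is cleaned up.
        self._types.pop(enrichment_type, None)

    def available(self):
        """Sorted list of enrichment types a user could pick from."""
        return sorted(self._types)

    def location(self, enrichment_type):
        return self._types[enrichment_type]


if __name__ == "__main__":
    registry = EnrichmentRegistry()
    registry.register("top_domains", "enrichment", "t")
    registry.register("whois", "enrichment", "t")
    print(registry.available())
```

A UI could then populate its list of selectable enrichments from `available()` instead of asking the user to type an enrichment type by hand.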

On Fri, Feb 24, 2017 at 4:46 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> The reason I posed this question to the community is because I started to
> recognize some of the shortcomings of doing this solely through Ambari, as
> you and Nick have pointed out. I think an Ambari view over the management
> UI is a great idea. And I'd love to see us provide a more robust mechanism
> for loading these enrichments via the management UI. As you said, perhaps
> Ambari could be used to manage the ZK config around active
> enrichments/locations (the "USE" part of it) while the management UI is
> used for actually loading and managing the enrichments themselves?
>
>
> On Fri, Feb 24, 2017 at 8:12 AM, Casey Stella  wrote:
>
> > Late to chime in here, but I feel that we have discussed Ambari's role
> > before and I think we should probably clarify, as a community a few
> things
> > with regard to Ambari vs a management UI built around the REST PR currently
> > under review.  (I promise, I will get to the topic at hand eventually ;)
> :
> >
> >- Where functionality should live
> >- Who is responsible for what
> >
> > I will now make a couple (possibly controversial) statements (some of
> > which) we have actually discussed prior to this on the dev list:
> >
> >
> >- I view Ambari as managing the install and the static configuration
> for
> >Metron.  For us, this would include zookeeper configs as well as
> > topology
> >configuration.  This would be the persistent store of truth.
> >- I view Zookeeper to be our runtime configuration store for the
> >topologies.
> >
> >
> >- I view a management UI (and the Stellar Shell) as managing
> >functionality for interacting with the system.  Where it changes
> >configuration, it must go through Ambari.
> >- I believe the management UI should be exposed as an ambari view
> >
> > As such, I see the importation and management of enrichments, which is a
> > data task, to be squarely in the purview of the management UI, whose job
> is
> > the care and feeding of the data.  That being said, any configuration
> > changes to USE the enrichment should at least be routed through ambari,
> but
> > should be managed in the UI.
> >
> > Now the question becomes, should we have enrichment collateral (I'm
> > including both hbase as well as geo or anything else we have) loaded at
> > install-time.  I would argue that we should not.  Rather, we should
> design
> > the management UI so that the enrichments can be added easily, with a
> > wizard to enable the use of the enrichment via stellar for a sensor
> >
> > On that topic, I think we are doing too much as part of our install.  I
> > would argue that we shouldn't pre-load even the geo data or depend on it
> > for the default parsers.
> >
> > Casey
> >
> >
> >
> > On Tue, Feb 21, 2017 at 6:31 PM, Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > With the work committed in
> > > https://github.com/apache/incubator-metron/pull/445 and
> > > https://github.com/apache/incubator-metron/pull/432, we now have a
> > robust
> > > and flexible means to import enrichment sources and transform their
> > > contents as they are inserted into HBase. One of the main motivators
> for
> > > this new functionality was to add the ability to load top domain
> rankings
> > > from sources such as Alexa. The proposal is to make this type of
> > enrichment
> > > a top-level feature in Metron by introducing it to the Ambari
> management
> > UI
> > > as a configurable set of properties in the MPack install. This comes
> with
> > > some options and challenges in how we want to manage the
> configurations,
> > > which I will outline below.
> > >
> > > *Use cases:*
> > >
> > >- Single load of top domains file
> > >- Re-loading top domains file - need to be able to cleanup properly
> > >- Cleaning up/deleting old enrichment data (this is a general
> feature
> > >that we currently lack - I think it is worth a separate Jira/PR for
> > >creating a MapReduce job that enables cleanup to occur).
> > >- Modifying default 

Re: [DISCUSS] Metron Alerts UI

2017-02-24 Thread Ryan Merriman
Related to the 'What does "Escalate" do' question, one topic that needs
some discussion is how we integrate with 3rd party ticketing systems.  How
should we design this extension point?  Some basic requirements could be
that a call is made to somewhere with the alert as the payload and some
kind of ticket or issue id is received as a response.  This is a very
open-ended question and there are likely several different ways we could do
it.
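
To seed the discussion, here is one possible shape for that extension point, sketched in Python. It only illustrates the "alert in, ticket id out" contract described above; the interface name, the in-memory implementation and the `ticket_id` field are all hypothetical and not tied to any real ticketing system.

```python
import itertools
from abc import ABC, abstractmethod


class TicketingSystem(ABC):
    """Hypothetical extension point: one call, alert in, ticket id out."""

    @abstractmethod
    def create_ticket(self, alert: dict) -> str:
        """Send the alert as the payload and return the new ticket id."""


class InMemoryTicketingSystem(TicketingSystem):
    """Toy implementation for experiments; a real one would call a 3rd party API."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.tickets = {}

    def create_ticket(self, alert):
        ticket_id = f"TICKET-{next(self._ids)}"
        self.tickets[ticket_id] = dict(alert)
        return ticket_id


def escalate(alert, ticketing):
    """Create a ticket and return a copy of the alert annotated with its id."""
    ticket_id = ticketing.create_ticket(alert)
    return {**alert, "ticket_id": ticket_id}


if __name__ == "__main__":
    system = InMemoryTicketingSystem()
    print(escalate({"alert_id": "abc", "source.type": "squid"}, system))
```

Keeping the contract this narrow would let each integration live in its own implementation while the "Escalate" action in the UI only depends on the interface.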

As for Casey's other points:

- The most obvious choice for alert id would be the id in elasticsearch.
Are there other ids we should consider?
- Configurable display fields makes a lot of sense to me and should not be
complex to implement.
- Agreed on offering intuitive ways to filter messages by fields.

Ryan

On Thu, Feb 23, 2017 at 6:42 PM, Casey Stella  wrote:

>- What does "Escalate" do exactly?
>- Where does the Alert ID come from?
>- Are the fields displayed configurable?
>- It'd be nice to be able to select a set of fields for a message and
>have the list of messages filter to just those where those fields are
> the
>same as the one viewed.
>
>
> On Thu, Feb 23, 2017 at 3:24 PM, Houshang Livian 
> wrote:
>
> > Hello Metron Community,
> >
> > We have mocked up an Alerts UI for Metron for your consideration. Please
> > take a look and share your thoughts.
> >
> > Here is a link to our thoughts on this:
> > http://imgur.com/a/KMTKN
> >
> > Does this look like a reasonable place to start?
> > Is there anything that is an absolute MUST have or MUST NOT have?
> >
> > Houshang Livian
> >
> >
> >
>


Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Ryan Merriman
Down to 24 minutes?  Nice job.

On Tue, Feb 7, 2017 at 1:49 PM, Casey Stella  wrote:

> I spent a minute or two looking at how we might use travis
> configuration-alone to drop the wall-clock time of the build and put it up
> for review at https://github.com/apache/incubator-metron/pull/444
>
> It does 3 things:
>
>- Separates the build, the unit tests and the integration tests
>- Parallelizes the unit tests and the build and runs the integration
>tests within the travis container
>- Runs the unit tests and integration tests in separate travis
>containers using travis' build matrix
>
> This ultimately cuts the wallclock time down to 24 minutes for me on travis
> and should give us some time where we're not constantly bouncing builds to
> act on the suggestions here.
>
>
> On Tue, Feb 7, 2017 at 1:03 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > FYI, found this for Docker - https://docs.travis-ci.com/user/docker/
> >
> > On Tue, Feb 7, 2017 at 9:09 AM, David Lyle  wrote:
> >
> > > Absolutely agree. I also think we'd want both once we've done that.
> > Travis
> > > is good for smoke testing PRs and Commits. Jenkins is good for nightly
> > runs
> > > of medium duration tests and would be great for automating our
> > distributed
> > > testing if we found infrastructure to support it. I've seen them used
> in
> > > concert to provide a good solution.
> > >
> > > But, initially, I'd like to see us get our in-process stuff replaced
> with
> > > docker where (if) it makes sense, refactored to run in parallel, the
> poms
> > > refactored to handle our dependencies better and our uber jars removed
> > > where they can be and minimized where they cannot be.
> > >
> > > Which, I think, is a long-winded way of saying "I'd like to see us do
> > what
> > > Casey suggested." :)
> > >
> > > -D...
> > >
> > >
> > > On Tue, Feb 7, 2017 at 10:45 AM, Michael Miklavcic <
> > > michael.miklav...@gmail.com> wrote:
> > >
> > > > I agree with this. I don't think we should switch to an alternate
> > system
> > > > until we find that we are absolutely incapable of eking out any
> further
> > > > efficiency from the current setup.
> > > >
> > > > On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella 
> > wrote:
> > > >
> > > > > I believe that some people use travis and some people request
> Jenkins
> > > > from
> > > > > Apache Infra.  That being said, personally, I think we should take
> > the
> > > > > opportunity to correct the underlying issues.  50 minutes for a
> build
> > > > seems
> > > > > excessive to me.
> > > > >
> > > > > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler <
> > ottobackwa...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Is there an alternative to Travis?  Do other like sized apache
> > > projects
> > > > > > have these problems?  Do they use travis?
> > > > > >
> > > > > >
> > > > > > On February 6, 2017 at 17:02:37, Casey Stella (
> ceste...@gmail.com)
> > > > > wrote:
> > > > > >
> > > > > > For those with pending/building pull requests, it will come as no
> > > > > surprise
> > > > > > that our build times are increasing at a pace that is worrisome.
> In
> > > > fact,
> > > > > > we have hit a fundamental limit associated with Travis over the
> > > > weekend.
> > > > > > We have creeped up into the 40+ minute build territory and travis
> > > seems
> > > > > to
> > > > > > error out at around 49 minutes.
> > > > > >
> > > > > > Taking the current build (
> > > > > > https://travis-ci.org/apache/incubator-metron/jobs/198929446),
> > > looking
> > > > > at
> > > > > > just job times, we're spending about 19 - 20 minutes (1176.53
> > > seconds)
> > > > in
> > > > > > tests out of 44 minutes and 42 seconds to do the build. This
> places
> > > the
> > > > > > unit tests at around 43% of the build time. I say all of this to
> > > point
> > > > > out
> > > > > > that while unit tests are a portion of the build, they are not
> even
> > > the
> > > > > > majority of the build time. We need an approach that addresses
> the
> > > > whole
> > > > > > build performance holistically and we need it soonest.
> > > > > >
> > > > > > To seed the discussion, I will point to a few things that come to
> > > mind
> > > > > > that
> > > > > > fit into three broad categories:
> > > > > >
> > > > > > *Tests are Slow*
> > > > > >
> > > > > >
> > > > > > - *Tactical*: We have around 13 tests that take more than 30
> > seconds
> > > > and
> > > > > > make up 14 minutes of the build. Considering what we can do to
> > speed
> > > > > those
> > > > > > tests as a tactical approach may be worth considering
> > > > > > - We are spinning up the same services (e.g. kafka, storm) for
> > > multiple
> > > > > > tests, instead use the docker infrastructure to spin them up once
> > and
> > > > > then
> > > > > > use them throughout the tests.
> > > > > >
> > > > > >
> > > > > > *Tests aren't parallel*
> > > > > >
> > > > > > Currently we cannot run 

Re: BulkMessageWriterBolt and MessageGetters

2017-02-07 Thread Ryan Merriman
You are correct, the BulkMessageWriterBolt/MessageGetters combination is
not flexible enough.  You would have to modify BulkMessageWriterBolt.  I
have addressed this in METRON-695 which will be submitted as a PR shortly.
It will be easy to do what you want after that is merged in.

Ryan

On Tue, Feb 7, 2017 at 1:24 PM, Nick Allen  wrote:

> I am trying to use the `BulkMessageWriterBolt` to write a specific tuple
> field named "measurement" to a Kafka topic.
>
> -   id: "kafkaBolt"
>     className: "org.apache.metron.writer.bolt.BulkMessageWriterBolt"
>     constructorArgs:
>         - "${kafka.zk}"
>     configMethods:
>         -   name: "withMessageWriter"
>             args:
>                 - ref: "kafkaWriter"
>         -   name: "withMessageGetter"
>             args:
>                 - "measurement"
>
> Rather than wanting the name of a field, it wants the name of a valid
> `MessageGetters` enum; either RAW or NAMED.  It seems like there is no way
> for me to plugin a `NamedMessageGetter` with a custom field name like
> "measurement".
>
> Am I missing something?  Is there a way to do this out-of-the-box?
>


Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Ryan Merriman
Debugging integration tests in an IDE uses the same approach with our
current infrastructure or with docker:  start up the topology with
LocalRunner.  I've had mixed success with our current infrastructure.  As
Mike alluded to, some tests work fine (most of the parser topologies and
enrichment topology) while others fail when run in my IDE but work on the
command line (ES integration test due to guava issues and Squid topology
due to some issue with the remove subdomains Stellar function).  Of course
with Docker infrastructure you will need a test runner to launch topologies
in LocalRunner.  They are short and simple though and I have one written
for each topology that I can share when appropriate.

There are some advantages and disadvantages to switching the integration
tests to use Docker.  The infrastructure we have now works and could be
adjusted to overcome its primary weaknesses (single classloader and start
up/shutdown after each test).  With Docker the classloader issue goes away
for the most part (or is much better than it is now) without any extra
work.  For spinning services up/down once instead of with each test, we
will need to adjust our tests to clean up after themselves or (even better)
namespace all testing objects so that tests don't step on each other.  That
work would have to be done no matter which infrastructure approach we
take.  Probably the biggest downside to using Docker is that all
integration tests will need to be adjusted and we'll likely hit some issues
that we'll need to resolve.  I was bitten several times by services that
broadcast their host address (Kafka for example) and I bet we hit more of
those.  We'll also need to add a few more containers (HDFS for sure) but
those are easy to create as long as you don't hit the issue I just
mentioned.
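
A minimal sketch of the namespacing idea, written in Python for brevity.
The helper names (make_namespace, namespaced) are hypothetical and not part
of the Metron test infrastructure; the point is only that each test derives
its own prefix for objects it creates on shared services (a Kafka topic, for
example), so services can stay up between tests without tests stepping on
each other:

```python
import uuid


def make_namespace(test_name):
    """Return a namespace that is unique for one run of one test (hypothetical helper)."""
    # A short random suffix keeps names readable while avoiding collisions.
    return f"{test_name}_{uuid.uuid4().hex[:8]}"


def namespaced(namespace, resource):
    """Prefix a shared-service resource name (e.g. a Kafka topic) with the namespace."""
    return f"{namespace}.{resource}"


# Two tests that both need an "enrichments" topic get distinct topic names,
# so neither has to clean up before the other can run.
ns_a = make_namespace("parser_test")
ns_b = make_namespace("enrichment_test")
topic_a = namespaced(ns_a, "enrichments")
topic_b = namespaced(ns_b, "enrichments")
```

With this approach, cleanup becomes optional rather than required: leftover
objects from one test are simply never seen by the next.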

I think all of the suggestions so far are good ideas.  I think it goes
without saying that we should do one at a time and maybe even reassess
after we see the impact of each change.  I would vote for doing the
Maven/shading one first because it is all around beneficial, even outside
of this context.

On Tue, Feb 7, 2017 at 9:04 AM, Casey Stella  wrote:

> I believe that some people use travis and some people request Jenkins from
> Apache Infra.  That being said, personally, I think we should take the
> opportunity to correct the underlying issues.  50 minutes for a build seems
> excessive to me.
>
> On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler 
> wrote:
>
> > Is there an alternative to Travis?  Do other like sized apache projects
> > have these problems?  Do they use travis?
> >
> >
> > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com)
> wrote:
> >
> > For those with pending/building pull requests, it will come as no
> surprise
> > that our build times are increasing at a pace that is worrisome. In fact,
> > we have hit a fundamental limit associated with Travis over the weekend.
> > We have crept up into the 40+ minute build territory and travis seems
> to
> > error out at around 49 minutes.
> >
> > Taking the current build (
> > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking
> at
> > just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in
> > tests out of 44 minutes and 42 seconds to do the build. This places the
> > unit tests at around 43% of the build time. I say all of this to point
> out
> > that while unit tests are a portion of the build, they are not even the
> > majority of the build time. We need an approach that addresses the whole
> > build performance holistically and we need it soonest.
> >
> > To seed the discussion, I will point to a few things that come to mind
> > that
> > fit into three broad categories:
> >
> > *Tests are Slow*
> >
> >
> > - *Tactical*: We have around 13 tests that take more than 30 seconds and
> > make up 14 minutes of the build. Considering what we can do to speed
> those
> > tests as a tactical approach may be worth considering
> > - We are spinning up the same services (e.g. kafka, storm) for multiple
> > tests, instead use the docker infrastructure to spin them up once and
> then
> > use them throughout the tests.
> >
> >
> > *Tests aren't parallel*
> >
> > Currently we cannot run the build in parallel due to the integration test
> > infrastructure spinning up its own services that bind to the same ports.
> > If we correct this, we can run the builds in parallel with mvn -T
> >
> > - Correct this by decoupling the infrastructure from the tests and
> > refactoring the tests to run in parallel.
> > - Make the integration testing infrastructure bind intelligently to
> > whatever port is available.
> > - Move the integration tests to their own project. This will let us run
> > the build in parallel since an individual project's test will be run
> > serially.
> >
> > *Packaging is Painful*
> >
> > We have a sensitive environment in terms of dependencies. As such, we are
> > careful to 

Re: [Discuss] Direction of metron-docker

2017-02-06 Thread Ryan Merriman
Matt, if you want it to be automated you will need to use our integration
testing framework since Docker isn't a part of our build process right
now.  If you want to use Docker to aid in your development and get you to a
point where things are working as expected, I think it would work well for
that.

Ryan

On Mon, Feb 6, 2017 at 12:29 PM, Matt Foley <ma...@apache.org> wrote:

> There may be another area of application for this.  I’m not certain, so
> tell me if I’m off base.
>
> In the context of METRON-322 (adding batchTimeout and
> 'topology.tick.tuple.freq.secs' to BulkMessageWriterBolt), there are some
> fairly obvious unit tests needed, but I find them inadequate to give me
> certainty that some fairly complex-interaction changes are actually doing
> what they’re supposed to be doing.  So I was thinking how to do integration
> testing of only a Bolt component (or worst case, only an Indexing topology)
> outside of spinning up a whole Metron / Hadoop Stack cluster.  I wasn’t
> coming up with any good answers :-) so I was about to ask the list anyway.
>
> Is it feasible/advisable to use Docker to do automated integration testing
> of small chunks of Metron, such as a single component or a single
> topology?  What’s doable?  Other better ideas?
>
> Thanks,
> --Matt
>
>
> On 2/6/17, 10:03 AM, "Ryan Merriman" <merrim...@gmail.com> wrote:
>
> From the README:
>
> "Metron Docker is a Docker Compose application that is intended for
> development and integration testing of Metron. Use this instead of
> Vagrant
> when:
>
> - You want an environment that can be built and spun up quickly
> - You need to frequently rebuild and restart services
> - You only need to test, troubleshoot or develop against a subset of
> services"
>
> The "Quick Dev" environment actually serves 2 purposes:  a development
> environment and an end-to-end testing environment.  This module was
> intended to supplement or provide an alternative to the development
> environment part of "Quick Dev", not the end-to-end testing part.  It
> does
> have "Docker" in the name of the module so I can see how that might
> suggest
> a fully supported deployment option.  It shouldn't be used for that
> though
> because it doesn't include Ambari or MPack and isn't a true
> representation
> of a production Metron cluster.
>
> What is the direction?  I could see this evolving into a collection of
> profiles or recipes.  Need to develop a custom parser?  Spin up an
> application that only includes the Storm, Kafka and Zookeeper images.
> Want
> to develop a custom Kibana dashboard?  Spin up Elasticsearch and Kibana
> images preloaded with data.  Maybe an analytics profile could be
> created
> that only includes the tools you need for that?  The application that
> exists now in metron-docker could be considered a "rest" profile or a
> collection of containers that support all the functions of the rest
> API.
> It's very general purpose and supports a lot of use cases so I
> considered
> it a good starting point.  It's very useful if you're developing a UI
> and
> have limited knowledge of Ambari or big data platform services.  That
> was
> the initial motivation.
>
> I think you should view this as more of a toolbox and not a turnkey
> installation solution.  Maintaining and building development
> environments
> is something Docker is a really good fit for and I have found this
> works
> much better than our Ansible/Vagrant environment.  It's really fast and
> stays up all the time.
>
> But it's completely optional.  Use it if you think it will help you.
> Or
> don't if "Quick Dev" is good enough and you've figured out how to tune
> it
> so that it's not completely unusable.  If everybody thinks it's
> confusing
> and no one uses it then we can take it out and I'll just go back to
> maintaining it privately.  But then I would miss out on Kyle's awesome
> contribution :)
>
> Ryan
>
> On Mon, Feb 6, 2017 at 10:12 AM, Nick Allen <n...@nickallen.org>
> wrote:
>
> > So what is the direction then, Ryan?  Can you describe what this is
> > supposed to be used for?
> >
> > I had thought people wanted this to replace the existing
> Vagrant-based
> > "Quick Dev"?  But apparently this is the assumption that you think I
> am
> > wrong on.
> >
> >
> >
> > On Mon, Feb 6, 2017 at 10:46 AM, Ryan Merriman <merrim...@gmail.co

Re: [MENTORS] ICLA for non-committer contributions

2017-01-27 Thread Ryan Merriman
What would our process be if someone did contribute a commit to a pull
request?

On Fri, Jan 27, 2017 at 4:02 PM, Casey Stella  wrote:

> Yes, we should definitely not destroy authorship information.  To my
> knowledge that hasn't happened yet and we should ensure it does not happen
> in the future.
>
> On Fri, Jan 27, 2017 at 5:00 PM, P. Taylor Goetz 
> wrote:
>
> > IMO, that’s okay as long as the commits in a pull request are from a
> > single author. But it is possible for pull requests to contain commits
> from
> > multiple authors. If you squash those commits, you are potentially
> > destroying authorship information, which I would advise against.
> >
> > -Taylor
> >
> > > On Jan 27, 2017, at 4:51 PM, Casey Stella  wrote:
> > >
> > > Just so we're clear, we do squash commits upon merge (we followed the
> > suit
> > > of Apache Mahout and use --squash as described at
> > > https://mahout.apache.org/developers/github.html#
> merging-a-pr-yours-or-
> > contributors),
> > > but we do not merge commits from multiple people into a single commit.
> > I'm
> > > guessing that's kosher, but it's something we probably should clarify.
> > >
> > > On Fri, Jan 27, 2017 at 4:46 PM, P. Taylor Goetz 
> > wrote:
> > >
> > >> While it certainly doesn’t hurt to have one, it’s not strictly
> required.
> > >> It *is* required for committers though.
> > >>
> > >> When you merge a pull request, the authorship information is
> maintained.
> > >> Just make sure you don’t squash other people’s commits.
> > >>
> > >> -Taylor
> > >>
> > >>> On Jan 27, 2017, at 4:36 PM, Casey Stella 
> wrote:
> > >>>
> > >>> Hi Mentors,
> > >>>
> > >>> I was wondering if you could help me settle a question.  What is the
> > >> ASF's
> > >>> stance on ICLAs for non-committer contributions?  Are they required?
> > >>>
> > >>> On the one hand, https://www.apache.org/dev/
> committers.html#applying-
> > >> patches
> > >>> requires only that we attribute appropriately to form a legal
> > papertrail
> > >>> via the git history.  Also, this discussion (
> > >>> http://marc.info/?l=incubator-general&m=142175320215392&w=2) seems
> to
> > >>> indicate that they are not required.
> > >>>
> > >>> On the other hand, http://www.apache.org/licenses/#clas indicates
> that
> > >> the
> > >>> ASF "desires" ICLAs for contribution.  I also see some projects
> > requiring
> > >>> them (i.e. flink and brooklyn) of contributors.
> > >>>
> > >>> Thanks in advance for the clarification!
> > >>>
> > >>> Best,
> > >>>
> > >>> Casey
> > >>
> > >>
> >
> >
>


Re: [DISCUSS] Graduating to Apache Top Level Project

2017-01-26 Thread Ryan Merriman
I think we're ready +1

On Wed, Jan 25, 2017 at 7:06 PM, Nick Allen  wrote:

> +1 I think we clearly meet all of those criteria.  Glad to see the project
> mature and grow.
>
> On Mon, Jan 23, 2017 at 7:09 PM, James Sirota  wrote:
>
> > I think the Apache Incubation was a very valuable learning experience for
> > us, but it seems like we are ready to become a top-level project.  We
> have
> > been in incubation since 2015-12-06 and under the guidance of our Mentors
> > we had a clean build (0.3.0), we learned how to function well as a
> > community (as defined by Apache), and if you look at our maturity level
> > checklist we meet the criteria of a mature Apache project. (
> > https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=66852119)
> >
> >
> > Do you think we are ready to graduate?  Should we start putting a case
> > together for graduation?
> >
> > ---
> > Thank you,
> >
> > James Sirota
> > PPMC- Apache Metron (Incubating)
> > jsirota AT apache DOT org
> >
>
>
>
> --
> Nick Allen 
>


Re: [DISCUSS] Error Indexing

2017-01-26 Thread Ryan Merriman
Jon, I misread the code in the GenericEnrichmentBolt.  The error is
forwarded on so no issues there.

Defaulting to the common fields makes sense.  I will dig into the
GenericEnrichmentBolt more, maybe there is a way to get the error fields
without having to significantly change things.  Any opinion on a hashing
algorithm?

On Wed, Jan 25, 2017 at 9:37 PM, zeo...@gmail.com <zeo...@gmail.com> wrote:

> Although hashing the whole message is better than nothing, it misses a lot
> of the benefits we could get.
>
> While I'd love to have consistency for this field across all of the
> different error.types, it appears that may not be reasonably possible
> because of the parsers.  So, how about something like hash all of the
> constant fields
> <https://github.com/apache/incubator-metron/blob/master/metron-platform/metron-common/src/main/java/org/apache/metron/common/Constants.java>
> excluding timestamp and original_string unless it is a parser, in which
> case hash the
> grow as we define additional constant fields (I recall discussing with
> someone else on the list regarding expanding those standard fields to
> include things like usernames but I can't find the specific email
> exchange).
>
> Because some enrichments can be heavily relied on, I think it makes sense
> to put a message onto the error queue when it throws an exception.  Not
> only does this help troubleshoot edge cases, but it makes issues more
> obvious when assembling a new enrichment in dev/test.  I can't think of a
> scenario currently where an enrichment would only be "best effort" and that
> I wouldn't want that error indexed and retrievable.  However, this gets
> interesting when talking about the various options to solve the "Enrich
> enrichment" discussion from earlier in the month.  We can keep that part of
> this separate though, as I don't think that's being actively pursued right
> now.
>
> Jon
>
> On Wed, Jan 25, 2017 at 10:49 AM David Lyle <dlyle65...@gmail.com> wrote:
>
> RE: separate JIRA for MPack/Ansible. No objection to tracking them
> separately, but for this item to be complete, you'll need both the feature
> and the ability to install it.
>
> -D...
>
>
> On Tue, Jan 24, 2017 at 5:33 PM, Ryan Merriman <merrim...@gmail.com>
> wrote:
>
> > Assuming we're going to write all errors to a single error topic, I think
> > it makes sense to agree on an error message schema and handle errors
> across
> > the 3 different topologies in the same way with a single implementation.
> > The implementation in ParserBolt (ErrorUtils.handleError) produces the
> most
> > verbose error object so I think it's a good candidate for the single
> > implementation.  Here is the message structure it currently produces:
> >
> > {
> >   "exception": "java.lang.Exception: there was an error",
> >   "hostname": "host",
> >   "stack": "java.lang.Exception: ...",
> >   "time": 1485295416563,
> >   "message": "there was an error",
> >   "rawMessage": "raw message",
> >   "rawMessage_bytes": [],
> >   "source.type": "bro_error"
> > }
> >
> > From our discussion so far we need to add a couple fields:  an error type
> > and hash id.  Adding these to the message looks like:
> >
> > {
> >   "exception": "java.lang.Exception: there was an error",
> >   "hostname": "host",
> >   "stack": "java.lang.Exception: ...",
> >   "time": 1485295416563,
> >   "message": "there was an error",
> >   "rawMessage": "raw message",
> >   "rawMessage_bytes": [],
> >   "source.type": "bro_error",
> >   "error.type": "parser_error",
> >   "rawMessage_hash": "dde41b9920954f94066daf6291fb58a9"
> > }
> >
> > We should also consider expanding the error types I listed earlier.
> > Instead of just having "indexing_error" we could have
> > "elasticsearch_indexing_error", "hdfs_indexing_error" and so on.
> >
> > Jon, if an exception happens in an enrichment or threat intel bolt the
> > message is passed along with no error thrown (only logged).  Everywhere
> > else I'm having trouble identifying specific fields that should be
> hashed.
> > Would hashing the message in every case be acceptable?  Do you know of a
> > place where we

Re: [DISCUSS] Error Indexing

2017-01-24 Thread Ryan Merriman
Assuming we're going to write all errors to a single error topic, I think
it makes sense to agree on an error message schema and handle errors across
the 3 different topologies in the same way with a single implementation.
The implementation in ParserBolt (ErrorUtils.handleError) produces the most
verbose error object so I think it's a good candidate for the single
implementation.  Here is the message structure it currently produces:

{
  "exception": "java.lang.Exception: there was an error",
  "hostname": "host",
  "stack": "java.lang.Exception: ...",
  "time": 1485295416563,
  "message": "there was an error",
  "rawMessage": "raw message",
  "rawMessage_bytes": [],
  "source.type": "bro_error"
}

From our discussion so far we need to add a couple fields:  an error type
and hash id.  Adding these to the message looks like:

{
  "exception": "java.lang.Exception: there was an error",
  "hostname": "host",
  "stack": "java.lang.Exception: ...",
  "time": 1485295416563,
  "message": "there was an error",
  "rawMessage": "raw message",
  "rawMessage_bytes": [],
  "source.type": "bro_error",
  "error.type": "parser_error",
  "rawMessage_hash": "dde41b9920954f94066daf6291fb58a9"
}
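
As a rough illustration of how the two new fields could be derived from the
existing error object, here is a small Python sketch.  It is not the
ErrorUtils.handleError implementation; add_error_fields is a hypothetical
name, and SHA-256 is only a placeholder since the hashing algorithm has not
been decided in this thread (so the digest will not match the example value
above):

```python
import hashlib


def add_error_fields(error, error_type):
    """Return a copy of an error message with "error.type" and "rawMessage_hash" added.

    Assumption: the hash is taken over the UTF-8 bytes of "rawMessage".
    SHA-256 is a stand-in for whatever algorithm is eventually chosen.
    """
    enriched = dict(error)  # copy so the caller's object is left untouched
    enriched["error.type"] = error_type
    raw = enriched.get("rawMessage", "")
    enriched["rawMessage_hash"] = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return enriched


current = {
    "exception": "java.lang.Exception: there was an error",
    "hostname": "host",
    "message": "there was an error",
    "rawMessage": "raw message",
    "source.type": "bro_error",
}
with_new_fields = add_error_fields(current, "parser_error")
```

Because the hash depends only on the raw message, two errors caused by the
same input end up with the same rawMessage_hash and can be grouped on it.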

We should also consider expanding the error types I listed earlier.
Instead of just having "indexing_error" we could have
"elasticsearch_indexing_error", "hdfs_indexing_error" and so on.

Jon, if an exception happens in an enrichment or threat intel bolt the
message is passed along with no error thrown (only logged).  Everywhere
else I'm having trouble identifying specific fields that should be hashed.
Would hashing the message in every case be acceptable?  Do you know of a
place where we could hash a field instead?  On the topic of exceptions in
enrichments, are we ok with an error only being logged and not added to the
message or emitted to the error queue?



On Tue, Jan 24, 2017 at 3:10 PM, Ryan Merriman <merrim...@gmail.com> wrote:

> That use case makes sense to me.  I don't think it will require that much
> additional effort either.
>
> On Tue, Jan 24, 2017 at 1:02 PM, zeo...@gmail.com <zeo...@gmail.com>
> wrote:
>
>> Regarding error vs validation - Either way I'm not very concerned.  I
>> initially assumed they would be combined and agree with that approach, but
>> splitting them out isn't a very big deal to me either.
>>
>> Re: Ryan.  Yes, exactly.  In the case of a parser issue (or anywhere else
>> where it's not possible to pick out the exact thing causing the issue) it
>> would be a hash of the complete message.
>>
>> Regarding the architecture, I mostly agree with James except that I think
>> step 3 needs to also be able to somehow group errors via the original
>> data (identify
>> replays, identify repeat issues with data in a specific field, issues with
>> consistently different data, etc.).  This is essentially the first step of
>> troubleshooting, which I assume you are doing if you're looking at the
>> error dashboard.
>>
>> If the hash gets moved out of the initial implementation, I'm fairly
>> certain you lose this ability.  The point here isn't to handle long fields
>> (although that's a benefit of this approach), it's to attach a unique
>> identifier to the error/validation issue message that links it to the
>> original problem.  I'd be happy to consider alternative solutions to this
>> problem (for instance, actually sending across the data itself) I just
>> haven't been able to think of another way to do this that I like better.
>>
>> Jon
>>
>> On Tue, Jan 24, 2017 at 1:13 PM Ryan Merriman <merrim...@gmail.com>
>> wrote:
>>
>> > We also need a JIRA for any install/Ansible/MPack work needed.
>> >
>> > On Tue, Jan 24, 2017 at 12:06 PM, James Sirota <jsir...@apache.org>
>> wrote:
>> >
>> > > Now that I had some time to think about it I would collapse all error
>> and
>> > > validation topics into one.  We can differentiate between different
>> views
>> > > of the data (split by error source etc) via Kibana dashboards.  I
>> would
>> > > implement this feature incrementally.  First I would modify all the
>> bolts
>> > > to log to a single topic.  Second, I would get the error indexing
>> done by
>> > > attaching the indexing topology to the error topic. Third I would
>> create
>> > > the

Re: [DISCUSS] Error Indexing

2017-01-24 Thread Ryan Merriman
That use case makes sense to me.  I don't think it will require that much
additional effort either.

On Tue, Jan 24, 2017 at 1:02 PM, zeo...@gmail.com <zeo...@gmail.com> wrote:

> Regarding error vs validation - Either way I'm not very concerned.  I
> initially assumed they would be combined and agree with that approach, but
> splitting them out isn't a very big deal to me either.
>
> Re: Ryan.  Yes, exactly.  In the case of a parser issue (or anywhere else
> where it's not possible to pick out the exact thing causing the issue) it
> would be a hash of the complete message.
>
> Regarding the architecture, I mostly agree with James except that I think
> step 3 needs to also be able to somehow group errors via the original
> data (identify
> replays, identify repeat issues with data in a specific field, issues with
> consistently different data, etc.).  This is essentially the first step of
> troubleshooting, which I assume you are doing if you're looking at the
> error dashboard.
>
> If the hash gets moved out of the initial implementation, I'm fairly
> certain you lose this ability.  The point here isn't to handle long fields
> (although that's a benefit of this approach), it's to attach a unique
> identifier to the error/validation issue message that links it to the
> original problem.  I'd be happy to consider alternative solutions to this
> problem (for instance, actually sending across the data itself) I just
> haven't been able to think of another way to do this that I like better.
>
> Jon
>
> On Tue, Jan 24, 2017 at 1:13 PM Ryan Merriman <merrim...@gmail.com> wrote:
>
> > We also need a JIRA for any install/Ansible/MPack work needed.
> >
> > On Tue, Jan 24, 2017 at 12:06 PM, James Sirota <jsir...@apache.org>
> wrote:
> >
> > > Now that I had some time to think about it I would collapse all error
> and
> > > validation topics into one.  We can differentiate between different
> views
> > > of the data (split by error source etc) via Kibana dashboards.  I would
> > > implement this feature incrementally.  First I would modify all the
> bolts
> > > to log to a single topic.  Second, I would get the error indexing done
> by
> > > attaching the indexing topology to the error topic. Third I would
> create
> > > the necessary dashboards to view errors and validation failures by
> > source.
> > > Lastly, I would file a follow-on JIRA to introduce hashing of errors or
> > > fields that are too long.  It seems like a separate feature that we
> need
> > to
> > > think through.  We may need a stellar function around that.
> > >
> > > Thanks,
> > > James
> > >
> > > 24.01.2017, 10:25, "Ryan Merriman" <merrim...@gmail.com>:
> > > > I understand what Jon is talking about. He's proposing we hash the
> > value
> > > > that caused the error, not necessarily the error message itself. For
> an
> > > > enrichment this is easy. Just pass along the field value that failed
> > > > enrichment. For other cases the field that caused the error may not
> be
> > so
> > > > obvious. Take parser validation for example. The message is validated
> > as
> > > > a whole and it may not be easy to determine which field is the cause.
> > In
> > > > that case would a hash of the whole message work?
> > > >
> > > > There is a broader architectural discussion that needs to happen
> before
> > > we
> > > > can implement this. Currently we have an indexing topology that reads
> > > from
> > > > 1 topic and writes messages to ES but errors are written to several
> > > > different topics:
> > > >
> > > >- parser_error
> > > >- parser_invalid
> > > >- enrichments_error
> > > >- threatintel_error
> > > >- indexing_error
> > > >
> > > > I can see 4 possible approaches to implementing this:
> > > >
> > > >1. Create an index topology for each error topic
> > > >   1. Good because we can easily reuse the indexing topology and
> > would
> > > >   require the least development effort
> > > >   2. Bad because it would consume a lot of extra worker slots
> > > >2. Move the topic name into the error JSON message as a new
> > > "error_type"
> > > >field and write all messages to the indexing topic
> > > >   1. Good because we don't need to create a new topology
> > > >   2. Ba

Re: [DISCUSS] Error Indexing

2017-01-24 Thread Ryan Merriman
We also need a JIRA for any install/Ansible/MPack work needed.

On Tue, Jan 24, 2017 at 12:06 PM, James Sirota <jsir...@apache.org> wrote:

> Now that I had some time to think about it I would collapse all error and
> validation topics into one.  We can differentiate between different views
> of the data (split by error source etc) via Kibana dashboards.  I would
> implement this feature incrementally.  First I would modify all the bolts
> to log to a single topic.  Second, I would get the error indexing done by
> attaching the indexing topology to the error topic. Third I would create
> the necessary dashboards to view errors and validation failures by source.
> Lastly, I would file a follow-on JIRA to introduce hashing of errors or
> fields that are too long.  It seems like a separate feature that we need to
> think through.  We may need a stellar function around that.
>
> Thanks,
> James
>
> 24.01.2017, 10:25, "Ryan Merriman" <merrim...@gmail.com>:
> > I understand what Jon is talking about. He's proposing we hash the value
> > that caused the error, not necessarily the error message itself. For an
> > enrichment this is easy. Just pass along the field value that failed
> > enrichment. For other cases the field that caused the error may not be so
> > obvious. Take parser validation for example. The message is validated as
> > a whole and it may not be easy to determine which field is the cause. In
> > that case would a hash of the whole message work?
> >
> > There is a broader architectural discussion that needs to happen before
> we
> > can implement this. Currently we have an indexing topology that reads
> from
> > 1 topic and writes messages to ES but errors are written to several
> > different topics:
> >
> >- parser_error
> >- parser_invalid
> >- enrichments_error
> >- threatintel_error
> >- indexing_error
> >
> > I can see 4 possible approaches to implementing this:
> >
> >1. Create an index topology for each error topic
> >   1. Good because we can easily reuse the indexing topology and would
> >   require the least development effort
> >   2. Bad because it would consume a lot of extra worker slots
> >2. Move the topic name into the error JSON message as a new
> "error_type"
> >field and write all messages to the indexing topic
> >   1. Good because we don't need to create a new topology
> >   2. Bad because we would be flowing data and errors through the same
> >   topology. A spike in errors could affect message indexing.
> >3. Compromise between 1 and 2. Create another indexing topology that
> is
> >dedicated to indexing errors. Move the topic name into the error JSON
> >message as a new "error_type" field and write all errors to a single
> error
> >topic.
> >4. Write a completely new topology with multiple spouts (1 for each
> >error type listed above) that all feed into a single
> BulkMessageWriterBolt.
> >   1. Good because the current topologies would not need to change
> >   2. Bad because it would require the most development effort, would
> >   not reuse existing topologies and takes up more worker slots than 3
> >
> > Are there other approaches I haven't thought of? I think 1 and 2 are off
> > the table because they are shortcuts and not good long-term solutions. 3
> > would be my choice because it introduces less complexity than 4.
> Thoughts?
> >
> > Ryan
> >
> > On Mon, Jan 23, 2017 at 5:44 PM, zeo...@gmail.com <zeo...@gmail.com>
> wrote:
> >
> >>  In that case the hash would be of the value in the IP field, such as
> >>  sha3(8.8.8.8).
> >>
> >>  Jon
> >>
> >>  On Mon, Jan 23, 2017, 6:41 PM James Sirota <jsir...@apache.org> wrote:
> >>
> >>  > Jon,
> >>  >
> >>  > I am still not entirely following why we would want to use hashing.
> For
> >>  > example if my error is "Your IP field is invalid and failed
> validation"
> >>  > hashing this error string will always result in the same hash. Why
> not
> >>  > just use the actual error string? Can you provide an example where
> you
> >>  > would use it?
> >>  >
> >>  > Thanks,
> >>  > James
> >>  >
> >>  > 23.01.2017, 16:29, "zeo...@gmail.com" <zeo...@gmail.com>:
> >>  > > For 1 - I'm good with that.
> >>  > >
> >>  > > I'm talking about

Re: [DISCUSS] Error Indexing

2017-01-24 Thread Ryan Merriman
I understand what Jon is talking about.  He's proposing we hash the value
that caused the error, not necessarily the error message itself.  For an
enrichment this is easy.  Just pass along the field value that failed
enrichment.  For other cases the field that caused the error may not be so
obvious.  Take parser validation for example.  The message is validated as
a whole and it may not be easy to determine which field is the cause.  In
that case would a hash of the whole message work?
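
To make the distinction concrete, here is a small Python sketch of hashing
the offending value when it is known and falling back to the whole message
when it is not.  The function name error_hash and the field name
"ip_src_addr" are only illustrative, and SHA3-256 is used simply because
Jon's example was sha3(8.8.8.8); none of this is existing Metron code:

```python
import hashlib
import json


def error_hash(message, failed_field=None):
    """Hash the value that caused the error, or the whole message as a fallback."""
    if failed_field is not None and failed_field in message:
        # Enrichment case: the failing field is known, so hash only its value.
        payload = str(message[failed_field])
    else:
        # Parser validation case: no single field to blame, so hash everything.
        # sort_keys makes the serialization, and so the hash, deterministic.
        payload = json.dumps(message, sort_keys=True)
    return hashlib.sha3_256(payload.encode("utf-8")).hexdigest()


first = {"ip_src_addr": "8.8.8.8", "timestamp": 1485295416563}
second = {"ip_src_addr": "8.8.8.8", "timestamp": 1485295416999}

# Same offending value, so both errors share one hash and can be grouped.
same_value = error_hash(first, "ip_src_addr") == error_hash(second, "ip_src_addr")
# Whole-message hashes differ because the timestamps differ.
same_message = error_hash(first) == error_hash(second)
```

The whole-message fallback still identifies exact replays, but it cannot
group errors that share only the problematic value.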

There is a broader architectural discussion that needs to happen before we
can implement this.  Currently we have an indexing topology that reads from
1 topic and writes messages to ES but errors are written to several
different topics:

   - parser_error
   - parser_invalid
   - enrichments_error
   - threatintel_error
   - indexing_error

I can see 4 possible approaches to implementing this:

   1. Create an index topology for each error topic
      1. Good because we can easily reuse the indexing topology and would
         require the least development effort
      2. Bad because it would consume a lot of extra worker slots
   2. Move the topic name into the error JSON message as a new "error_type"
      field and write all messages to the indexing topic
      1. Good because we don't need to create a new topology
      2. Bad because we would be flowing data and errors through the same
         topology.  A spike in errors could affect message indexing.
   3. Compromise between 1 and 2.  Create another indexing topology that is
      dedicated to indexing errors.  Move the topic name into the error JSON
      message as a new "error_type" field and write all errors to a single
      error topic.
   4. Write a completely new topology with multiple spouts (1 for each
      error type listed above) that all feed into a single
      BulkMessageWriterBolt.
      1. Good because the current topologies would not need to change
      2. Bad because it would require the most development effort, would
         not reuse existing topologies and takes up more worker slots than 3

Are there other approaches I haven't thought of?  I think 1 and 2 are off
the table because they are shortcuts and not good long-term solutions.  3
would be my choice because it introduces less complexity than 4.  Thoughts?
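
To make approach 3 concrete, here is a minimal Python sketch (not Metron
code) of what a single error message could look like once the topic name
moves into an "error_type" field and the offending value is hashed as Jon
suggested.  Only "error_type" and the topic names come from this thread; the
"message" and "error_hash" field names and the build_error_message helper
are assumptions made up for illustration.

```python
import hashlib
import json


def build_error_message(source_topic, error_text, raw_value):
    """Build one error record destined for a single shared error topic.

    source_topic: the topic the error would have gone to before the change
                  (for example "enrichments_error"); stored as "error_type".
    raw_value:    the field value (or whole message) that caused the error.
    """
    # Hash the value that caused the error, not the error text itself.
    digest = hashlib.sha3_256(raw_value.encode("utf-8")).hexdigest()
    return {
        "error_type": source_topic,   # field name proposed in this thread
        "message": error_text,        # hypothetical field name
        "error_hash": digest,         # hypothetical field name
    }


# Enrichment case: the failing field value is known, so hash just that value.
enrichment_error = build_error_message(
    "enrichments_error", "Failed geo enrichment", "8.8.8.8")

# Parser validation case: the failing field is unclear, so hash the whole message.
parser_error = build_error_message(
    "parser_invalid", "Message failed validation", '{"ip_src_addr": "not-an-ip"}')

print(json.dumps(enrichment_error, indent=2))
```

The same input value always produces the same hash regardless of which
topology reported it, which is what makes the hash usable as a dashboard key.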

Ryan


On Mon, Jan 23, 2017 at 5:44 PM, zeo...@gmail.com  wrote:

> In that case the hash would be of the value in the IP field, such as
> sha3(8.8.8.8).
>
> Jon
>
> On Mon, Jan 23, 2017, 6:41 PM James Sirota  wrote:
>
> > Jon,
> >
> > I am still not entirely following why we would want to use hashing.  For
> > example if my error is "Your IP field is invalid and failed validation"
> > hashing this error string will always result in the same hash.  Why not
> > just use the actual error string? Can you provide an example where you
> > would use it?
> >
> > Thanks,
> > James
> >
> > 23.01.2017, 16:29, "zeo...@gmail.com" :
> > > For 1 - I'm good with that.
> > >
> > > > I'm talking about hashing the relevant content itself not the error.
> > > > Some benefits are (1) minimize load on search index (there's minimal
> > > > benefit in spending the CPU and disk to keep it at full fidelity
> > > > (tokenize and store)) (2) provide something to key on for dashboards
> > > > (assuming a good hash algorithm that avoids collisions and is second
> > > > preimage resistant) and (3) specific to errors, if the issue is that it
> > > > failed to index, a hash gives us some protection that the issue will
> > > > not occur twice.
> > >
> > > Jon
> > >
> > > On Mon, Jan 23, 2017, 2:47 PM James Sirota  wrote:
> > >
> > > Jon,
> > >
> > > With regards to 1, collapsing to a single dashboard for each would be
> > > fine. So we would have one error index and one "failed to validate"
> > > index. The distinction is that errors would be things that went wrong
> > > during stream processing (failed to parse, etc...), while validation
> > > failures are messages that explicitly failed stellar validation/schema
> > > enforcement. There should be relatively few of the second type.
> > >
> > > > With respect to 3, why do you want the error hashed? Why not just
> > > > search for the error text?
> > >
> > > Thanks,
> > > James
> > >
> > > 20.01.2017, 14:01, "zeo...@gmail.com" :
> > > >>  As someone who currently fills the platform engineer role, I can give
> > > >>  this idea a huge +1. My thoughts:
> > > >>
> > > >>  1. I think it depends on exactly what data is pushed into the index
> > > >>  (#3). However, assuming the errors you proposed recording, I can't see
> > > >>  huge benefits to having more than one dashboard. I would be happy to
> > > >>  be persuaded otherwise.
> > > >>
> > > >>  2. I would say yes, storing the errors in HDFS in addition to indexing
> > > >>  is a good thing. Using METRON-510 as a case study, there is the
> > > >>  potential

Re: [DISCUSS] Ambari Metron Configuration Management consequences and call to action

2017-01-13 Thread Ryan Merriman
> > > > > > > > > > > you reach in and edit Ambari’s files, it will Error out if
> > > > > > > > > > > the set of parameters or parameter names changes.  The
> > > > > > > > > > > historical information about configuration changes is also
> > > > > > > > > > > stored in the db.
> > > > > > > > > > > For each component (and in the case of Metron, for each
> > > > > > > > > > > topology), there is a python file which controls the logic
> > > > > > > > > > > for these actions, among others:
> > > > > > > > > > > - Install
> > > > > > > > > > > - Start / stop / restart / status
> > > > > > > > > > > - Configure
> > > > > > > > > > >
> > > > > > > > > > > It is actually up to this python code (which we wrote for
> > > > > > > > > > > the Metron Mpack) what happens in each of these API calls.
> > > > > > > > > > > But the current code, and I believe this is typical of
> > > > > > > > > > > Ambari-managed components, performs a “Configure” action
> > > > > > > > > > > whenever you press the “Save” button after changing a
> > > > > > > > > > > component config in Ambari, and also on each Install and
> > > > > > > > > > > Start or Restart.
> > > > > > > > > > >
> > > > > > > > > > > The Configure action consists of approximately the
> > > > > > > > > > > following sequence (see disclaimer above :-)
> > > > > > > > > > > - Recreate the generated config files, using the template
> > > > > > > > > > >   files and the actual configuration most recently set in
> > > > > > > > > > >   Ambari
> > > > > > > > > > >   o Note this is also under the control of python code
> > > > > > > > > > >     that we wrote, and this is the appropriate place to
> > > > > > > > > > >     push to ZK if desired.
> > > > > > > > > > > - Propagate those config files to each Ambari-agent, with
> > > > > > > > > > >   a command to set them locally
> > > > > > > > > > > - The ambari-agents on each node receive the files and
> > > > > > > > > > >   write them to the specified locations on local storage
> > > > > > > > > > >
> > > > > > > > > > > Ambari-server then whines that the updated services should
> > > > > > > > > > > be restarted, but does not initiate that action itself
> > > > > > > > > > > (unless of course the initiating action was a Start
> > > > > > > > > > > command from the administrator).
> > > > > > > > > > >
> > > > > > > > > > > Make sense?  It’s all quite straightforward in concept,
> > > > > > > > > > > there’s just an awful lot of stuff wrapped around that to
> > > > > > > > > > > make it all go smoothly and handle the problems when it
> > > > > > > > > > > doesn’t.
> > > > > > > > > > >
> > > > > > > > > > > There’s additional complexity in that the Ambari-agent
> > > > > > > > > > > also caches (on each node) both the template files an
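
The Configure sequence quoted above can be sketched in plain Python.  This
is a toy model under stated assumptions, not Ambari's API and not the Metron
MPack code: the file name "metron.properties", the property keys, and the
functions render_configs, write_configs and configure are all made up for
illustration, and "propagation" is reduced to writing the same rendered
content into one directory per node.

```python
import tempfile
from pathlib import Path
from string import Template


def render_configs(templates, settings):
    """Step 1: recreate the generated config files from the templates and
    the most recently saved settings."""
    return {name: Template(text).substitute(settings)
            for name, text in templates.items()}


def write_configs(rendered, target_dir):
    """Step 3: what an agent would do on its node, write each file locally."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in rendered.items():
        path = target / name
        path.write_text(content)
        written.append(path)
    return written


def configure(templates, settings, node_dirs):
    """Run the whole sequence; no service is restarted here, matching the
    behaviour described above."""
    rendered = render_configs(templates, settings)
    # Step 2 stand-in: hand the same rendered content to every node.
    return {node: write_configs(rendered, d) for node, d in node_dirs.items()}


# Hypothetical template and settings, for illustration only.
templates = {
    "metron.properties":
        "zookeeper.quorum=$zk_quorum\nkafka.brokers=$kafka_brokers\n",
}
settings = {"zk_quorum": "node1:2181", "kafka_brokers": "node1:6667"}

with tempfile.TemporaryDirectory() as root:
    node_dirs = {"node1": Path(root) / "node1", "node2": Path(root) / "node2"}
    result = configure(templates, settings, node_dirs)
    node1_text = (node_dirs["node1"] / "metron.properties").read_text()
    node2_text = (node_dirs["node2"] / "metron.properties").read_text()
```

In this model the render step is where a push to ZooKeeper would be hooked
in, since it is the one place that sees the final configuration.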

Re: [VOTE] Reporting Issues Wiki

2017-01-10 Thread Ryan Merriman
+1 (binding)

On Mon, Jan 9, 2017 at 8:23 AM, Casey Stella  wrote:

> +1 (binding)
>
> On Fri, Jan 6, 2017 at 7:43 PM, Kyle Richardson wrote:
>
> > 0 (binding)
> >
> > I think it's good but it just feels a little cumbersome still.
> >
> > -Kyle
> >
> > > On Jan 6, 2017, at 7:53 AM, JJ Meyer  wrote:
> > >
> > > +1 (non-binding)
> > >
> > > > What do you think about changing `*DO NOT FILE A JIRA, DO NOT POST ON
> > > > ANY OTHER BOARD` *to standard case, but use a confluence warning macro?
> > > > The all caps just makes me feel like I'm being yelled at :)
> > >
> > > >> On Thu, Jan 5, 2017 at 8:10 PM, zeo...@gmail.com wrote:
> > >>
> > >> +1 (non-binding)
> > >>
> > >>> On Thu, Jan 5, 2017, 8:30 PM Matt Foley  wrote:
> > >>>
> > >>> +1 (non-binding)
> > >>>
> > >>> One typo is still in there:
> > > >>>   After discussion of the issue on the JIRA if it is clear that you
> > > >>>   found a bug then you should file a JIRA
> > > >>> should be
> > > >>>   After discussion of the issue on the mailing list if it is clear
> > > >>>   that you found a bug then you should file a JIRA
> > >>>
> > >>> Cheers,
> > >>> --Matt
> > >>>
> > >>>
> > >>> On 1/5/17, 2:41 PM, "James Sirota"  wrote:
> > >>>
> > >>>Based on feedback from the discuss thread.
> > >>>
> > >>>Please vote +1, -1, or 0.  The vote will be open for 72 hours
> > >>>
> > >>>
> > >>>
> > > >>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67635199
> > >>>
> > >>>
> > >>>All “I have found a bug” issues are considered developer-level
> > >>> issues.  Please report all developer-level issues to
> > > >>> dev@metron.incubator.apache.org. Examples of developer issues would
> > > >>> be:
> > >>>Project fails to compile or is failing unit or integration tests
> > >>>Project or individual components fail to install
> > >>>There are error messages or failures in my logs
> > >>>I need help with coding or extending a specific component
> > >>>etc...
> > >>>After discussion of the issue on the JIRA if it is clear that you
> > >>> found a bug then you should file a JIRA (unless you found a security
> > > >>> vulnerability).  Follow up on the mailing lists if you want advice
> > > >>> with respect to workaround or a local fix.  Our JIRA is located here.
> > >>>
> > >>>All  “I have a problem” or "How do you use x" issues are usability
> > >>> issues.  If you are an end-user of the product and have a comment or
> > >>> question then use u...@metron.incubator.apache.org.  If you have a
> > > >>> problem and a strong suspicion that you might have found a bug,
> > > >>> please cross-reference dev@metron.incubator.apache.org as well
> > >>>I don't understand the UI, what does button x do?
> > >>>What should the output of function x be?
> > >>>It would be nice if I had feature x along with feature y
> > >>>etc...
> > >>>
> > > >>>If you found a security-related issue, please report immediately
> > > >>> to secur...@metron.incubator.apache.org. Please adhere to the
> > > >>> following Apache policy found here. DO NOT FILE A JIRA, DO NOT POST
> > > >>> ON ANY OTHER BOARD
> > >>>I can get access to data I should not have access to
> > >>>I have privileges to do things I should not be allowed to do
> > >>>I found that this project is susceptible to an exploit
> > >>>etc...
> > >>>
> > >>>Please report issues related to the JIRA/Wiki to
> > >>> iss...@metron.incubator.apache.org
> > >>>I don't have access to create/assign JIRAs to myself
> > > >>>I don't have visibility/access to certain JIRA features
> > >>>I can't create or view a wiki entry
> > >>>etc
> > >>>
> > >>>
> > >>>---
> > >>>Thank you,
> > >>>
> > >>>James Sirota
> > >>>PPMC- Apache Metron (Incubating)
> > >>>jsirota AT apache DOT org
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>
> > >> Jon
> > >>
> > >> Sent from my mobile device
> > >>
> >
>


Re: Enrich enrichment

2017-01-09 Thread Ryan Merriman
We can already do what Carolyn suggests with Stellar enrichments, assuming
we get a Stellar function created for Geo enrichments (and ideally for all
enrichments).  It would be configured like this under the enrichment section
of the enrichment config:

"stellar" : {
  "config" : {
    "geo_enriched_field" : "GEO_ENRICHMENT(field)",
    "enriched_geo_enriched_field" : "ENRICHMENT_GET('some_other_enrichment', geo_enriched_field, 'enrichments', 'cf')"
  }
}

These statements are executed in order and can be grouped together.  As
Casey pointed out, the limitation is that both would run in a single Storm
worker.  Would that be an acceptable tradeoff?  Sure, it would be ideal if we
could execute each one in separate workers, but then we would have to
re-architect our topologies and our system would become much more complex.
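
A small Python sketch of why statement order matters when enrichments are
stacked this way.  It is only a model of "executed in order", not Stellar
itself: the lookup tables, the sample data, and run_stellar_block are made
up for illustration, and the lambdas stand in for the GEO_ENRICHMENT and
ENRICHMENT_GET calls in the config above.

```python
def run_stellar_block(message, statements):
    """Evaluate statements in the order listed; each statement sees the
    fields produced by the statements before it."""
    enriched = dict(message)
    for field, func in statements:
        enriched[field] = func(enriched)
    return enriched


# Made-up lookup data standing in for the geo database and an HBase table.
GEO = {"8.8.8.8": "Mountain View"}
SOME_OTHER_ENRICHMENT = {"Mountain View": "low"}

statements = [
    # Stand-in for "geo_enriched_field" : "GEO_ENRICHMENT(field)"
    ("geo_enriched_field", lambda m: GEO.get(m.get("field"))),
    # Stand-in for the ENRICHMENT_GET call keyed on geo_enriched_field
    ("enriched_geo_enriched_field",
     lambda m: SOME_OTHER_ENRICHMENT.get(m.get("geo_enriched_field"))),
]

chained = run_stellar_block({"field": "8.8.8.8"}, statements)
reversed_order = run_stellar_block({"field": "8.8.8.8"},
                                   list(reversed(statements)))
```

With the statements in the listed order the second lookup finds the city; in
reversed order it runs before the geo field exists and comes back empty.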

Ryan


On Mon, Jan 9, 2017 at 12:59 PM, Carolyn Duby <cd...@hortonworks.com> wrote:

> Adding new topologies adds more processing requirements to the system.  It
> adds more topics (storage) and more producers and consumers to kafka
> (processing).
>
> I think what we need is a dependency of enrichments.  Maybe we need to
> either derive the dependencies using the Stellar (potentially not that
> easy) or allow the enrichment to specify the order of enrichment
> calculations.
> This will allow users to calculate more enrichment in the same topology.
>
> Thanks
> Carolyn
>
>
>
>
> Sent from my Verizon, Samsung Galaxy smartphone
>
>
>  Original message 
> From: Nick Allen <n...@nickallen.org>
> Date: 1/9/17 8:49 AM (GMT-08:00)
> To: dev@metron.incubator.apache.org
> Subject: Re: Enrich enrichment
>
> I agree that making it easy for the user to "enrich enrichments", as Dima
> put it, to an arbitrary depth, would be extremely useful for a lot of use
> cases. We've discussed the use case a little in the past in this thread
> [1].
>
> Re-purposing the "threat intel" phase gives us something that is feasible
> today, but only to a "depth" of 2.  We would also need to rename and
> redocument it so that users understand how they can leverage the two
> phases.  This seems like a minimally viable option if we want to head down
> this road.
>
> The other extreme might involve inferring the topology needed based on the
> user's configuration. If the user needs 3 phases, then we build a topology
> that supports 3 phases.  Under the covers instead of using Flux, we would
> use Storm's topology builder Java API to grok the configuration and build
> the topology(ies) that the user needs.
>
> I am not sure if we can infer this from the configuration as it exists
> today or if we would need to redefine the configuration somehow.  Like I
> said this is "extreme", but could give the user more expressive and
> intuitive options.
>
>
>
>
> ---
> [1]
> http://mail-archives.apache.org/mod_mbox/incubator-metron-dev/201610.mbox/%3CCAHSJ8NwJUiyp3YO6NVE4tfLoSSkOc6QG%2BMsAJSSDu%2B-wfct_vw%40mail.gmail.com%3E
>
>
>
> On Mon, Jan 9, 2017 at 10:56 AM, Casey Stella <ceste...@gmail.com> wrote:
>
> > I think that would be a good feature to add to have arbitrary number of
> > phases, though it might be tricky to code (the way I envisioned it would
> > involve a loop in storm, which is possible[1]), might have unintended
> > consequences to guarantees (e.g. updating enrichments might not be able
> > to be applied in realtime) and could be tricky to reason about
> > performance-wise.
> >
> > As it stands, the number of phases is a consequence of the topology
> > itself.  We do not currently have an architecture which would allow an
> > arbitrary number of phases without changing the flux file itself.  What
> > you can do, though, in a stellar enrichment is stack enrichments (e.g.
> > depend on previous enrichments) because it's just a list of stellar
> > statements.  The consequence, of course, is that these statements get run
> > within the same worker, which is unfortunate, but may be a stopgap
> > workaround.
> >
> > *1. https://groups.google.com/forum/#!topic/storm-user/EjN1hU58Q_8
> >
> > On Mon, Jan 9, 2017 at 10:48 AM, Otto Fowler <ottobackwa...@gmail.com>
> > wrote:
> >
> > > Maybe the naming of the phases is misleading?  What if you could set up
> > > an arbitrary number of stages, with defaults?
> > >
> > >
> > > On January 8, 2017 at 16:25:01, Casey Stella (ceste...@gmail.com) wrote:
> > >
> > > You could do the geo enrichment normally and do a stellar hbase
> > > enrichment in the threat Intel phase.
> 

Re: Enrich enrichment

2017-01-08 Thread Ryan Merriman
HBase enrichments and geo enrichments are done in parallel so I would not
expect this to work.  You could do the HBase enrichment as a threat Intel
enrichment and that should work because enrichments and threat Intel are
done in series.

The ideal way would be to chain together Stellar enrichments but I don't
think there is a geo enrichment function created yet.  I think that should
be a Jira.  I know someone is working on an update to how we do geo
enrichments so I will file a follow-on Jira if it's not included in the
scope of that work.
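
A toy Python model of the parallel-versus-series point above.  It is a
sketch under assumptions, not the enrichment topology: run_phase, the two
adapters, and the sample data are invented for illustration.  Every adapter
in one phase reads the same input snapshot, so an adapter cannot see a field
that a sibling adapter adds in that same phase.

```python
def run_phase(message, adapters):
    """Run every adapter of one phase against the same input snapshot,
    then merge all results into the message (parallel semantics)."""
    snapshot = dict(message)
    results = {}
    for adapter in adapters:
        results.update(adapter(snapshot))
    merged = dict(message)
    merged.update(results)
    return merged


GEO = {"8.8.8.8": "Mountain View"}       # made-up geo data
CRIME_LEVEL = {"Mountain View": "low"}   # made-up HBase enrichment data


def geo_adapter(msg):
    return {"geo_city": GEO.get(msg.get("ip_dst_addr"))}


def crime_adapter(msg):
    # Depends on a field that only exists after the geo adapter has run.
    return {"city_crime_level": CRIME_LEVEL.get(msg.get("geo_city"))}


message = {"ip_dst_addr": "8.8.8.8"}

# Both adapters in the enrichment phase: the crime lookup sees no city.
same_phase = run_phase(message, [geo_adapter, crime_adapter])

# Geo in the enrichment phase, crime lookup in the following (threat intel)
# phase: the second phase receives the already enriched message.
two_phases = run_phase(run_phase(message, [geo_adapter]), [crime_adapter])
```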

Ryan

> On Jan 8, 2017, at 2:31 PM, Dima Kovalyov  wrote:
> 
> Is it possible to enrich enrichment?
> 
> For example I have IP address, I enrich it with geo and get City name,
> now I want to enrich City name with city crime level (assume I have that
> data). But when I do that it just does not work. I specify enrichment
> like that:
>> {
>>  "index" : "msexchange",
>>  "batchSize" : 5,
>>  "enrichment" : {
>>"fieldMap" : {
>>  "geo" : [ "destination_ip", "source_ip" ],
>>  "hbaseEnrichment" : [ "enrichments.geo.destination_ip.country" ],
>>"hbaseEnrichment" : [ "enrichments:geo:destination_ip:country" ],
>>"hbaseEnrichment" : [ "enrichments.geo.destination_ip:country" ]
>>},
>>"fieldToTypeMap" : {
>>  "enrichments.geo.destination_ip.country" : [ "city_crime_level" ],
>>  "enrichments:geo:destination_ip:country" : [ "city_crime_level" ],
>>  "enrichments.geo.destination_ip:country" : [ "city_crime_level" ]
>>},
>>"config" : { }
>>  },
>>  "threatIntel" : {
>>"fieldMap" : { },
>>"fieldToTypeMap" : { },
>>"config" : { },
>>"triageConfig" : {
>>  "riskLevelRules" : { },
>>  "aggregator" : "MAX",
>>  "aggregationConfig" : { }
>>}
>>  },
>>  "configuration" : { }
>> }
> I tried all the ways how enrichment field can be entered just to be sure
> I do not mistype it.
> 
> - Dima


Re: [GitHub] incubator-metron issue #393: METRON-622: Create a Metron Docker Compose appl...

2017-01-03 Thread Ryan Merriman
I would consider the topologies installed, just not running.  But yes, no
data flowing end to end by default.

Ryan

On Thu, Dec 22, 2016 at 11:42 AM, ottobackwards  wrote:

> Github user ottobackwards commented on the issue:
>
> https://github.com/apache/incubator-metron/pull/393
>
> So just to be clear, the end result of this is the cluster deployed,
> but nothing installed?  No topologies, no indices in ES etc?
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
> with INFRA.
> ---
>


Re: [GitHub] incubator-metron issue #379: METRON-595: ES Writer uses more than one IP

2016-12-01 Thread Ryan Merriman
Jonathan,

That is the wrong Storm version.  You need to update your Vagrant box:
https://www.vagrantup.com/docs/cli/box.html.

Ryan

On Thu, Dec 1, 2016 at 1:50 PM, JonathanRider  wrote:

> Github user JonathanRider commented on the issue:
>
> https://github.com/apache/incubator-metron/pull/379
>
> And you can start a topology in it? This is the command I'm running:
> /usr/metron/0.3.0/bin/start_parser_topology.sh -z node1:2181 -k
> node1:6667 -s bro
>
> And storm version gives 0.10.0.2.3.4.7-4
> And the HDP stack in ambari says: HDP-2.3.4.7-4
>
> Can you confirm that these match yours?
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
> with INFRA.
> ---
>


Re: Grok - exception accessing HDFS

2016-11-22 Thread Ryan Merriman
My guess is that either /apps/metron/patterns/squid was somehow removed
from HDFS or the HDFS URL isn't configured properly.  I believe that URL
comes from the fs.defaultFS property in /etc/hadoop/conf/core-site.xml.
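
One way to check that property is to parse core-site.xml directly.  The
Python sketch below assumes the usual Hadoop layout of <property> elements
with <name> and <value> children; the sample XML and the value
hdfs://node1:8020 are made up for illustration, and read_hadoop_property is
a hypothetical helper, not part of Hadoop or Metron.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, name):
    """Return the value of a named property from Hadoop-style XML config,
    or None if the property is not present."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# Made-up sample content; a real file would be read from
# /etc/hadoop/conf/core-site.xml on the node.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:8020</value>
  </property>
</configuration>"""

default_fs = read_hadoop_property(SAMPLE_CORE_SITE, "fs.defaultFS")
print(default_fs)
```

On a cluster node the same function can be fed the text of the real file,
for example with pathlib.Path("/etc/hadoop/conf/core-site.xml").read_text().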

Ryan

On 11/22/16, 11:14 AM, "Otto Fowler"  wrote:

>Anyone ever see this?
>
>2016-11-21 16:21:43.197 o.a.h.u.NativeCodeLoader [WARN] Unable to load
>native-hadoop library for your platform... using builtin-java classes
>where
>applicable
>2016-11-21 16:21:43.354 o.a.m.p.GrokParser [ERROR] Grok parser unable to
>initialize grok parser: Unable to load /apps/metron/patterns/squid from
>either classpath or HDFS
>java.lang.RuntimeException: Grok parser unable to initialize grok parser:
>Unable to load /apps/metron/patterns/squid from either classpath or HDFS
>at org.apache.metron.parsers.GrokParser.init(GrokParser.java:109)
>[stormjar.jar:?]
>at
>org.apache.metron.parsers.bolt.ParserBolt.prepare(ParserBolt.java:86)
>[stormjar.jar:?]
>at
>backtype.storm.daemon.executor$fn__6259$fn__6272.invoke(executor.clj:751)
>[storm-core-0.10.0.2.4.3.0-227.jar:0.10.0.2.4.3.0-227]
>at backtype.storm.util$async_loop$fn__545.invoke(util.clj:477)
>[storm-core-0.10.0.2.4.3.0-227.jar:0.10.0.2.4.3.0-227]
>at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
>at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
>2016-11-21 16:21:43.358 b.s.util [ERROR] Async loop died!
>java.lang.RuntimeException: Grok parser Error: Grok parser unable to
>initialize grok parser: Unable to load /apps/metron/patterns/squid from
>either classpath or HDFS
>at org.apache.metron.parsers.GrokParser.init(GrokParser.java:127)
>~[stormjar.jar:?]
>at
>org.apache.metron.parsers.bolt.ParserBolt.prepare(ParserBolt.java:86)
>~[stormjar.jar:?]
>at
>backtype.storm.daemon.executor$fn__6259$fn__6272.invoke(executor.clj:751)
>~[storm-core-0.10.0.2.4.3.0-227.jar:0.10.0.2.4.3.0-227]
>at backtype.storm.util$async_loop$fn__545.invoke(util.clj:477)
>[storm-core-0.10.0.2.4.3.0-227.jar:0.10.0.2.4.3.0-227]
>at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
>at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
>Caused by: java.lang.RuntimeException: Grok parser unable to initialize
>grok parser: Unable to load /apps/metron/patterns/squid from either
>classpath or HDFS
>at org.apache.metron.parsers.GrokParser.init(GrokParser.java:109)
>~[stormjar.jar:?]
>... 5 more
>
>
>2.1Beta rc2



Re: problem with vagrant launch

2016-11-21 Thread Ryan Merriman
Matt,

Ansible is probably the most fragile of all those with respect to
versions.  That is likely your problem.

Ryan Merriman

On Mon, Nov 21, 2016 at 12:27 PM, Matt Foley <ma...@apache.org> wrote:

> I had checked them, and they are all >= the specified requirement in
> https://github.com/apache/incubator-metron/tree/master/metron-deployment/vagrant/quick-dev-platform#user-content-prerequisites :
> Ansible 2.1.2.0
> Vagrant 1.8.6
> Virtualbox 5.0.24
> Python 2.7.12
> Maven 3.3.9
>
> The only one significantly different is Ansible; the requirement is
> 2.0.0.2.
> Is Ansible 2.1 not compatible with Ansible 2.0?  I let homebrew install
> the latest version.
> Thanks,
> --Matt
>
> From: Otto Fowler <ottobackwa...@gmail.com>
> Date: Saturday, November 19, 2016 at 5:18 AM
> To: "dev@metron.incubator.apache.org" <dev@metron.incubator.apache.org>,
> Matt Foley <ma...@apache.org>
> Subject: Re: problem with vagrant launch
>
> What versions of ansible, vagrant, virtualbox and python do you have?
>
>
>
> On November 18, 2016 at 16:34:08, Matt Foley (ma...@apache.org) wrote:
> Hi,
> I'm trying to launch vanilla vagrant single-node test env for Metron on a
> Mac, via
> “vagrant up” in metron-deployment/vagrant/full-dev-platform
>
> The Ambari install all goes fine, then the Elastic Search install
> but when it comes to
> TASK [metron_elasticsearch_templates : Add Elasticsearch templates for
> topologies]
> it fails on all three (bro, snort, yaf indexes) with
> "Status code was not [200]: An unknown error occurred: sendall() argument
> 1 must be string or buffer, not dict"
>
> Launching from metron-deployment/vagrant/quick-dev-platform had the same
> failure point.
>
> All three *_index.template files are present in the specified directory.
>
> Does anyone recognize this problem? Have people been running the vagrant
> test env lately?
> Any suggestions for debugging?
>
> Thanks,
> --Matt
>
> Cmd line output:
> …
> TASK [metron_elasticsearch_templates : Wait for Index to Become Available] *
> ok: [node1]
>
> TASK [metron_elasticsearch_templates : Add Elasticsearch templates for topologies] ***
> failed: [node1] (item=/Users/mfoley/projects/Metron/metron532/metron-deployment/roles/metron_elasticsearch_templates/files/es_templates/bro_index.template) => {"content": "", "failed": true, "item": "/Users/mfoley/projects/Metron/metron532/metron-deployment/roles/metron_elasticsearch_templates/files/es_templates/bro_index.template", "msg": "Status code was not [200]: An unknown error occurred: sendall() argument 1 must be string or buffer, not dict", "redirected": false, "status": -1, "url": "http://node1:9200/_template/bro_index"}
> failed: [node1] (item=/Users/mfoley/projects/Metron/metron532/metron-deployment/roles/metron_elasticsearch_templates/files/es_templates/snort_index.template) => {"content": "", "failed": true, "item": "/Users/mfoley/projects/Metron/metron532/metron-deployment/roles/metron_elasticsearch_templates/files/es_templates/snort_index.template", "msg": "Status code was not [200]: An unknown error occurred: sendall() argument 1 must be string or buffer, not dict", "redirected": false, "status": -1, "url": "http://node1:9200/_template/snort_index"}
> failed: [node1] (item=/Users/mfoley/projects/Metron/metron532/metron-deployment/roles/metron_elasticsearch_templates/files/es_templates/yaf_index.template) => {"content": "", "failed": true, "item": "/Users/mfoley/projects/Metron/metron532/metron-deployment/roles/metron_elasticsearch_templates/files/es_templates/yaf_index.template", "msg": "Status code was not [200]: An unknown error occurred: sendall() argument 1 must be string or buffer, not dict", "redirected": false, "status": -1, "url": "http://node1:9200/_template/yaf_index"}
>
> NO MORE HOSTS LEFT *****
> to retry, use: --limit @/Users/mfoley/projects/Metron/metron532/metron-deployment/playbooks/metron_full_install.retry
>
> PLAY RECAP *
> node1  : ok=44   changed=12   unreachable=0   failed=1
>
> Ansible failed to complete successfully. Any error output should be
> visible above. Please fix these errors and try again.
>
>

Re: [DISCUSS] Next Release Name

2016-11-09 Thread Ryan Merriman
+1

On Wed, Nov 9, 2016 at 4:30 PM, Casey Stella  wrote:

> Agreed, +1 to 0.3.0
>
> On Wed, Nov 9, 2016 at 5:28 PM, zeo...@gmail.com  wrote:
>
> > That sounds very reasonable to me.
> >
> > Jon
> >
> > On Wed, Nov 9, 2016, 17:15 James Sirota  wrote:
> >
> > Guys,
> >
> > You know, looking at the release I think the changes were significant
> > enough due to the storm & kafka upgrade to justify moving it to a
> > non-point release.  Generally point releases are reserved for patches or
> > maintenance releases.  I think this release is more than just a
> > maintenance release.  I suggest we consider 0.3.0
> >
> > 04.11.2016, 18:27, "Kyle Richardson" :
> > > I'm a little late to the party but thought I would go ahead and throw
> > > my two cents into the mix.
> > >
> > > I share the concern around an upgrade / migration path. While I would
> > > love to see the BETA dropped sooner than later, to me, this is a game
> > > changer for people implementing Metron. I think there is a silent
> > > expectation of no data loss after dropping the BETA tag.
> > >
> > > Even if there is not a direct upgrade path for a few releases, is there
> > > documentation that we could provide to ensure a data migration path for
> > > users? I'm not thinking anything automated just some instructions on
> > > what to do.
> > >
> > > -Kyle
> > >
> > > > On Fri, Nov 4, 2016 at 9:16 AM, Casey Stella wrote:
> > >
> > >>  Jon,
> > >>
> > > >>  Thank you for your thoughts; they are appreciated and you should keep
> > > >>  them coming. This kind of discussion is exactly why I sent out this
> > > >>  thread. I think it's safe to say that the entire community shares your
> > > >>  desire for Metron to be as easy to use as possible and a "data analysis
> > > >>  platform for the masses." We should hold ourselves to a high standard,
> > > >>  no doubt.
> > >>
> > >>  Casey
> > >>
> > > >>  On Fri, Nov 4, 2016 at 6:30 AM, zeo...@gmail.com wrote:
> > >>
> > > >>  > Please understand that my points mostly relate to perception and
> > > >>  > ease of use, not what's technically possible or available. I'm
> > > >>  > coming at this as Metron should be a data analysis platform for
> > > >>  > the masses.
> > > >>  >
> > > >>  > METRON-517/542 - While I'm willing to let this one go it depends
> > > >>  > on your definition of non-issue. I personally believe that data
> > > >>  > (in every location that it exists) needs to be obvious and have
> > > >>  > ultra high integrity. I'm not concerned that the correct data
> > > >>  > won't exist somewhere in the cluster, I'm focusing on it being
> > > >>  > easily accessible by an operations team that may consist of entry
> > > >>  > level analysts. Once 517 is done and merged I would consider that
> > > >>  > a short term mitigation is in place.
> > > >>  >
> > > >>  > I feel like the project should stick to certain principles and a
> > > >>  > suggestion is that data access is easy, accurate, and obvious. Do
> > > >>  > we have anything like this that was agreed upon, discussed, or
> > > >>  > documented? Probably a discussion for a different thread.
> > > >>  >
> > > >>  > METRON-485/470/etc. were mostly to illustrate a consistency issue,
> > > >>  > and resolving them would give a better first impression (assuming
> > > >>  > that people monitoring the project will start using it more once
> > > >>  > it's non-BETA software). First impressions are big in my book and
> > > >>  > could affect initial adoption.
> > > >>  >
> > > >>  > Regarding 485 - Otto may be able to clarify but I thought somebody
> > > >>  > else saw this issue as well. I think the finger is currently being
> > > >>  > pointed at monit timeouts and not storm. It also doesn't happen
> > > >>  > every single time, I only run into it while the cluster is under
> > > >>  > load and after dozens of topology restarts that I do when tuning
> > > >>  > parallelism in storm. I'm going to be updating to storm 1.0.x in
> > > >>  > order to see if this still exists. Again, this relates to ease of
> > > >>  > use/load testing/tuning.
> > > >>  >
> > > >>  > Agree with the upgrade comments - as long as it's supported at some
> > > >>  > defined point (IMHO this is when a project leaves BETA but others
> > > >>  > are welcome to disagree).
> > > >>  >
> > > >>  > Finally, I know this doesn't come across well in email but I'm just
> > > >>  > mentioning items which I think are important, not attempting to
> > > >>  > demand that they be fixed or that this doesn't leave beta. Thanks,
> > >>  >
> > >>  > Jon
> > >>  >
> > > >>  > On Thu, Nov 3, 2016, 16:44 James Sirota wrote:
> > >>  >
> > >>  >
> > >>  > Hi Jon,
> > >>  >
> > >>  > Here are my thoughts around your objections.
> > >>  >
> > >>  > METRON-517/METRON-542
> > >>  >
> > >>  > I thin the 

Re: travis - how long should a build take?

2016-11-09 Thread Ryan Merriman
The last PR I did took 43 minutes.  I would restart it.

On 11/9/16, 2:45 PM, "Otto Fowler"  wrote:

>My last pr is going on 2h.



Re: Re: Travis Logging and You

2016-11-04 Thread Ryan Merriman
Haha I wrote something that does the exact same thing.  I added a couple
extra methods to set log levels for other logging frameworks (Log4j2 and
Java logging).  Good to know, I will just add on to that.

On Fri, Nov 4, 2016 at 9:05 AM, Otto Fowler <ottobackwa...@gmail.com> wrote:

> Hey Ryan,
>
> Take a look at the UnitTestHelper in test utils.  It has methods for
> changing the logger verbosity.  Even if it is not what we do, it is
> interesting.  I just stumbled on it.
>
>
>
> On November 3, 2016 at 13:56:51, Otto Fowler (ottobackwa...@gmail.com) wrote:
>
> We are going to need a separate jira or set for sweeping the code for
> compile warnings maybe.
>
>
> On November 3, 2016 at 13:50:59, Otto Fowler (ottobackwa...@gmail.com) wrote:
>
> https://github.com/ottobackwards/incubator-metron.git
> branch METRON-538
>
> If you end up wanting to send me over PR’s.
>
> 1 commit so far, I was able to get rid of the curator exceptions:
>
> Running org.apache.metron.maas.service.MaasIntegrationTest
> Found endpoints: dummy:1.0 @ http://10.0.0.99:1500 serving:
> apply=echo
> Cleaning up...
> Killing 2189 from  2189 ttys0000:01.19 /Library/Java/
> JavaVirtualMachines/jdk1.8.0_31.jdk/Contents/Home/bin/java
> org.apache.metron.maas.service.runner.Runner -ci 2 -zq 127.0.0.1:52505
> -zr /maas/config -s dummy_rest.sh -n dummy -hn 10.0.0.99 -v 1.0
> Killing 2192 from  2192 ttys0000:00.01 /bin/bash
> /Users/ottofowler/src/apache/forks/incubator-metron/metron-
> analytics/metron-maas-service/target/MaasIntegrationTest/
> MaasIntegrationTest-localDir-nm-0_0/usercache/ottofowler/
> appcache/application_1478189460460_0001/container_
> 1478189460460_0001_01_02/dummy_rest.sh
> Killing 2236 from  2236 ttys0000:00.00 /bin/bash
> /Users/ottofowler/src/apache/forks/incubator-metron/metron-
> analytics/metron-maas-service/target/MaasIntegrationTest/
> MaasIntegrationTest-localDir-nm-0_0/usercache/ottofowler/
> appcache/application_1478189460460_0001/container_
> 1478189460460_0001_01_02/dummy_rest.sh
> 2016-11-03 12:11:20,839 ERROR [Thread[Thread-256,5,main]] delegation.
> AbstractDelegationTokenSecretManager 
> (AbstractDelegationTokenSecretManager.java:run(659))
> - ExpiredTokenRemover received java.lang.InterruptedException: sleep
> interrupted
> 2016-11-03 12:11:20,956 ERROR [Thread[Thread-236,5,main]] delegation.
> AbstractDelegationTokenSecretManager 
> (AbstractDelegationTokenSecretManager.java:run(659))
> - ExpiredTokenRemover received java.lang.InterruptedException: sleep
> interrupted
> 2016-11-03 12:11:36,277 ERROR [Curator-Framework-0]
> curator.ConnectionState (ConnectionState.java:checkTimeouts(200)) -
> Connection timed out for connection string (127.0.0.1:52505) and timeout
> (15000) / elapsed (15421)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
> ConnectionLoss
> at org.apache.curator.ConnectionState.checkTimeouts(
> ConnectionState.java:197)
> at org.apache.curator.ConnectionState.getZooKeeper(
> ConnectionState.java:87)
> at org.apache.curator.CuratorZookeeperClient.getZooKeeper(
> CuratorZookeeperClient.java:115)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.
> performBackgroundOperation(CuratorFrameworkImpl.java:806)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.
> backgroundOperationsLoop(CuratorFrameworkImpl.java:792)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(
> CuratorFrameworkImpl.java:62)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.
> call(CuratorFrameworkImpl.java:257)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> etc etc
>
> I am just going to make a pass through these types and see if there is a
> common shutdown order problem ( maas does not use the integration component
> runner so shutdown order is not specified, so it may be ‘special’ ).
>
>
>
> On November 3, 2016 at 12:30:46, Ryan Merriman (merrim...@gmail.com)
> wrote:
>
> Here is the Jira for log levels:
> https://issues.apache.org/jira/browse/METRON-541
>
> On Thu, Nov 3, 2016 at 11:13 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > I think we fell off the list - sorry
> >
> >
> > On November 3, 2016 at 12:09:02, Otto Fowler (ottobackwa...@gmail.com)
> > wrote:
> >
> > METRON-538
> >
> > Everyone is welcome to comment.
> >
> >
> > On November 3, 2016 at 12:06:28, Ryan Merriman (merrim...@gmail.com)
>

Re: Re: Travis Logging and You

2016-11-03 Thread Ryan Merriman
Here is the Jira for log levels:
https://issues.apache.org/jira/browse/METRON-541

On Thu, Nov 3, 2016 at 11:13 AM, Otto Fowler <ottobackwa...@gmail.com>
wrote:

> I think we fell off the list - sorry
>
>
> On November 3, 2016 at 12:09:02, Otto Fowler (ottobackwa...@gmail.com)
> wrote:
>
> METRON-538
>
> Everyone is welcome to comment.
>
>
> On November 3, 2016 at 12:06:28, Ryan Merriman (merrim...@gmail.com)
> wrote:
>
> Makes sense.
>
> On Thu, Nov 3, 2016 at 11:03 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > I think we should have two jiras, two pr’s.
> > I’ll create one for the shutdown issues.
> >
> >
> >
> > On November 3, 2016 at 12:02:53, Ryan Merriman (merrim...@gmail.com)
> > wrote:
> >
> > Yeah let's do it in parallel.  You can start with the shutdown issues and
> > I'll work on making the log levels configurable.  Let's go ahead and
> > proceed with 2 log configurations and see how it goes.  If you get done
> > first, just submit a PR and I'll add to it.
> >
> > Thanks Otto.  I can't wait to get this fixed.
> >
> > On Thu, Nov 3, 2016 at 10:57 AM, Otto Fowler <ottobackwa...@gmail.com>
> > wrote:
> >
> >> I have not.  I was going to start looking at shutdown while waiting for
> >> consensus on 1 v 2 log configurations.
> >> How do you want to proceed?  We can do it together.
> >>
> >>
> >> On November 3, 2016 at 11:43:24, Ryan Merriman (merrim...@gmail.com)
> >> wrote:
> >>
> >> Otto, have you started on any of this yet? Was thinking I would start
> with
> >> getting the log levels consistent and dig into the shutdown issues. Then
> >> we can iterate from there.
> >>
> >> Ryan
> >>
> >> On Wed, Nov 2, 2016 at 1:29 PM, Ryan Merriman <merrim...@gmail.com>
> >> wrote:
> >>
> >> > I vote for 1 logging configuration (ERROR only). Why do we want
> >> different
> >> > logging in Travis vs local? If you are working on a specific component
> >> and
> >> > need more verbose logging, temporarily change the log level to INFO
> for
> >> > that component. If we get the logging in shape this will be easy to
> do.
> >> >
> >> > On Wed, Nov 2, 2016 at 1:18 PM, Otto Fowler <ottobackwa...@gmail.com>
> >> > wrote:
> >> >
> > >> >> On Fri, Oct 28, 2016 at 3:13 PM, David Lyle <
> > >> >> dlyle65...@gmail.com> wrote:
> >> >>
> >> >> > I think you noticed the main problem with turning logging off
> >> entirely.
> >> >> >
> >> >> > I'd be inclined to have two files: one which defaults to INFO and
> >> >> another
> >> >> > that defaults to ERROR for Travis. We can give a
> >> >> -Dlog4j.configuration=file:log4j.config.set.to.ERROR.only
> >> >> > for travis, which I think Otto suggested.
> >> >>
> >> >> So -
> >> >> * one jira to fix the component shutdowns ( I’ll take a stab unless
> you
> >> >> are
> >> >> already on it )
> >> >> * one jira to have travis run with a second configuration ( be it
> >> >> literally
> >> >> a second file or something else ) set to error only
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On November 2, 2016 at 13:51:28, Casey Stella (ceste...@gmail.com)
> >> wrote:
> >> >>
> >> >> What would be in the two different logging properties?
> >> >>
> >> >> On Wed, Nov 2, 2016 at 1:45 PM, Otto Fowler <ottobackwa...@gmail.com
> >
> >> >> wrote:
> >> >>
> >> >> > What about having two logging configurations? One just for travis,
> >> and
> >> >> > one pretty much what there is now ( the teardown stuff still has to
> >> be
> >> >> > sorted out ). Maybe Travis can be scripted to put the right logging
> >> >> > properties files in place?
> >> >> >
> >> >> >
> >> >> > On November 2, 2016 at 12:42:09, Casey Stella (ceste...@gmail.com)
> >> >> wrote:
> >> >> >
> >> >> > I haven't seen a JIRA about this yet. IMHO, I think a good
> first-pass
> >> >> > would be:
> >

Re: Travis Logging and You

2016-11-03 Thread Ryan Merriman
Otto, have you started on any of this yet?  Was thinking I would start with
getting the log levels consistent and dig into the shutdown issues.  Then
we can iterate from there.

Ryan

On Wed, Nov 2, 2016 at 1:29 PM, Ryan Merriman <merrim...@gmail.com> wrote:

> I vote for 1 logging configuration (ERROR only).  Why do we want different
> logging in Travis vs local?  If you are working on a specific component and
> need more verbose logging, temporarily change the log level to INFO for
> that component.  If we get the logging in shape this will be easy to do.
>
> On Wed, Nov 2, 2016 at 1:18 PM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
>> On Fri, Oct 28, 2016 at 3:13 PM, David Lyle <
>> dlyle65...@gmail.com> wrote:
>>
>> > I think you noticed the main problem with turning logging off entirely.
>> >
>> > I'd be inclined to have two files: one which defaults to INFO and
>> another
>> > that defaults to ERROR for Travis. We can give a
>> -Dlog4j.configuration=file:log4j.config.set.to.ERROR.only
>> > for travis, which I think Otto suggested.
>>
>> So -
>> * one jira to fix the component shutdowns ( I’ll take a stab unless you
>> are
>> already on it )
>> * one jira to have travis run with a second configuration ( be it
>> literally
>> a second file or something else ) set to error only
>>
>>
>>
>>
>> On November 2, 2016 at 13:51:28, Casey Stella (ceste...@gmail.com) wrote:
>>
>> What would be in the two different logging properties?
>>
>> On Wed, Nov 2, 2016 at 1:45 PM, Otto Fowler <ottobackwa...@gmail.com>
>> wrote:
>>
>> > What about having two logging configurations?  One just for travis, and
>> > one pretty much what there is now ( the teardown stuff still has to be
>> > sorted out ).  Maybe Travis can be scripted to put the right logging
>> > properties files in place?
>> >
>> >
>> > On November 2, 2016 at 12:42:09, Casey Stella (ceste...@gmail.com)
>> wrote:
>> >
>> > I haven't seen a JIRA about this yet. IMHO, I think a good first-pass
>> > would be:
>> > * We have a lot of ERROR level logging that happens because during
>> teardown
>> > of the in memory components that could be fixed by tearing down
>> components
>> > in the right order (possibly).
>> > * Teardown in some of our integration tests don't seem to get called if
>> the
>> > tests fail, this causes cascading errors to happen ( the next test won't
>> > start because it can't start the components), so ensuring teardown
>> happens
>> > in a finally block would be good
>> > * If there are chatty components that are inappropriately logging, we
>> can
>> > adjust the logging level on a per-package basis. Tender balance between
>> > suppressing valuable output and chattiness would have to be made (and
>> > probably discussed as part of a JIRA).
>> >
>> > In retrospect, after considering this after the previous discussion on
>> the
>> > dev list, I would not be in favor of logging to a file. It is important
>> to
>> > see those logs on the travis output to help with quick-debugging help
>> and
>> > we'd be setting ourselves up to be non-standard as well. I'd rather see
>> a
>> > more directed and surgical effort.
>> >
>> > That's just my $0.02, though. I'd welcome a JIRA (or multiple JIRAs) to
>> > tackle logging.
>> >
>> > On Wed, Nov 2, 2016 at 12:33 PM, Otto Fowler <ottobackwa...@gmail.com>
>> > wrote:
>> >
>> > > Did a jira for this actually get created? I would be willing to help
>> work
>> > > on getting the logs setup for what they need to be for travis and for
>> > > local. Did we settle on an approach? Is there work ongoing that could
>> use
>> > > some dev or testing help?
>> > >
>> >
>> >
>>
>
>


Re: [DISCUSS] Metron REST API Architecture and Design

2016-11-02 Thread Ryan Merriman
Hey Jon sorry for the delay.  These are new to me so I'm not sure where to
start.  Is there a common use case we could explore initially?

Ryan

On Thu, Oct 27, 2016 at 9:02 AM, zeo...@gmail.com <zeo...@gmail.com> wrote:

> Another thought that came to me recently - Has there been any consideration
> of using some of the newer standards such as pxGrid
> <https://developer.cisco.com/site/pxgrid/discover/overview/>, DXL
> <https://community.mcafee.com/docs/DOC-7296>, etc. to integrate with other
> systems and allow data loading/sharing?  I'm not familiar enough with those
> solutions to know if it fits with the current Metron API architecture, but
> at least I wanted to put it out there.
>
> Jon
>
> On Mon, Oct 24, 2016 at 11:00 AM larry mccay <lmc...@apache.org> wrote:
>
> > I think this is a reasonable direction for Metron.
> >
> > It probably makes sense to make sure that your services can accept
> identity
> > propagation from Knox so that they can also be proxied along with Hadoop
> > APIs.
> >
> > FWIW - discussing whether a JAXRS programming model is something wanted
> by
> > the community wouldn't be a difficult thing to do.
> > It hasn't been discussed to this point which is why it isn't documented -
> > this is primarily because there hasn't been any explicit demand for it.
> >
> >
> >
> > On Mon, Oct 24, 2016 at 10:51 AM, zeo...@gmail.com <zeo...@gmail.com>
> > wrote:
> >
> > > Ok, that sounds good to me, I primarily wanted to see whether or not it
> > was
> > > attempted and if it hit a technical roadblock.  Thanks,
> > >
> > > Jon
> > >
> > > On Mon, Oct 24, 2016 at 10:11 AM Ryan Merriman <merrim...@gmail.com>
> > > wrote:
> > >
> > > > There is also this comment in that Jira:
> > > >
> > > > "Adding the JAXRS services to knox is really easy but we haven't
> really
> > > > discussed whether it should be a programming model aspect of Knox in
> > the
> > > > community"
> > > >
> > > > I think that would need to be worked out before we move services into
> > > Knox,
> > > > if we decide we should do that.
> > > >
> > > >
> > > >
> > > > On Mon, Oct 24, 2016 at 9:06 AM, Ryan Merriman <merrim...@gmail.com>
> > > > wrote:
> > > >
> > > > > I spent some time researching the Knox documentation and building
> > > custom
> > > > > services (hosted in Knox) was not well-documented.  Spring is a
> great
> > > > > choice for that and I didn't really get any other feedback on which
> > > > > application development framework to use.  So that's what I went
> > with.
> > > > >
> > > > > I think we should plan on adding Knox in front to leverage all the
> > nice
> > > > > security features and integrations.  That is how most Knox
> > integrations
> > > > > > (HDFS, Storm, etc) are architected.
> > > > >
> > > > > Ryan
> > > > >
> > > > > On Mon, Oct 24, 2016 at 8:37 AM, zeo...@gmail.com <
> zeo...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> So it looks like, for now, you are not pursuing Knox (per comments
> > in
> > > > >> METRON-503 and then PR 316).  Is there a reason for that?
> > > > >>
> > > > >> Jon
> > > > >>
> > > > >> On Fri, Oct 14, 2016 at 5:59 PM zeo...@gmail.com <
> zeo...@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Good question :)
> > > > >> >
> > > > >> > On Fri, Oct 14, 2016, 17:07 Ryan Merriman <merrim...@gmail.com>
> > > > wrote:
> > > > >> >
> > > > >> > Jon,
> > > > >> >
> > > > >> > It wasn't intentional, I ran out of time and wanted to get
> > something
> > > > out
> > > > >> > there.  I think it certainly could be open ended though.  Where
> > > should
> > > > >> the
> > > > >> > REST API project be located?
> > > > >> >
> > > > >> > Ryan
> > > > >> >
> > > > >> > On Thu, Oct 13, 2016 at 7:32 PM, zeo...@gmail.com <
> > zeo...@gmail.com
> > > >
> > > > >&g

Re: Travis Logging and You

2016-11-02 Thread Ryan Merriman
I vote for 1 logging configuration (ERROR only).  Why do we want different
logging in Travis vs local?  If you are working on a specific component and
need more verbose logging, temporarily change the log level to INFO for
that component.  If we get the logging in shape this will be easy to do.

On Wed, Nov 2, 2016 at 1:18 PM, Otto Fowler  wrote:

> On Fri, Oct 28, 2016 at 3:13 PM, David Lyle <
> dlyle65...@gmail.com> wrote:
>
> > I think you noticed the main problem with turning logging off entirely.
> >
> > I'd be inclined to have two files: one which defaults to INFO and another
> > that defaults to ERROR for Travis. We can give a
> -Dlog4j.configuration=file:log4j.config.set.to.ERROR.only
> > for travis, which I think Otto suggested.
>
> So -
> * one jira to fix the component shutdowns ( I’ll take a stab unless you are
> already on it )
> * one jira to have travis run with a second configuration ( be it literally
> a second file or something else ) set to error only
>
>
>
>
> On November 2, 2016 at 13:51:28, Casey Stella (ceste...@gmail.com) wrote:
>
> What would be in the two different logging properties?
>
> On Wed, Nov 2, 2016 at 1:45 PM, Otto Fowler 
> wrote:
>
> > What about having two logging configurations?  One just for travis, and
> > one pretty much what there is now ( the teardown stuff still has to be
> > sorted out ).  Maybe Travis can be scripted to put the right logging
> > properties files in place?
> >
> >
> > On November 2, 2016 at 12:42:09, Casey Stella (ceste...@gmail.com)
> wrote:
> >
> > I haven't seen a JIRA about this yet. IMHO, I think a good first-pass
> > would be:
> > * We have a lot of ERROR level logging that happens because during
> teardown
> > of the in memory components that could be fixed by tearing down
> components
> > in the right order (possibly).
> > * Teardown in some of our integration tests don't seem to get called if
> the
> > tests fail, this causes cascading errors to happen ( the next test won't
> > start because it can't start the components), so ensuring teardown
> happens
> > in a finally block would be good
> > * If there are chatty components that are inappropriately logging, we can
> > adjust the logging level on a per-package basis. Tender balance between
> > > suppressing valuable output and chattiness would have to be made (and
> > probably discussed as part of a JIRA).
> >
> > In retrospect, after considering this after the previous discussion on
> the
> > dev list, I would not be in favor of logging to a file. It is important
> to
> > see those logs on the travis output to help with quick-debugging help and
> > we'd be setting ourselves up to be non-standard as well. I'd rather see a
> > more directed and surgical effort.
> >
> > That's just my $0.02, though. I'd welcome a JIRA (or multiple JIRAs) to
> > tackle logging.
> >
> > On Wed, Nov 2, 2016 at 12:33 PM, Otto Fowler 
> > wrote:
> >
> > > Did a jira for this actually get created? I would be willing to help
> work
> > > on getting the logs setup for what they need to be for travis and for
> > > local. Did we settle on an approach? Is there work ongoing that could
> use
> > > some dev or testing help?
> > >
> >
> >
>
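
The teardown points quoted above (stop components in the right order, and
make sure teardown runs even when a test fails) can be sketched roughly as
follows.  This is only an illustration: the Component interface and the
Recording class are hypothetical stand-ins, not Metron's in-memory
component classes, and the component names are placeholders.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TeardownOrderExample {

    // Hypothetical stand-in for an in-memory test component.
    public interface Component {
        void start() throws Exception;
        void stop();
    }

    // Records start/stop calls so the ordering can be inspected.
    public static class Recording implements Component {
        private final String name;
        private final List<String> events;

        public Recording(String name, List<String> events) {
            this.name = name;
            this.events = events;
        }

        @Override
        public void start() {
            events.add("start " + name);
        }

        @Override
        public void stop() {
            events.add("stop " + name);
        }
    }

    /** Starts components in order, runs the test body, and always stops them in reverse order. */
    public static void runWithComponents(List<Component> components, Runnable testBody) throws Exception {
        Deque<Component> started = new ArrayDeque<>();
        try {
            for (Component c : components) {
                c.start();
                started.push(c); // only components that actually started get stopped
            }
            testBody.run();
        } finally {
            // Runs whether the test body passed or threw.
            while (!started.isEmpty()) {
                try {
                    started.pop().stop(); // last started, first stopped
                } catch (RuntimeException e) {
                    // One failed stop should not prevent the remaining components from stopping.
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        List<Component> components = new ArrayList<>();
        components.add(new Recording("zookeeper", events));
        components.add(new Recording("kafka", events));
        components.add(new Recording("storm", events));

        try {
            runWithComponents(components, () -> {
                throw new IllegalStateException("simulated test failure");
            });
        } catch (Exception e) {
            events.add("test failed: " + e.getMessage());
        }
        System.out.println(events);
    }
}
```

Stopping in reverse start order means a dependent component goes down
before the service it talks to, which is one plausible way to avoid the
connection-loss errors logged during shutdown.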


Re: [DISCUSS] Build Log Usability

2016-10-27 Thread Ryan Merriman
I definitely agree with Otto that we need to make our build logs less
verbose.  I think the first step is to get control of the log levels across
various components.  Are there specific examples of the types of messages
we want to suppress?  For instance:

org.apache.metron.profiler.integration.ProfilerIntegrationTest - zookeeper
and curator log levels are set to INFO
org.apache.metron.enrichment.bolt.ThreatIntelJoinBoltTest - expected
exceptions are printed out when they should be suppressed (to Dave's point)
...

Maybe building a list of examples will be easier after we take a pass at
setting all the log levels to ERROR.  Ideally we only see the test results
and an error if something really did fail.  Does that sound like a
reasonable approach?

Ryan

On Wed, Oct 19, 2016 at 7:44 AM, David Lyle  wrote:

> Hi Otto,
>
> Good point. One of the practices I'd like to see is to suppress any
> exception/error output in tests when it's expected.
>
> Thanks!
>
> -D...
>
>
> On Wed, Oct 19, 2016 at 8:18 AM, Otto Fowler 
> wrote:
>
> > The Metron build logs contain a great deal of information, including a
> > large number of warnings / exceptions that do not result in test failures
> > but still fill the log.  This makes troubleshooting test/build issues
> > difficult.  For example - the travis ci log for PR #276’s recent failure
> > doesn’t even have the final errors because the log is too long.
> >
> > What are some things that can be done to make the situation better?
> >
> >
> >
>


Re: [DISCUSS] Metron REST API Architecture and Design

2016-10-24 Thread Ryan Merriman
There is also this comment in that Jira:

"Adding the JAXRS services to knox is really easy but we haven't really
discussed whether it should be a programming model aspect of Knox in the
community"

I think that would need to be worked out before we move services into Knox,
if we decide we should do that.



On Mon, Oct 24, 2016 at 9:06 AM, Ryan Merriman <merrim...@gmail.com> wrote:

> I spent some time researching the Knox documentation and building custom
> services (hosted in Knox) was not well-documented.  Spring is a great
> choice for that and I didn't really get any other feedback on which
> application development framework to use.  So that's what I went with.
>
> I think we should plan on adding Knox in front to leverage all the nice
> security features and integrations.  That is how most Knox integrations
> (HDFS, Storm, etc) are architected.
>
> Ryan
>
> On Mon, Oct 24, 2016 at 8:37 AM, zeo...@gmail.com <zeo...@gmail.com>
> wrote:
>
>> So it looks like, for now, you are not pursuing Knox (per comments in
>> METRON-503 and then PR 316).  Is there a reason for that?
>>
>> Jon
>>
>> On Fri, Oct 14, 2016 at 5:59 PM zeo...@gmail.com <zeo...@gmail.com>
>> wrote:
>>
>> > Good question :)
>> >
>> > On Fri, Oct 14, 2016, 17:07 Ryan Merriman <merrim...@gmail.com> wrote:
>> >
>> > Jon,
>> >
>> > It wasn't intentional, I ran out of time and wanted to get something out
>> > there.  I think it certainly could be open ended though.  Where should
>> the
>> > REST API project be located?
>> >
>> > Ryan
>> >
>> > On Thu, Oct 13, 2016 at 7:32 PM, zeo...@gmail.com <zeo...@gmail.com>
>> > wrote:
>> >
>> > > Along the lines of:
>> > > • Must be deployed to a machine with adequate resources so that
>> resource
>> > > contention is avoided.
>> > > • Will need network access to all other services within Metron
>> > >
>> > > Has there been any consideration of a "Metron Manager" node?  In the
>> old
>> > > TP2
>> > > bare metal install guide
>> > > <https://cwiki.apache.org/confluence/display/METRON/
>> > > Metron+Installation+on+an+Ambari-Managed+Cluster>
>> > > it mentions a "Metron Installer," but I could see the needs for that
>> sort
>> > > of a system expanding to have the following roles:
>> > > - API
>> > > - Metron UI
>> > > - Metron Installer/upgrades
>> > > - Edge/Gateway Node for data loading
>> > > - Clients
>> > >
>> > > Also, at the end it ends mid-sentence under "Organization within
>> Metron,"
>> > > was that intended to be open ended?
>> > >
>> > > Jon
>> > >
>> > > On Thu, Oct 13, 2016 at 6:10 PM Ryan Merriman <merrim...@gmail.com>
>> > wrote:
>> > >
>> > > > I created a Jira to track this new feature at
>> > > > https://issues.apache.org/jira/browse/METRON-503.  I also started
>> and
>> > > > attached an architecture doc to that Jira with some of my ideas
>> about
>> > how
>> > > > we should implement it.  Please feel free to review and comment or
>> add
>> > to
>> > > > it.  Looking forward to everyone's ideas and feedback.
>> > > >
>> > > > Ryan Merriman
>> > > >
>> > > --
>> > >
>> > > Jon
>> > >
>> >
>> > --
>> >
>> > Jon
>> >
>> --
>>
>> Jon
>>
>
>


Re: [DISCUSS] Metron REST API Architecture and Design

2016-10-24 Thread Ryan Merriman
I spent some time researching the Knox documentation and building custom
services (hosted in Knox) was not well-documented.  Spring is a great
choice for that and I didn't really get any other feedback on which
application development framework to use.  So that's what I went with.

I think we should plan on adding Knox in front to leverage all the nice
security features and integrations.  That is how most Knox integrations
(HDFS, Storm, etc) are architected.

Ryan

On Mon, Oct 24, 2016 at 8:37 AM, zeo...@gmail.com <zeo...@gmail.com> wrote:

> So it looks like, for now, you are not pursuing Knox (per comments in
> METRON-503 and then PR 316).  Is there a reason for that?
>
> Jon
>
> On Fri, Oct 14, 2016 at 5:59 PM zeo...@gmail.com <zeo...@gmail.com> wrote:
>
> > Good question :)
> >
> > On Fri, Oct 14, 2016, 17:07 Ryan Merriman <merrim...@gmail.com> wrote:
> >
> > Jon,
> >
> > It wasn't intentional, I ran out of time and wanted to get something out
> > there.  I think it certainly could be open ended though.  Where should
> the
> > REST API project be located?
> >
> > Ryan
> >
> > On Thu, Oct 13, 2016 at 7:32 PM, zeo...@gmail.com <zeo...@gmail.com>
> > wrote:
> >
> > > Along the lines of:
> > > • Must be deployed to a machine with adequate resources so that
> resource
> > > contention is avoided.
> > > • Will need network access to all other services within Metron
> > >
> > > Has there been any consideration of a "Metron Manager" node?  In the
> old
> > > TP2
> > > bare metal install guide
> > > <https://cwiki.apache.org/confluence/display/METRON/
> > > Metron+Installation+on+an+Ambari-Managed+Cluster>
> > > it mentions a "Metron Installer," but I could see the needs for that
> sort
> > > of a system expanding to have the following roles:
> > > - API
> > > - Metron UI
> > > - Metron Installer/upgrades
> > > - Edge/Gateway Node for data loading
> > > - Clients
> > >
> > > Also, at the end it ends mid-sentence under "Organization within
> Metron,"
> > > was that intended to be open ended?
> > >
> > > Jon
> > >
> > > On Thu, Oct 13, 2016 at 6:10 PM Ryan Merriman <merrim...@gmail.com>
> > wrote:
> > >
> > > > I created a Jira to track this new feature at
> > > > https://issues.apache.org/jira/browse/METRON-503.  I also started
> and
> > > > attached an architecture doc to that Jira with some of my ideas about
> > how
> > > > we should implement it.  Please feel free to review and comment or
> add
> > to
> > > > it.  Looking forward to everyone's ideas and feedback.
> > > >
> > > > Ryan Merriman
> > > >
> > > --
> > >
> > > Jon
> > >
> >
> > --
> >
> > Jon
> >
> --
>
> Jon
>


Re: [DISCUSS] Metron REST API Architecture and Design

2016-10-14 Thread Ryan Merriman
Jon,

It wasn't intentional, I ran out of time and wanted to get something out
there.  I think it certainly could be open ended though.  Where should the
REST API project be located?

Ryan

On Thu, Oct 13, 2016 at 7:32 PM, zeo...@gmail.com <zeo...@gmail.com> wrote:

> Along the lines of:
> • Must be deployed to a machine with adequate resources so that resource
> contention is avoided.
> • Will need network access to all other services within Metron
>
> Has there been any consideration of a "Metron Manager" node?  In the old
> TP2
> bare metal install guide
> <https://cwiki.apache.org/confluence/display/METRON/
> Metron+Installation+on+an+Ambari-Managed+Cluster>
> it mentions a "Metron Installer," but I could see the needs for that sort
> of a system expanding to have the following roles:
> - API
> - Metron UI
> - Metron Installer/upgrades
> - Edge/Gateway Node for data loading
> - Clients
>
> Also, at the end it ends mid-sentence under "Organization within Metron,"
> was that intended to be open ended?
>
> Jon
>
> On Thu, Oct 13, 2016 at 6:10 PM Ryan Merriman <merrim...@gmail.com> wrote:
>
> > I created a Jira to track this new feature at
> > https://issues.apache.org/jira/browse/METRON-503.  I also started and
> > attached an architecture doc to that Jira with some of my ideas about how
> > we should implement it.  Please feel free to review and comment or add to
> > it.  Looking forward to everyone's ideas and feedback.
> >
> > Ryan Merriman
> >
> --
>
> Jon
>


Re: [DISCUSS] Improving quick-dev

2016-10-14 Thread Ryan Merriman
+1 I like it.  Just to clarify, the scripts to run Storm topologies locally
in an IDE should be available independent of the environment running.  No
need for a separate build/image.

On Fri, Oct 14, 2016 at 9:12 AM, Otto Fowler <ottobackwa...@gmail.com>
wrote:

> Going forward, the Demo env and data would have implications for testing as
> well ( gold data sets ) etc.
>
> On October 14, 2016 at 09:52:07, Nick Allen (n...@nickallen.org) wrote:
>
> I think based on everyone's input so far, we're describing 4 different
> builds/images/tools that would each be intended to run on a standard
> Mac/Linux/Windows laptop.
>
> Full Dev - A development environment that performs a full end-to-end
> deployment of Metron. This is intended for developers working with
> sensors, deployments, or validating how all Metron components interact with
> one another.
>
>
> - Starts from base Linux image
> - Installs Hadoop-y components
> - Installs Metron
> - Installs Sensors
> - Nothing started by default
>
> Quick Dev - An environment intended for the developer focusing on the
> streaming components of Metron; parsing, enrichment, and indexing.
>
>
> - Starts from base image of Linux + Hadoop-y components
> - Installs Metron
> - Installs "data generator" spouts
> - Does not install sensors
> - Nothing started by default
>
> Demo - An environment intended to introduce new users to Metron. The
> environment should go from nothing to plenty of data in the Metron
> dashboard in as little "boot" time as possible.
>
>
> - Starts from a base image including Linux + Hadoop-y + Metron + Data
> Generator Spouts pre-installed
> - Pre-load Elasticsearch indices so the user has plenty of data to
> investigate in the dashboard
> - Does not install sensors
> - Everything started by default
>
> Storm Local Cluster - Otto suggested some scripts/tooling to make it easy
> to launch the core topologies on a local Storm cluster running on the host
> OS.
>
>
> I'd be interested to hear if this works for everyone and how this might
> play into the Ambari mpack + RPM based deployment scheme.
>
>
> On Fri, Oct 14, 2016 at 1:45 AM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > I think this may have come up in another PR already (have to look for
> it).
> > But maybe we could maintain our flexibility in quick-dev by installing
> the
> > sensors and not starting them until we need them. I think it's useful to
> > have a quick "genuine" e2e testing environment that doesn't require
> running
> > through a full install. I'm also not opposed to extracting the
> integration
> > test functionality into general purpose data generators.
> >
> > On Thu, Oct 13, 2016 at 8:31 PM, Nick Allen <n...@nickallen.org> wrote:
> >
> > > To Jon's point, I think it would be useful to have a Demo box that uses
> > > generators to produce 3 or 4 types of telemetry that shows up in the
> > Metron
> > > Dashboard. This box would be different from Quick-Dev in that
> everything
> > > > starts automatically, so that a user just has to launch it and they
> should
> > > start seeing data in the Metron Dashboard right away. In fact, we could
> > > even pre-load the Elasticsearch indices so that the user has more of a
> > > history to mine when using the Demo box.
> > >
> > > On Thu, Oct 13, 2016 at 2:04 PM, zeo...@gmail.com <zeo...@gmail.com>
> > > wrote:
> > >
> > > > +1 Ryan and Otto's comments.
> > > >
> > > > I also strongly think we need to make a demo environment easier, but
> > that
> > > > should be different than quick-dev.
> > > >
> > > > Jon
> > > >
> > > > On Thu, Oct 13, 2016 at 1:15 PM Otto Fowler <ottobackwa...@gmail.com
> >
> > > > wrote:
> > > >
> > > > > - create scripts/utilities to easily run a topology locally in an
> IDE
> > > > > instead of in the VM
> > > > >
> > > > >
> > > > >  THIS.
> > > > >
> > > > >
> > > > > On October 13, 2016 at 12:36:45, Ryan Merriman (
> merrim...@gmail.com)
>
> > > > > wrote:
> > > > >
> > > > > Working with the quick-dev vagrant VM recently left a lot to be
> > > desired.
> > > > > All forthcoming comments are made under the assumption that this VM
> > is
> > > > > intended for development purposes. If that is not true, I think we
> > > should
> &

Re: [VOTE] Casey Stella as a second release manager

2016-10-14 Thread Ryan Merriman
+1 (binding)

On Fri, Oct 14, 2016 at 11:56 AM, David Lyle  wrote:

> +1 (binding)
>
> On Fri, Oct 14, 2016 at 12:47 PM, Matt Foley 
> wrote:
>
> > +1 (non-binding)
> >
> > On 10/14/16, 9:39 AM, "James Sirota"  wrote:
> >
> > I would like to nominate Casey as our second release manager.  This
> > way we don't have a single person who is a bottleneck in doing releases.
> > Also, if any other committer or PPMC member wants to volunteer, please post
> > so on the boards.
> >
> > The vote will run for 72 hours.  Please vote +1 for yes, -1 for no,
> > and 0 for not sure.
> >
> > ---
> > Thank you,
> >
> > James Sirota
> > PPMC- Apache Metron (Incubating)
> > jsirota AT apache DOT org
> >
> >
> >
> >
>


[DISCUSS] Metron REST API Architecture and Design

2016-10-13 Thread Ryan Merriman
I created a Jira to track this new feature at
https://issues.apache.org/jira/browse/METRON-503.  I also started and
attached an architecture doc to that Jira with some of my ideas about how
we should implement it.  Please feel free to review and comment or add to
it.  Looking forward to everyone's ideas and feedback.

Ryan Merriman


[DISCUSS] Move grok pattern from HDFS to Zookeeper

2016-10-10 Thread Ryan Merriman
Currently a file with a Grok pattern is stored in HDFS and referenced by
path in a Zookeeper parser config property.  I believe we can simplify this
and remove the HDFS dependency by just storing the Grok pattern directly in
a Zookeeper parser config property.  Does anyone see a problem with doing
this?
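
To make the proposal concrete, here is a minimal Python sketch of how a
parser could resolve its Grok pattern under both schemes.  The config keys
`grokPattern` and `grokPath` and the `read_hdfs` callable are illustrative
assumptions for this sketch, not necessarily the actual property names.

```python
from typing import Callable, Dict


def resolve_grok_pattern(parser_config: Dict[str, str],
                         read_hdfs: Callable[[str], str]) -> str:
    """Return the Grok pattern text for a parser.

    Assumed (hypothetical) keys:
      grokPattern - pattern stored inline in the Zookeeper config (proposed)
      grokPath    - HDFS path to a pattern file (current behaviour)
    """
    # Prefer the inline pattern so that no HDFS read is needed.
    inline = parser_config.get("grokPattern")
    if inline:
        return inline
    # Fall back to the HDFS reference for configs that were not migrated.
    path = parser_config.get("grokPath")
    if path:
        return read_hdfs(path)
    raise ValueError("parser config has neither grokPattern nor grokPath")


# Stand-in for HDFS: a dict of path -> file contents.
fake_hdfs = {"/patterns/squid": "SQUID %{NUMBER:timestamp} %{IP:ip_src_addr}"}

current = {"grokPath": "/patterns/squid"}
proposed = {"grokPattern": "SQUID %{NUMBER:timestamp} %{IP:ip_src_addr}"}

pattern_current = resolve_grok_pattern(current, fake_hdfs.__getitem__)
pattern_proposed = resolve_grok_pattern(proposed, fake_hdfs.__getitem__)
```

Both calls yield the same pattern text; only the second avoids the HDFS
lookup.  One thing worth checking is that Zookeeper limits znode data to
roughly 1 MB by default, which should be plenty for Grok patterns.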

Ryan Merriman


Re: [DISCUSS] Opinionated Data Flows

2016-10-10 Thread Ryan Merriman
I think this is a great discussion.  I especially like the DSL examples
that are given and think we should expand on that.  The good news is that
we are not far away from being able to actually implement it.  It's just a
matter of transforming that syntax into the zookeeper configs that drive
the topologies.  I think the underlying issue here is that the zookeeper
configs are not intuitive and are hard to work with.  Making them simpler
or adding a layer on top that makes them simpler is necessary in my
opinion.

As for the edge cases that have come up and are mentioned in this thread
("parse heterogenous data from a single topic" and "enriched output to land
in unique topics by sensor type"), a simple enhancement could solve both of
those.  Right now the output topic for parser and enrichment topologies is
either passed in when building the topology (flux or constructor args) or
retrieved from zookeeper.  This limits you to 1 output topic per topology.
Expanding the KafkaWriter class to optionally pull the output topic from a
field in a parsed message or have it passed in as an input parameter to the
write method should make it flexible enough to route messages to different
topics.  Also this statement is not entirely true:  "You cannot use the
output of one enrichment as the input to another".  You can if you use a
Stellar enrichment bolt and HBase enrichments.  Geo and host enrichments
would either need to be exposed through Stellar, or even better, converted
to HBase enrichments.
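
A rough Python sketch of the routing idea, assuming a hypothetical
`RoutingKafkaWriter` class and a hypothetical `output_topic` message field
(neither is an existing Metron name).  The real change would go into the
Java KafkaWriter; this only shows the precedence between the three sources
of the topic.

```python
from typing import Any, Dict, List, Optional, Tuple


class RoutingKafkaWriter:
    """Hypothetical stand-in for a writer that picks a topic per message."""

    def __init__(self, default_topic: str, topic_field: Optional[str] = None):
        self.default_topic = default_topic
        self.topic_field = topic_field
        # Records (topic, message) pairs instead of talking to Kafka.
        self.sent: List[Tuple[str, Dict[str, Any]]] = []

    def write(self, message: Dict[str, Any], topic: Optional[str] = None) -> str:
        # Precedence: explicit argument, then message field, then default.
        if topic is None and self.topic_field:
            topic = message.get(self.topic_field)
        topic = topic or self.default_topic
        # A real writer would hand the message to a Kafka producer here.
        self.sent.append((topic, message))
        return topic


writer = RoutingKafkaWriter(default_topic="enrichments", topic_field="output_topic")
by_field = writer.write({"source.type": "bro", "output_topic": "enriched_bro"})
by_default = writer.write({"source.type": "yaf"})
by_argument = writer.write({"source.type": "snort"}, topic="enriched_snort")
```

Messages without the field still land in the topology's single configured
topic, so existing configurations would keep working unchanged.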

I disagree with the idea that Metron should not be responsible for defining
data flows and I think that conflicts with the idea of abstracting out the
CEP component (Storm, Flink, etc).  There are patterns that emerge and
tricks the community finds through experience that should be baked in.  An
example of this is the enrichment topologies.  Grouping messages together
by enrichment keys before enrichment allows us to put a caching layer in
front which lightens the load on HBase and makes enrichment more
efficient.  If we put the responsibility of defining topologies on the
user, now they have to be an expert in tuning whatever CEP is chosen as
well as be knowledgeable of established design patterns.  Maybe the current
state of Metron requires Storm tuning expertise anyway but I think we
should trend away from that and evolve Metron to be more capable of making
intelligent choices automatically.  I remember the early days of Hive
required careful consideration when writing queries to ensure the correct
joins were used, data was distributed evenly, etc.  Tuning Hive is easier
now because it has evolved to be able to make more of these choices
automatically without requiring users to have detailed knowledge of how
things work internally.
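
To illustrate the enrichment pattern described above, here is a small Python
sketch under stated assumptions: `CachingEnricher`, `route` and
`fake_hbase_lookup` are made-up names, the cache is a plain LRU, and a CRC32
hash stands in for a fields grouping.  It is not the actual topology code.

```python
import zlib
from collections import OrderedDict
from typing import Callable, Dict, List


class CachingEnricher:
    """One worker: an LRU cache in front of a (slow) enrichment lookup."""

    def __init__(self, lookup: Callable[[str], Dict[str, str]], max_size: int = 1000):
        self.lookup = lookup
        self.max_size = max_size
        self.cache: "OrderedDict[str, Dict[str, str]]" = OrderedDict()
        self.backend_calls = 0  # how often we actually hit the backend

    def enrich(self, key: str) -> Dict[str, str]:
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.backend_calls += 1
        value = self.lookup(key)
        self.cache[key] = value
        if len(self.cache) > self.max_size:
            self.cache.popitem(last=False)  # evict least recently used
        return value


def route(key: str, num_workers: int) -> int:
    # Grouping by key: the same key always goes to the same worker,
    # so each key is cached in exactly one place.
    return zlib.crc32(key.encode("utf-8")) % num_workers


def fake_hbase_lookup(ip: str) -> Dict[str, str]:
    # Stand-in for an HBase read.
    return {"country": "US" if ip.endswith(".1") else "CA"}


workers: List[CachingEnricher] = [CachingEnricher(fake_hbase_lookup) for _ in range(2)]
messages = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.1", "10.0.0.2"]
for ip in messages:
    workers[route(ip, len(workers))].enrich(ip)

total_backend_calls = sum(w.backend_calls for w in workers)
```

Five messages result in only two backend reads, one per distinct key.  With
random routing instead of grouping, each worker could miss on the same key
and the backend would see more traffic.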

Ryan Merriman

On Fri, Oct 7, 2016 at 7:12 AM, Nick Allen <n...@nickallen.org> wrote:

> Whether it is explicit or implicit, I think that would be one of the major
> benefits of having the expressiveness of a DSL.  I can choose to have some
> enrichments run in parallel (the split/join that you are referring to) or
> have some enrichments run serially.
>
> Having enrichments run serially is not something you can easily do with
> Metron today.  You cannot use the output of one enrichment as the input to
> another.
>
> As a simple example, I have a blacklist of countries for which my
> organization should not be doing business.  I need to use the IP to find
> the location and then use the location to match against a blacklist.  I
> need these enrichments to run serially.
>
> source("netflow")
>   -> parser("Netflow")
>   -> exists("ip_src_addr")
>   -> src_country = geo["ip_src_addr"].country
>   -> is_alert = blacklist["src_country"]
>   ...
>
>
>
>
> On Thu, Oct 6, 2016 at 6:25 PM, Matt Foley <mfo...@hortonworks.com> wrote:
>
> > Would splitting and joining be implicit or explicit, for multi-path
> > topologies?
> > 
> > From: zeo...@gmail.com <zeo...@gmail.com>
> > Sent: Thursday, October 06, 2016 11:03 AM
> > To: dev@metron.incubator.apache.org
> > Subject: Re: [DISCUSS] Opinionated Data Flows
> >
> > It should also be smart enough to handle an order like:
> >
> > source("bro")
> >   -> parser("BasicBroParser")
> >   -> exists("ip_src_addr")
> >   -> geo_ip_src = geo["ip_src_addr"]
> >   -> application = assets["ip_src_addr"].application
> >   -> owner = assets["ip_src_addr"].owner
> >   -> exists("ip_dst_addr")
> >   -> geo_ip_dst = geo["ip_dst_addr"]
> >   -> elasticsearch("bro-index")
> >
> > Without duplicate hits of the t

[DISCUSS] Dockerize Metron

2016-09-30 Thread Ryan Merriman
I would like to open up a discussion around creating Docker images for
Metron.  Having this available would provide a leaner alternative to the
ansible/vagrant environment for development testing (and even demoing or
exploring features).  It could also relieve some of the dependency version
conflict issues that we've been experiencing when running integration tests
in a single JVM.

I would suggest the initial version be intended only for development and
testing purposes.  The general approach could be to create an image for
each service we depend on and use something like Docker compose to package
them together.  A Dockerfile would either install the service from scratch
or extend a community image then add any Metron related dependencies or
configurations on top.  The metron-deployment project code could be used as
a guide.

I would like to see these images added initially to support development and
testing:

   - Kafka with topics preconfigured
   - Storm with metron topology assets installed
   - Zookeeper with paths created and sample configs loaded
   - HBase with sample enrichments and threat intel loaded
   - Elasticsearch configured for Metron
   - MySQL with databases/tables/users created and geo data loaded

Other images that could also be useful:

   - Images for each sensor
   - Ambari?
   - Solr

Looking forward to hearing what everyone thinks.

Ryan Merriman


[DISCUSS] Metron REST API Requirements

2016-09-29 Thread Ryan Merriman
I would like to start a discussion around adding a REST API to Metron.  I
believe this is well worth the effort and would provide several benefits
including:

   - Giving users a uniform, well-tested and well-documented interface for
   interacting with Metron
   - Providing a layer of security (role-based, ACL, etc) around Metron
   functions
   - Enabling developers to build tools and interfaces on top of Metron
   without needing detailed knowledge of Metron internals

The purpose of this discussion is to gather requirements around what the
REST API should include and the features it should provide.  Assuming we
decide a REST API is a worthy addition and can agree on a comprehensive
list of requirements, I will start a separate more technical discussion
around architecture and design.

I will start it off with a list of requirements that I think are
important.  The REST API should:

   - Be secure and adhere to modern web security standards
   - Provide a pluggable authentication mechanism
   - Be written in Java since the majority of Metron is written in Java
   - Be powered by a mature, open source web development framework with
   strong community support
   - Be as comprehensive as possible, meaning any task or function in
   Metron should be available in the REST API (are there exceptions?)
   - Have detailed documentation that always reflects the current version
   - Have the same level of installation tooling as Metron platform

The following is an initial list of functions I think should be included:

   - CRUD functions for parser, enrichment, and indexing zookeeper
   configurations
   - A function to retrieve versioning and auditing information for
   zookeeper configs
   - Functions for starting/stopping/enabling/disabling various Storm
   topologies
   - A function to retrieve sample data from various Kafka topics
   - A function to test Grok statements
   - A function to test parsers against sample data
   - A function to test enrichments against sample data
   - Functions that give status for the various components in Metron
   (HBase, Storm topologies, enrichments, etc)
   - A function that tests and runs Stellar statements against sample data.
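
As a rough illustration of the first two items (CRUD plus versioning for
zookeeper configurations), here is an in-memory Python sketch.  The
`SensorConfigStore` class and its method names are hypothetical; a real
implementation would sit behind REST endpoints and persist to Zookeeper.

```python
import copy
from typing import Any, Dict, List, Optional


class SensorConfigStore:
    """In-memory stand-in for config CRUD with a simple version history."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Dict[str, Any]]] = {}

    def save(self, name: str, config: Dict[str, Any]) -> int:
        """Create or update a config; returns the new version number (1-based)."""
        versions = self._history.setdefault(name, [])
        versions.append(copy.deepcopy(config))
        return len(versions)

    def get(self, name: str, version: Optional[int] = None) -> Optional[Dict[str, Any]]:
        """Return the latest config, or a specific version if one is given."""
        versions = self._history.get(name)
        if not versions:
            return None
        if version is None:
            version = len(versions)
        if not 1 <= version <= len(versions):
            return None
        return copy.deepcopy(versions[version - 1])

    def versions(self, name: str) -> int:
        """Number of versions stored for a config (0 if unknown)."""
        return len(self._history.get(name, []))

    def delete(self, name: str) -> bool:
        """Remove a config and its history; returns False if it did not exist."""
        return self._history.pop(name, None) is not None


store = SensorConfigStore()
v1 = store.save("bro", {"parserClassName": "BasicBroParser", "sensorTopic": "bro"})
v2 = store.save("bro", {"parserClassName": "BasicBroParser", "sensorTopic": "bro_v2"})
latest = store.get("bro")
original = store.get("bro", version=1)
```

Keeping every saved version makes the auditing function almost free, at the
cost of storing full copies rather than diffs.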

I will continue to add to these lists as I think of more.  Looking forward
to hearing everyone’s ideas and input.

Ryan Merriman


Re: [VOTE] Adopting Bylaws

2016-08-17 Thread Ryan Merriman
+1 (binding)

On 8/17/16, 4:44 PM, "Casey Stella"  wrote:

>+1 (binding)
>
>On Wed, Aug 17, 2016 at 5:44 PM, David Lyle  wrote:
>
>> +1 (non-binding)
>> Adopt the bylaws as stated here
>> 
>>
>>.
>>
>> -D...
>>
>> On Wed, Aug 17, 2016 at 3:23 PM, Andrew Psaltis
>>
>> wrote:
>>
>> > +1  (non-binding)  Adopt the bylaws as stated here
>> > 
>>> >
>> >
>> > On Wed, Aug 17, 2016 at 2:49 PM, Casey Stella 
>> wrote:
>> >
>> > > This is a vote to adopt the bylaws as they are stated here
>> > > > Apache+Metron+Bylaws
>> > >.
>> > >
>> > > This is a procedural vote, so it is simple majority rules and there
>> will
>> > be
>> > > no vetoes (as directed by the ASF here
>> > > ).  PMC member votes
>> are
>> > > binding, but other votes are welcome to show support.  If you are
>>not a
>> > PMC
>> > > member, please indicate that your vote is non-binding.
>> > >
>> > > Voting methodology is as follows:
>> > >
>> > > [ ] +1 Adopt the bylaws as stated here
>> > > > Apache+Metron+Bylaws
>> > >.
>> > > [ ] -1 Do not adopt the bylaws as stated here
>> > > > Apache+Metron+Bylaws
>> > >
>> > >  because...
>> > >
>> > > This vote will be held open for 72 hours.
>> > >
>> >
>> >
>> >
>> > --
>> > Thanks,
>> > Andrew
>> >
>> > Subscribe to my book: Streaming Data 
>> > 
>> > twitter: @itmdata 
>> >
>>



Re: [VOTE] Releasing Apache Metron 0.2.0BETA-RC2

2016-07-07 Thread Ryan Merriman
+1 for the release

On 7/7/16, 11:21 AM, "Casey Stella"  wrote:

>whoops, +1 for the release.
>
>On Thu, Jul 7, 2016 at 8:55 AM, Billie Rinaldi  wrote:
>
>> +1 for the release.
>>
>> On Sat, Jul 2, 2016 at 7:58 PM, Casey Stella  wrote:
>>
>> > Ok, the new signature has been uploaded to
>> >
>> >
>> 
>>https://dist.apache.org/repos/dist/dev/incubator/metron/0.2.0BETA-RC2-incubating/
>> > (same link as in the original announcement, so no change).  Sorry for
>>the
>> > delay. :)
>> >
>> > Best,
>> >
>> > Casey
>> >
>> > On Fri, Jul 1, 2016 at 5:28 PM, Casey Stella 
>>wrote:
>> >
>> > > We are going to re-sign and re-upload and let you know when it's
>>done.
>> > > On Fri, Jul 1, 2016 at 16:28 Billie Rinaldi 
>>wrote:
>> > >
>> > >> Huh, that's weird.  When I curl the file it comes out correctly as
>> > you've
>> > >> said, but if I click on the link and download the file with
>>Firefox it
>> > >> comes out binary (the file is slightly larger, too).  Chrome also
>> > >> downloads
>> > >> it correctly.  I haven't encountered this issue before, so not sure
>> > what's
>> > >> going on with Firefox.  In any case, the sha and md5 checksums look
>> > >> correct.  Can you verify that the key used for the signature is
>> 81FD4538
>> > >> and if so, add that to the KEYS file?
>> > >>
>> > >> On Fri, Jul 1, 2016 at 12:12 PM, Casey Stella 
>> > wrote:
>> > >>
>> > >> > Regarding the SHA, I'm happy to do it another way, but it
>>appears to
>> > be
>> > >> > non-binary for me when I pull it from the URL:
>> > >> >
>> > >> > {12:08}[system]~ ➭ curl
>> > >> > >>
>> > >> >
>> > >>
>> >
>> 
>>https://dist.apache.org/repos/dist/dev/incubator/metron/0.2.0BETA-RC2-incubating/apache-metron-0.2.0BETA-RC2-incubating.tar.gz.sha
>> > >> > >
>> > >> > > apache-metron-0.2.0BETA-RC2-incubating.tar.gz:
>> > >> > >
>> > >> > > 0172 824E AC5C 6CCA 71F9  9E8A E35D 95B1 B339 6821
>> > >> > >
>> > >> > >
>> > >> > Is it supposed to look different?
>> > >> >
>> > >> > I've added the md5sum.
>> > >> >
>> > >> >
>> > >> >
>> > >> > On Fri, Jul 1, 2016 at 11:56 AM, Billie Rinaldi
>>
>> > >> wrote:
>> > >> >
>> > >> > > The sha file is a binary file, so something went wrong in the
>> > >> creation of
>> > >> > > this file. This also happened for the last release, so I guess
>>the
>> > gpg
>> > >> > > command is not working; perhaps you should switch to shasum.
>>There
>> > is
>> > >> no
>> > >> > > md5 checksum file, which is supposed to be present. Also, I
>>could
>> > not
>> > >> > > verify the asc signature. It says the signature was created
>>with
>> key
>> > >> > > 81FD4538, which does not exist in the KEYS file.
>> > >> > >
>> > >> > > The tarball matches the tag, and I took a quick glance over the
>> > >> licenses
>> > >> > > and they seemed okay. Disclaimer, filename, build all look good
>> too.
>> > >> > >
>> > >> > > On Fri, Jun 24, 2016 at 4:19 PM, James Sirota
>>> >
>> > >> > wrote:
>> > >> > >
>> > >> > > > This is a call to
>> > >> > > > vote on releasing Apache Metron 0.2.0BETA-RC2 incubating
>> > >> > > > Full list of changes in this release:
>> > >> > > >
>> > >> > > >
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> >
>> 
>>https://dist.apache.org/repos/dist/dev/incubator/metron/0.2.0BETA-RC2-incubating/CHANGES
>> > >> > > >
>> > >> > > > The tag/commit to be voted upon is Metron_0.2.0BETA_rc2:
>> > >> > > >
>> > >> > > >
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> >
>> 
>>https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=commit;h=5fb4dda0e385ba030455db4c7d1290f872b688ce
>> > >> > > >
>> > >> > > > The source archive being voted upon can be found here:
>> > >> > > >
>> > >> > > >
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> >
>> 
>>https://dist.apache.org/repos/dist/dev/incubator/metron/0.2.0BETA-RC2-incubating/apache-metron-0.2.0BETA-RC2-incubating.tar.gz
>> > >> > > >
>> > >> > > > Other release files, signatures and digests can be found
>>here:
>> > >> > > >
>> > >> > > >
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> >
>> 
>>https://dist.apache.org/repos/dist/dev/incubator/metron/0.2.0BETA-RC2-incubating
>> > >> > > >
>> > >> > > > The release artifacts are signed with the following key:
>> > >> > > >
>> > >> > > >
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> >
>> 
>>https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob_plain;f=KEYS;hb=refs/tags/Metron_0.2.0BETA_rc2
>> > >> > > >
>> > >> > > >
>> > >> > > > Please vote on releasing this package as Apache Metron
>> > 0.2.0BETA-RC2
>> > >> > > > incubating
>> > >> > > >
>> > >> > > > When voting, please list the actions taken to verify the
>> release.
>> > >> > > > Recommended build validation and verification instructions
>>are
>> > >> posted
>> > >> > > here:
>> > >> > > >
>> > https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds
>> > >> > > >
>> > 

Re: [DISCUSS] Project reorganization

2016-04-20 Thread Ryan Merriman
Sheetal, 

Thank you for the input.  We appreciate all the hard work you and others
put into OpenSOC to get us to where we are today.

To your points:

- Agreed on reevaluating the bolts that now ship with Storm.  I believe
the HDFS and HBase bolts didn’t quite provide all the functionality needed,
which is the reason for the custom implementations, but I will defer to
others who actually worked on those tasks.
- Agreed on changing HbaseConverter to HBaseConverter.  I will update the
spreadsheet.
- Agreed on a common package for HBase related classes.  We should look
more closely at this, any suggestions are welcome of course.
- There is a reason Solr and Elasticsearch classes ended up in separate
projects.  The supported version of Elasticsearch (1.7.4) is a couple
years old and the supported version of Solr is recent.  Locating these in
the same project is challenging because they both depend on very different
versions of Lucene.  Once we update Elasticsearch to a more recent version
such that it depends on the same Lucene version as Solr, keeping them
together in the same project should be much easier.
- Parsers and Enrichments are now decoupled, whereas before they were
included in the same topology.  Now they run in different topologies and
are deployed in separate jars.
- Agreed on the categories.  I believe some in your list are already
represented in the proposed project structure.  “Data Acquisition” is
analogous to the top level “metron-sensors” project. “Data Access” is
represented by the top level “metron-ui” project and the “metron-api”
project within the top level “metron-platform” project.  I like your idea
of having “Active Analysis” and “Deep Analytics” projects as well.  The
real-time pieces are represented in various sub projects in
“metron-platform” but I think there will eventually be a need for a “Deep
Analytics” project which is missing.  Maybe we should include a
“metron-analytics” project under “metron-platform”?  If not now, in the
future when we deliver more functionality in this area?

Ryan Merriman

On 4/19/16, 3:48 PM, "Sheetal Dolas" <sheetal.do...@gmail.com> wrote:

>Some of the HBase bolt related classes were created in OpenSOC because at that
>time Storm's HBase bolt did not have all necessary features (ability to add
>custom configs, enable/disable WAL, easy tuple mapping etc.). It should be
>re-evaluated to see if we can leverage these components from Storm
>itself so as to avoid additional maintenance.
>
>Some observations and pointers for more thoughts:
>* HbaseConverter should be H*B*aseConverter to match other cases.
>* org.apache.metron.enrichment.bolt.HBaseBolt.java is in bolt package but
>other hbase components are in hbase package.
>* It may be better to have project structure on functional grouping than
>mix of function + implementation choices for example solr, and es probably
>could be packages than sub modules. (Unless the intention is to support
>more such "pluggable" indexing mechanisms at any given point)
>* parsers/enrichments, are they expected to be reused across multiple
>projects? If yes, are they different from common? If not, should they be
>packages instead?
>* From a deployment perspective there are essentially the following broader
>categories
>1. Data Acquisition (pcap, nifi, flume, kafka writer etc.)
>2. Active Analysis (real time pieces - kafka, storm topology, bolts,
>parsers, enrichments, alerts etc)
>3. Deep Analytics (historic data analysis using ML, MR/Hive/tez/Spark
>related components)
>4. Data Access (apis, UI etc)
>
>Would it make sense to create project structure in such functional
>groupings?
>
>
>On Mon, Apr 18, 2016 at 1:46 PM, James Sirota <jsir...@hortonworks.com>
>wrote:
>
>> Hi Ryan,
>>
>> This is great.  You should attach this to the Jira when you are ready to
>> commit the reorg so we know which parts shifted.
>>
>> Thanks,
>> James
>>
>>
>>
>>
>> On 4/18/16, 1:30 PM, "Ryan Merriman" <rmerri...@hortonworks.com> wrote:
>>
>> >Thanks Frank.  I’ve updated those in the spreadsheet.
>> >
>> >On 4/18/16, 3:27 PM, "Frank Lu" <y...@hortonworks.com> wrote:
>> >
>> >>As of now, I think the following classes are not used:
>> >>
>> >>
>> >>
>> >>
>> >>Metron-EnrichmentAdapters
>> >>  org.apache.metron.enrichment.adapters.cif.AbstractCIFAdapter.java
>> >>
>> >>
>> >>  org.apache.metron.enrichment.adapters.cif.CIFHbaseAdapter.java
>> >>
>> >>org.apache.metron.enrichment.adapters.whois.WhoisHBaseAdapter.java
>> >>
>> >>
>> >>Metron-DataLoads
>> >>org.apache.metron.dataloads.cif.HBaseTableLoad.jav

Re: [DISCUSS] Project reorganization

2016-04-18 Thread Ryan Merriman
Thanks Frank.  I’ve updated those in the spreadsheet.

On 4/18/16, 3:27 PM, "Frank Lu" <y...@hortonworks.com> wrote:

>As of now, I think the following classes are not used:
>
>
> 
> 
>Metron-EnrichmentAdapters
>  org.apache.metron.enrichment.adapters.cif.AbstractCIFAdapter.java
> 
> 
>  org.apache.metron.enrichment.adapters.cif.CIFHbaseAdapter.java
>
>org.apache.metron.enrichment.adapters.whois.WhoisHBaseAdapter.java
>
>
>Metron-DataLoads
>org.apache.metron.dataloads.cif.HBaseTableLoad.java
>   
>
>Thanks,
>Frank Lu
>
>
>
>
>On 4/18/16, 3:05 PM, "Ryan Merriman" <rmerri...@hortonworks.com> wrote:
>
>>All,
>>
>>I put together a list of all the project java assets that details where
>>they will be moved (or potentially deleted) as part of the project
>>reorganization.  Feedback welcome.
>>
>>Ryan Merriman 
>>
>>On 4/13/16, 9:42 AM, "James Sirota" <jsir...@hortonworks.com> wrote:
>>
>>>I would not have configs as a project but rather as a folder structure that
>>>other modules can point to
>>>
>>>Thanks,
>>>James 
>>>
>>>
>>>
>>>
>>>On 4/13/16, 7:32 AM, "Ryan Merriman" <rmerri...@hortonworks.com> wrote:
>>>
>>>>James brings up a good point.  I propose adding another project under
>>>>metron-platform called metron-configuration.  This would be a fairly
>>>>lightweight project that would contain anything related to
>>>>configuration
>>>>(property files, json files, flux files, etc).
>>>>
>>>>On 4/13/16, 8:56 AM, "James Sirota" <jsir...@hortonworks.com> wrote:
>>>>
>>>>>+1 from me.
>>>>>
>>>>>I would also like to address the configs and make sure the configs are
>>>>>in
>>>>>the same place.  Do you have ideas on where we would put those?
>>>>>
>>>>>Thanks,
>>>>>James 
>>>>>
>>>>>
>>>>>
>>>>>On 4/13/16, 6:50 AM, "Ryan Merriman" <rmerri...@hortonworks.com>
>>>>>wrote:
>>>>>
>>>>>>Thank you for all the feedback everyone.  I will attempt to summarize
>>>>>>all
>>>>>>the input we've received and update my initial proposal.  We can
>>>>>>discuss
>>>>>>further if anyone is still unclear and I will volunteer to capture
>>>>>>all
>>>>>>the
>>>>>>details in a document of some kind once we all come to a consensus.
>>>>>>
>>>>>>Looks like everyone is in agreement for the top level projects.  Nick
>>>>>>is
>>>>>>working on a task that will require an additional top level project so
>>>>>>I
>>>>>>am
>>>>>>going to add that in as well:
>>>>>>
>>>>>>metron-deployment
>>>>>>metron-platform
>>>>>>metron-ui
>>>>>>metron-sensors
>>>>>>
>>>>>>All of these except metron-platform are well understood and don't
>>>>>>warrant
>>>>>>any more discussion.  For metron-platform there seem to be 2 areas
>>>>>>that
>>>>>>are not as clear:
>>>>>>
>>>>>>- whether we need a common project
>>>>>>- how do we organize test related code
>>>>>>
>>>>>>I agree with David and others that a common project will likely get
>>>>>>misused and could become unnecessarily bloated.  But I suspect there
>>>>>>will
>>>>>>be
>>>>>>cases where we have common code being used across multiple projects
>>>>>>(is
>>>>>>already happening).  In this case we will either need this common
>>>>>>project
>>>>>>or we will have to keep common code in one of the other projects and
>>>>>>have
>>>>>>all other projects extend that. For the latter, an example would be
>>>>>>keeping common code in enrichment and having parsers declare
>>>>>>enrichment
>>>>>>as
>>>>>>a dependency.  There are a couple downsides I see with this approach:
>>>>>>
>>>>>>- parser topology jars now bring along all the enrichment
>>>>>

Re: [DISCUSS] Project reorganization

2016-04-18 Thread Ryan Merriman
All,

I put together a list of all the project java assets that details where
they will be moved (or potentially deleted) as part of the project
reorganization.  Feedback welcome.

Ryan Merriman 

On 4/13/16, 9:42 AM, "James Sirota" <jsir...@hortonworks.com> wrote:

>I would not have configs as a project but rather as a folder structure that
>other modules can point to
>
>Thanks,
>James 
>
>
>
>
>On 4/13/16, 7:32 AM, "Ryan Merriman" <rmerri...@hortonworks.com> wrote:
>
>>James brings up a good point.  I propose adding another project under
>>metron-platform called metron-configuration.  This would be a fairly
>>lightweight project that would contain anything related to configuration
>>(property files, json files, flux files, etc).
>>
>>On 4/13/16, 8:56 AM, "James Sirota" <jsir...@hortonworks.com> wrote:
>>
>>>+1 from me.
>>>
>>>I would also like to address the configs and make sure the configs are
>>>in
>>>the same place.  Do you have ideas on where we would put those?
>>>
>>>Thanks,
>>>James 
>>>
>>>
>>>
>>>On 4/13/16, 6:50 AM, "Ryan Merriman" <rmerri...@hortonworks.com> wrote:
>>>
>>>>Thank you for all the feedback everyone.  I will attempt to summarize
>>>>all
>>>>the input we've received and update my initial proposal.  We can
>>>>discuss
>>>>further if anyone is still unclear and I will volunteer to capture all
>>>>the
>>>>details in a document of some kind once we all come to a consensus.
>>>>
>>>>Looks like everyone is in agreement for the top level projects.  Nick
>>>>is
>>>>working on a task that will require an additional top level project so I
>>>>am
>>>>going to add that in as well:
>>>>
>>>>metron-deployment
>>>>metron-platform
>>>>metron-ui
>>>>metron-sensors
>>>>
>>>>All of these except metron-platform are well understood and don't
>>>>warrant
>>>>any more discussion.  For metron-platform there seem to be 2 areas that
>>>>are not as clear:
>>>>
>>>>- whether we need a common project
>>>>- how do we organize test related code
>>>>
>>>>I agree with David and others that a common project will likely get
>>>>misused and could become unnecessarily bloated.  But I suspect there will
>>>>be
>>>>cases where we have common code being used across multiple projects (is
>>>>already happening).  In this case we will either need this common
>>>>project
>>>>or we will have to keep common code in one of the other projects and
>>>>have
>>>>all other projects extend that. For the latter, an example would be
>>>>keeping common code in enrichment and having parsers declare enrichment
>>>>as
>>>>a dependency.  There are a couple downsides I see with this approach:
>>>>
>>>>- parser topology jars now bring along all the enrichment dependencies
>>>>- since more code from various projects are being packaged together,
>>>>version conflicts are more likely and poms become more complicated due
>>>>to
>>>>all the necessary exclusions
>>>>
>>>>My thinking is that any jar file being deployed should only contain
>>>>what
>>>>it needs.  Curious what others think here.  My vote would be to
>>>>maintain
>>>>a
>>>>common project (or whatever we want to call it) and be diligent about
>>>>not
>>>>letting project-specific code slip in there.
>>>>
>>>>I believe Nick was the first person to ask the question about projects
>>>>related to test code and why we would need separate test and
>>>>integration
>>>>test.  The reason for this is that our integration-test classes
>>>>currently
>>>>depend on other projects (not surprising since they are integration
>>>>tests).  If there are utilities we want to make available to all projects
>>>>(mock classes, utilities for reading sample data, etc) then it can't
>>>>live
>>>>in integration-test because that will introduce circular dependencies.
>>>>If
>>>>it is possible to refactor our current Metron-Testing project so that
>>>>it
>>>>doesn't depend on any other projects, then we can keep utilities here.
>>>>Otherwise we need a

Re: [DISCUSS] Project reorganization

2016-04-13 Thread Ryan Merriman
James brings up a good point.  I propose adding another project under
metron-platform called metron-configuration.  This would be a fairly
lightweight project that would contain anything related to configuration
(property files, json files, flux files, etc).

On 4/13/16, 8:56 AM, "James Sirota" <jsir...@hortonworks.com> wrote:

>+1 from me.
>
>I would also like to address the configs and make sure the configs are in
>the same place.  Do you have ideas on where we would put those?
>
>Thanks,
>James 
>
>
>
>On 4/13/16, 6:50 AM, "Ryan Merriman" <rmerri...@hortonworks.com> wrote:
>
>>Thank you for all the feedback everyone.  I will attempt to summarize all
>>the input we've received and update my initial proposal.  We can discuss
>>further if anyone is still unclear and I will volunteer to capture all
>>the
>>details in a document of some kind once we all come to a consensus.
>>
>>Looks like everyone is in agreement for the top level projects.  Nick is
>>working on a task that will require an additional top level project so I am
>>going to add that in as well:
>>
>>metron-deployment
>>metron-platform
>>metron-ui
>>metron-sensors
>>
>>All of these except metron-platform are well understood and don't warrant
>>any more discussion.  For metron-platform there seem to be 2 areas that
>>are not as clear:
>>
>>- whether we need a common project
>>- how do we organize test related code
>>
>>I agree with David and others that a common project will likely get
>>misused and could become unnecessarily bloated.  But I suspect there will
>>be
>>cases where we have common code being used across multiple projects (is
>>already happening).  In this case we will either need this common project
>>or we will have to keep common code in one of the other projects and have
>>all other projects extend that. For the latter, an example would be
>>keeping common code in enrichment and having parsers declare enrichment
>>as
>>a dependency.  There are a couple downsides I see with this approach:
>>
>>- parser topology jars now bring along all the enrichment dependencies
>>- since more code from various projects are being packaged together,
>>version conflicts are more likely and poms become more complicated due to
>>all the necessary exclusions
>>
>>My thinking is that any jar file being deployed should only contain what
>>it needs.  Curious what others think here.  My vote would be to maintain
>>a
>>common project (or whatever we want to call it) and be diligent about not
>>letting project-specific code slip in there.
>>
>>I believe Nick was the first person to ask the question about projects
>>related to test code and why we would need separate test and integration
>>test.  The reason for this is that our integration-test classes currently
>>depend on other projects (not surprising since they are integration
>>tests).  If there are utilities we want to make available to all projects
>>(mock classes, utilities for reading sample data, etc) then it can't live
>>in integration-test because that will introduce circular dependencies.
>>If
>>it is possible to refactor our current Metron-Testing project so that it
>>doesn't depend on any other projects, then we can keep utilities here.
>>Otherwise we need a separate project for testing utilities.  I suspect
>>removing other project dependencies from Metron-Testing will prove more
>>difficult than it's worth so my vote would be to have 2 test related
>>projects.
>>
>>So here is where our metron-platform organization stands:
>>
>>metron-common *
>>metron-integration-test *
>>metron-test-utilities *
>>metron-data-management
>>metron-pcap
>>metron-parsers
>>metron-enrichment
>>  metron-solr
>>  metron-elasticsearch
>>metron-api
>>
>>* may or may not change depending on the outcome of this discussion
>>
>>Thoughts?
>>
>>Ryan Merriman
>>
>>
>>On 4/11/16, 4:15 PM, "Debojyoti Dutta" <ddu...@gmail.com> wrote:
>>
>>>If you load up your Irc client just type
>>>/join #apache-metron-dev
>>>
>>>Sent from my iPhone
>>>
>>>> On Apr 11, 2016, at 12:06 PM, James Sirota <jsir...@hortonworks.com>
>>>>wrote:
>>>> 
>>>> Great, thanks, Debo.  Where can I find instructions on how to get to
>>>>it?
>>>> 
>>>> Thanks,
>>>> James 
>>>> 
>>>> 
>>>> 
>>>> 
>

Re: [DISCUSS] Project reorganization

2016-04-13 Thread Ryan Merriman
To answer a couple of other questions people asked:

Debo, agreed having clear extension points is going to be extremely
important for us.  Currently we have well defined interfaces for parsers
and enrichment adapters as well as the ability to load data into and drive
enrichments (threat intels) from HBase tables with well defined key
structures.  Eventually we will want to extend this to models.  Maybe an
analytical project makes sense when we get to that point?

Debo and James, yes my vision for the metron-api project is a standard
interface for interacting with Metron.  This would include everything from
data access (pcap service) to security and beyond.

David, let’s explore the best way to leverage the dependencyManagement
section in our top level pom.  I think you’re on to something there.  Our
maven implementation needs a thorough review as well.
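
As a rough illustration of the idea behind a dependencyManagement section, here is a minimal Python sketch rather than actual Maven configuration. It assumes a parent that pins one version per artifact and child modules that list artifacts without versions. The artifact names, version strings, and the `resolve` helper are all hypothetical.

```python
# Hypothetical artifact names and version strings, for illustration only.
MANAGED_VERSIONS = {
    "org.example:shared-lib": "1.0",
    "org.example:json-lib": "2.3",
}


def resolve(module_deps, managed=MANAGED_VERSIONS):
    """Return {artifact: version} for one module.

    A version of None means "use the centrally managed version";
    an explicit version declared by the module takes precedence.
    """
    resolved = {}
    for artifact, version in module_deps.items():
        if version is None:
            if artifact not in managed:
                raise KeyError(f"no managed version for {artifact}")
            version = managed[artifact]
        resolved[artifact] = version
    return resolved


# Two modules that declare the same artifact without a version
# end up on the same version, pinned in one place.
parsers = resolve({"org.example:shared-lib": None})
enrichment = resolve({
    "org.example:shared-lib": None,
    "org.example:json-lib": None,
})
```

The point of the sketch is only that versions live in one table, so modules packaged together cannot silently drift apart.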

Ryan Merriman



On 4/13/16, 8:50 AM, "Ryan Merriman" <rmerri...@hortonworks.com> wrote:

>Thank you for all the feedback everyone.  I will attempt to summarize all
>the input we've received and update my initial proposal.  We can discuss
>further if anyone is still unclear and I will volunteer to capture all the
>details in a document of some kind once we all come to a consensus.
>
>Looks like everyone is in agreement for the top level projects.  Nick is
>working on a task that will require an additional top level project so I am
>going to add that in as well:
>
>metron-deployment
>metron-platform
>metron-ui
>metron-sensors
>
>All of these except metron-platform are well understood and don't warrant
>any more discussion.  For metron-platform there seem to be 2 areas that
>are not as clear: 
>
>- whether we need a common project
>- how do we organize test related code
>
>I agree with David and others that a common project will likely get
>misused and could become unnecessarily bloated.  But I suspect there will be
>cases where we have common code being used across multiple projects (is
>already happening).  In this case we will either need this common project
>or we will have to keep common code in one of the other projects and have
>all other projects extend that. For the latter, an example would be
>keeping common code in enrichment and having parsers declare enrichment as
>a dependency.  There are a couple downsides I see with this approach:
>
>- parser topology jars now bring along all the enrichment dependencies
>- since more code from various projects is being packaged together,
>version conflicts are more likely and poms become more complicated due to
>all the necessary exclusions
>
>My thinking is that any jar file being deployed should only contain what
>it needs.  Curious what others think here.  My vote would be to maintain a
>common project (or whatever we want to call it) and be diligent about not
>letting project-specific code slip in there.
>
>I believe Nick was the first person to ask the question about projects
>related to test code and why we would need separate test and integration
>test.  The reason for this is that our integration-test classes currently
>depend on other projects (not surprising since they are integration
>tests).  If there are utilities we want to make available to all projects
>(mock classes, utilities for reading sample data, etc) then they can't live
>in integration-test because that will introduce circular dependencies.  If
>it is possible to refactor our current Metron-Testing project so that it
>doesn't depend on any other projects, then we can keep utilities here.
>Otherwise we need a separate project for testing utilities.  I suspect
>removing other project dependencies from Metron-Testing will prove more
>difficult than it's worth so my vote would be to have 2 test related
>projects.
>
>So here is where our metron-platform organization stands:
>
>metron-common *
>metron-integration-test *
>metron-test-utilities *
>metron-data-management
>metron-pcap
>metron-parsers
>metron-enrichment
>   metron-solr
>   metron-elasticsearch
>metron-api
>
>* may or may not change depending on the outcome of this discussion
>
>Thoughts?
>
>Ryan Merriman
>
>
>On 4/11/16, 4:15 PM, "Debojyoti Dutta" <ddu...@gmail.com> wrote:
>
>>If you load up your Irc client just type
>>/join #apache-metron-dev
>>
>>Sent from my iPhone
>>
>>> On Apr 11, 2016, at 12:06 PM, James Sirota <jsir...@hortonworks.com>
>>>wrote:
>>> 
>>> Great, thanks, Debo.  Where can I find instructions on how to get to
>>>it?
>>> 
>>> Thanks,
>>> James 
>>> 
>>> 
>>> 
>>> 
>>>> On 4/11/16, 9:41 AM, "Debo Dutta (dedutta)" <dedu...@cisco.com

[DISCUSS] Project reorganization

2016-04-08 Thread Ryan Merriman
All,

I would like to propose a review and refactor of the current project 
organization within Metron.  Much of the way the legacy code was organized does 
not make sense anymore and could be designed so that it is easier to navigate 
and understand.  Our test coverage has increased substantially so I believe we 
can do this with confidence.

First off, I think we should agree on a naming convention.  I see some projects 
(YARN and Storm for example) that prepend the sub-project with the name of the 
top-level project (storm-core for example).  Metron also currently does this 
(Metron-Common).  I think that's fine, although in the case of Metron, I feel 
like having "Metron" prepended is redundant.  Regardless of whether we decide 
to stick with that approach, I propose that project names be uniform and 
lowercase.  For example, under these assumptions "Metron-Common" would change 
to "common".

The first level of organization makes sense to me.  Only change I would make 
would be to project names:

  *   deployment
  *   streaming
  *   ui

Or if we want to keep metron in project names:

  *   metron-deployment
  *   metron-streaming
  *   metron-ui

For now I don't see any changes necessary in deployment or ui organization.  I 
see the streaming project structure primarily driven by 2 things:  the Maven 
dependency tree and deployment targets.  For example, solr and elasticsearch 
code should be separated (because their dependency on lucene conflicts) but 
both will depend on common enrichment code.  Also, now that parser, enrichment 
and pcap topologies are separate, code for those topologies will be deployed as 
separate jars.  No reason to include parser code in enrichment topologies and 
vice-versa.  Any other considerations I'm missing?
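
To make the dependency-conflict concern concrete, here is a minimal Python sketch of checking whether modules that would be packaged together ask for the same artifact at different versions. The version strings ("A", "B", "1") are placeholders, not the versions Solr or Elasticsearch actually require, and `find_conflicts` is a hypothetical helper.

```python
from collections import defaultdict


def find_conflicts(modules):
    """modules maps module name -> {artifact: version}.

    Returns {artifact: {version: [modules requesting it]}} for every
    artifact that is requested at more than one version.
    """
    seen = defaultdict(dict)
    for name, deps in modules.items():
        for artifact, version in deps.items():
            seen[artifact].setdefault(version, []).append(name)
    return {a: versions for a, versions in seen.items() if len(versions) > 1}


# Placeholder versions: only the fact that they differ matters here.
modules = {
    "solr": {"lucene": "A", "common": "1"},
    "elasticsearch": {"lucene": "B", "common": "1"},
}
conflicts = find_conflicts(modules)
```

With these inputs only `lucene` is reported, which is the situation that argues for keeping the two projects in separate jars.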

With that being said, here is my initial proposal:

  *   common -  Any common code that all topologies depend on (configuration 
classes, generic writers for example).  No dependencies on other Metron 
projects.
  *   test - Contains utilities for writing unit tests, sample configs and 
sample data.  Will depend on common.
  *   integration-test - Contains utilities and classes needed to run our 
integration tests (in memory components for example).  Will depend on common 
and test.
  *   dataload - Contains all code related to data loading.  Will also include 
any property files needed and integration tests.  Will depend on common, test 
(test scope), and integration-test (test scope).
  *   parser - All code specific to the parser topologies.  Would also include 
scripts, property files, flux files and parser topology integration tests.  
This project will depend on common, test (test scope), and integration-test 
(test scope).
  *   enrichment - All code specific to the enrichment topologies (except solr 
and elasticsearch).  Would also include scripts, property files, flux files and 
enrichment topology integration tests.  This project will depend on common, 
test (test scope), and integration-test (test scope).
  *   elasticsearch - All Elasticsearch related code.  Will depend on 
enrichment.
  *   solr - All Solr related code.  Will depend on enrichment.
  *   pcap - All code specific to the topology dedicated to pcap.  Would also 
include scripts, property files, flux files and pcap integration tests.  This 
project will depend on common, test (test scope) and integration-test (test 
scope).
  *   api - This will serve as a generic replacement for Metron-Pcap_Service.  
Will contain all code to build a Metron web service middle layer that can 
expose APIs through REST or other client protocols.  Could possibly depend on 
all other projects or separated further if version conflicts arise (separate 
api projects for solr and elasticsearch for example).
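
The dependencies listed above can be written down as a graph and checked for cycles. The following is a minimal Python sketch (3.9+, using the standard `graphlib` module), not part of the proposal itself. The `api` entry assumes the "depends on all other projects" option, and `build_order` and `has_cycle` are hypothetical helper names.

```python
from graphlib import CycleError, TopologicalSorter

# Each project maps to the set of projects it depends on,
# transcribed from the list above (test-scope dependencies included).
PROPOSED = {
    "common": set(),
    "test": {"common"},
    "integration-test": {"common", "test"},
    "dataload": {"common", "test", "integration-test"},
    "parser": {"common", "test", "integration-test"},
    "enrichment": {"common", "test", "integration-test"},
    "elasticsearch": {"enrichment"},
    "solr": {"enrichment"},
    "pcap": {"common", "test", "integration-test"},
}
# Assumption: api depends on every other project.
PROPOSED["api"] = set(PROPOSED)


def build_order(graph):
    """Return one valid build order (dependencies before dependents)."""
    return list(TopologicalSorter(graph).static_order())


def has_cycle(graph):
    """True if the dependency graph contains a circular dependency."""
    try:
        build_order(graph)
    except CycleError:
        return True
    return False


order = build_order(PROPOSED)
```

Making `common` depend on `integration-test`, for example, would cause `has_cycle` to return True, which is the kind of circular dependency the layout is meant to avoid.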

Looking forward to hearing everyone's feedback and great ideas.

Ryan Merriman


Re: [Vote] release of Metron_0.1BETA_rc7

2016-04-04 Thread Ryan Merriman
+1 (binding)

On 4/4/16, 11:16 AM, "Casey Stella"  wrote:

>+1 (binding)
>
>On Mon, Apr 4, 2016 at 12:15 PM, James Sirota 
>wrote:
>
>> + 1 (binding)
>>
>>
>>
>>
>> On 4/4/16, 9:08 AM, "James Sirota"  wrote:
>>
>> >This is a call to vote on releasing Apache Metron 0.1BETA-RC7
>> >
>> >Full list of changes in this release:
>> >
>> >https://dist.apache.org/repos/dist/dev/incubator/metron/0.1BETA-RC7-incubating/CHANGES
>> ><https://dist.apache.org/repos/dist/dev/incubator/metron/0.1BETA-RC6-incubating/CHANGES>
>> >
>> >The tag/commit to be voted upon is Metron_0.1BETA_rc7:
>> >
>> >https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=commit;h=ad3866bdf4b6233950e7803c3c3141f0f859e994
>> >
>> >The source archive being voted upon can be found here:
>> >
>> >https://dist.apache.org/repos/dist/dev/incubator/metron/0.1BETA-RC7-incubating/apache-metron-0.1BETA-RC7-incubating.tar.gz
>> ><https://dist.apache.org/repos/dist/dev/incubator/metron/0.1BETA-RC6-incubating/apache-metron-0.1BETA-RC6-incubating.tar.gz>
>> >
>> >Other release files, signatures and digests can be found here:
>> >
>> >https://dist.apache.org/repos/dist/dev/incubator/metron/0.1BETA-RC7-incubating/
>> ><https://dist.apache.org/repos/dist/dev/incubator/metron/0.1BETA-RC6-incubating/>
>> >
>> >The release artifacts are signed with the following key:
>> >
>> >https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob_plain;f=KEYS;hb=dc59e37e402bd868aeac7ab42a0cc9c51ccae3c2
>> >
>> >The Nexus staging repository for this release will be created after this
>> >vote has been passed.
>> >
>> >
>> >Please vote on releasing this package as Apache Metron 0.1BETA-RC7.
>> >
>> >When voting, please list the actions taken to verify the release.
>> >Recommended build validation and verification instructions are posted here:
>> >https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds
>> >
>> >This vote will be open for at least 72 hours.
>> >
>> >[ ] +1 Release this package as Apache Metron 0.1BETA-RC7
>> >[ ]  0 No opinion
>> >[ ] -1 Do not release this package because...
>>



Re: [Vote] Release of Apache Metron 0.1BETA-RC6

2016-03-30 Thread Ryan Merriman
+1 (binding)

On 3/30/16, 11:00 AM, "James Sirota"  wrote:

>+ 1 (binding) 
>
>
>
>
>On 3/30/16, 8:58 AM, "James Sirota"  wrote:
>
>>This is a call to vote on releasing Apache Metron 0.1BETA-RC6
>>
>>Full list of changes in this release:
>>
>>https://dist.apache.org/repos/dist/dev/incubator/metron/0.1BETA-RC6-incubating/CHANGES
>>
>>The tag/commit to be voted upon is Metron_0.1BETA_rc6:
>>
>>https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=commit;h=8973c9b5889586e70213e526cd58e33165192db0
>>
>>The source archive being voted upon can be found here:
>>
>>https://dist.apache.org/repos/dist/dev/incubator/metron/0.1BETA-RC6-incubating/apache-metron-0.1BETA-RC6-incubating.tar.gz
>>
>>Other release files, signatures and digests can be found here:
>>
>>https://dist.apache.org/repos/dist/dev/incubator/metron/0.1BETA-RC6-incubating/
>>
>>The release artifacts are signed with the following key:
>>
>>https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob_plain;f=KEYS;hb=dc59e37e402bd868aeac7ab42a0cc9c51ccae3c2
>>
>>The Nexus staging repository for this release will be created after this
>>vote has been passed.
>>
>>
>>Please vote on releasing this package as Apache Metron 0.1BETA-RC6.
>>
>>When voting, please list the actions taken to verify the release.
>>Recommended build validation and verification instructions are posted
>>here:
>>https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds
>>
>>This vote will be open for at least 72 hours.
>>
>>[ ] +1 Release this package as Apache Metron 0.1BETA-RC6
>>[ ]  0 No opinion
>>[ ] -1 Do not release this package because...



Re: [VOTE] Metron_0.1BETA release

2016-02-16 Thread Ryan Merriman
+1

> On Feb 16, 2016, at 3:18 PM, James Sirota  wrote:
> 
> I am putting up for a vote our first Apache release.  Many thanks to all who 
> have contributed.  As previously discussed we will be on a monthly release 
> cadence.  This is the delayed Jan build (delayed due to setting up internal 
> infrastructure and has nothing to do with Metron).  The release branch will 
> be created upon completion of this vote.
> 
> The following Jiras are included:
> 
> METRON-28 Metron should compile with JUnit 4
> METRON-21 Ansible Deployment Scripts
> METRON-12 Switch packages from "opensoc" to "metron"
> METRON-9 Add a .gitignore file
> METRON-5 Minor tweaks to website
> METRON-3 Replace opensoc-streaming version 0.4BETA with 0.6BETA
> METRON-2 Create automated deployment scripts for Metron components
> METRON-1 Create a website
> 
> Thanks,
> James


Re: [GitHub] incubator-metron pull request: Deployment

2016-01-26 Thread Ryan Merriman
Here is the new pull request with the modifications:

https://github.com/apache/incubator-metron/pull/15


This only includes the additions our team contributed since METRON-2.

On 1/26/16, 8:39 AM, "Ryan Merriman" <rmerri...@hortonworks.com> wrote:

>Mark,
>
>We went back and merged METRON-2 into the master branch so your pull
>request is in the commit history now.  Let me know if something doesn’t
>look right.  I am going to add the most recent deployment changes on top
>of that and submit a new pull request.  We would love your feedback since
>you are experienced with Ansible.
>
>
>Sorry for the misunderstanding.  Some of us are new to this process and
>the fact that your pull request didn’t get merged before the big refactor
>tripped us up.
>
>Ryan Merriman
>
>On 1/25/16, 8:54 PM, "Ryan Merriman" <rmerri...@hortonworks.com> wrote:
>
>>Really sorry about that Mark.  I was so focused on getting everything
>>working that I wasn't thinking about the commit history.  I will get that
>>fixed.  I promise it wasn't intentional.
>>
>>On 1/25/16, 8:27 PM, "Mark Bittmann" <m...@b23.io> wrote:
>>
>>>Ryan,
>>>
>>>It looks like you took all the code from PR #3 and resubmitted it as your
>>>own in PR #14. Obviously you put in a ton of extra work on top of it, and
>>>I'm excited to see you build off of it, but I would be personally
>>>disappointed to not get any credit for laying the foundation for all of
>>>the Hadoop components.
>>>
>>>https://github.com/apache/incubator-metron/pull/3
>>>
>>>I would think it would be better for the community to have multiple
>>>people contributing incrementally. Wouldn't it make more sense to pull in
>>>PR #3 rather than copying the code into a new PR and losing the history?
>>>
>>>Mark
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 1/25/16, 8:38 PM, "merrimanr" <g...@git.apache.org> wrote:
>>>
>>>>Github user merrimanr commented on the pull request:
>>>>
>>>>
>>>>https://github.com/apache/incubator-metron/pull/14#issuecomment-174761433
>>>>  
>>>>https://issues.apache.org/jira/browse/METRON-21
>>>>
>>>>
>>>>---
>>>>If your project is set up for it, you can reply to this email and have your
>>>>reply appear on GitHub as well. If your project does not have this feature
>>>>enabled and wishes so, or if the feature is enabled but not working, please
>>>>contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
>>>>with INFRA.
>>>>---
>>
>>
>