from:"Matt Foley"

build on Centos 7 seems broken on metron-config

2017-04-13 Thread Matt Foley

Has anyone built recently on CentOS 7?  I’m trying on a fairly vanilla CentOS 
7 box, and getting an error in metron-config.

It fails to download node-zopfli@2.0.2 and has some other gyp errors, which are 
supposed to be optional so okay, and indeed the build attempts to continue.  
But then it downloads PhantomJS from 
https://github.com/Medium/phantomjs/releases/download/v2.1.1/phantomjs-2.1.1-linux-x86_64.tar.bz2

and fails to untar it.  This fails the build.

 

I can’t untar it either, so the file may be corrupted.  It is 23 MB (23415665 bytes).
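
One way to confirm whether a downloaded archive is damaged, independent of the build, is to read every member to the end. The following is a minimal Python sketch, not part of the Metron build; the helper name `check_tarball` is hypothetical, and the path you pass it is whatever local copy of the PhantomJS tarball you downloaded.

```python
import tarfile


def check_tarball(path):
    """Return (ok, detail) after reading every member of the archive.

    Hypothetical helper for illustration; not part of the Metron build.
    """
    try:
        # Mode "r:*" lets tarfile detect bz2, gz, or xz compression itself.
        with tarfile.open(path, mode="r:*") as archive:
            count = 0
            for member in archive:
                if member.isfile():
                    # Reading the payload forces full decompression.
                    extracted = archive.extractfile(member)
                    while extracted.read(1 << 20):
                        pass
                count += 1
        return True, f"{count} members read cleanly"
    except (tarfile.TarError, EOFError, OSError) as err:
        return False, f"{type(err).__name__}: {err}"
```

If this reports a failure for the downloaded file, the problem is in the download itself rather than in how the build unpacks it.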

 

Looking at https://github.com/Medium/phantomjs/releases/ , it seems they are 
now up to 2.1.14.  2.1.8 was in July 2016, and no earlier 2.1 version is 
listed.  Any reason we’re calling out such an old one?

 

Thanks for any input.

--Matt

Re: [DISCUSS] Extracting Stellar as a component/module

2017-04-11 Thread Matt Foley

I’ve copied it to the cwiki, but the thing is that cwiki only allows comments
at the bottom. With a long doc like this, that’s not very good. I’d much
rather keep everyone’s comments in the same system, and local to the text
they’re commenting on.

Is it okay to leave this in the Google doc?

If anyone can’t abide logging in to google, the cwiki version is here:

https://cwiki.apache.org/confluence/display/METRON/Extracting+Stellar+into+an+Independent+Module

Thanks,

--Matt

On 4/11/17, 3:15 PM, "Matt Foley" <mfo...@hortonworks.com> wrote:

No, actually you’re right. Will have it moved over shortly.

On 4/11/17, 2:56 PM, "Otto Fowler" <ottobackwa...@gmail.com> wrote:

Nevermind

On April 11, 2017 at 17:47:57, Otto Fowler (ottobackwa...@gmail.com)
wrote:

Can’t we do this in confluence?

On April 11, 2017 at 17:38:40, Matt Foley (ma...@apache.org) wrote:

Hi all,

This is a new discussion thread, and if the proposed change is accepted by
the community, it will be submitted to the next release, not the current
0.4.0 branch.

Stellar has 126 verbs today, and seems only likely to continue growing.
Furthermore, we expect Stellar to be extended by users, and probably grow
into having one or more Registry/ Repositories, etc. All this suggests that
we should start viewing Stellar itself as a component, and make sure it is
maintainable and has clean interfaces to the rest of the system. And that
will be easier if we extract it into its own module, both in the code tree
and in maven.

I’ve written a combination proposal / discussion about how to extract
Stellar from its current deep embed in Metron. Comments are welcome, and
encouraged. Please read:

https://docs.google.com/document/d/1EP7Jt4ePHe2A-_oboLl2QbN1muh7uKeET_kbpIgjcJM/edit#heading=h.4vsrmths49wk

I believe I’ve set access so anyone can read and comment on it. However,
google docs may still ask you to log in with a google-registered email
address. If this is a problem for anyone, let me know and I can send you a
Word document.

Thanks,

--Matt

[DISCUSS] Extracting Stellar as a component/module

2017-04-11 Thread Matt Foley

Hi all,
This is a new discussion thread, and if the proposed change is accepted by the 
community, it will be submitted to the next release, not the current 0.4.0 
branch.

Stellar has 126 verbs today, and seems only likely to continue growing.  
Furthermore, we expect Stellar to be extended by users, and probably grow into 
having one or more Registry/ Repositories, etc.  All this suggests that we 
should start viewing Stellar itself as a component, and make sure it is 
maintainable and has clean interfaces to the rest of the system.  And that will 
be easier if we extract it into its own module, both in the code tree and in 
maven.

I’ve written a combination proposal / discussion about how to extract Stellar 
from its current deep embed in Metron.  Comments are welcome, and encouraged.  
Please read:
https://docs.google.com/document/d/1EP7Jt4ePHe2A-_oboLl2QbN1muh7uKeET_kbpIgjcJM/edit#heading=h.4vsrmths49wk

I believe I’ve set access so anyone can read and comment on it.  However, 
google docs may still ask you to log in with a google-registered email address.  
If this is a problem for anyone, let me know and I can send you a Word document.

Thanks,
--Matt

Re: [DISCUSS] next release proposal

2017-04-11 Thread Matt Foley

Hi all,
Looks to me like the vast majority of the material mentioned below has been 
committed.  There are still 8 recent PRs that need review and, hopefully, 
commit.

I’m going to go ahead and make a release branch, with the understanding that 
any further commits (especially but not limited to Kerberization, Metron-UI, 
Metron Management UI, or Mpack support), that come in over the next 36 hours or 
so will still be included in the RC.

Does that meet everyone’s needs?  I want to get started because it will 
probably take a day or more just to create the branch, an RC build, and start 
the sanity testing.

There’s enough major new stuff here that I’m going to call it 0.4.0.  Is that 
also okay with everyone?

Thanks,
--Matt

On 4/5/17, 6:23 PM, "Ali Nazemian" <alinazem...@gmail.com> wrote:

Dear Metron Devs,

As Metron users/customers, we are very keen to have all high priority
related features/bugs to the Security as well as Metron-UI and Metron
Management-UI.

Thanks,
Ali

On Thu, Apr 6, 2017 at 8:04 AM, Ryan Merriman <merrim...@gmail.com> wrote:

> We just finished responding to the first round of feedback so I don't think
> we're that far away on METRON-623.
>
> On Wed, Apr 5, 2017 at 3:30 PM, Matt Foley <ma...@apache.org> wrote:
>
> > Totally agree would be good to have MPack support.  Let’s see how it
> > goes.  Wouldn’t want to cut it out for the sake of a day or two.
> >
> > On 4/5/17, 1:14 PM, "Justin Leet" <justinjl...@gmail.com> wrote:
> >
> > I've made fairly good progress on
> > https://issues.apache.org/jira/browse/METRON-799 (The MPack should function
> > in a kerberized cluster).  The PR itself might cut close to the deadline,
> > and in particular might be tough to get reviewed in time.
> >
> > I'll do a best effort attempt to get it in to make our Kerberos story more
> > complete, but I'd say the release can go on without this (and we use manual
> > Kerberos in its absence).
> >
> > Justin
> >
> > On Wed, Apr 5, 2017 at 4:07 PM, Matt Foley <ma...@apache.org> wrote:
> >
> > > Sure.  To be clear, I wasn’t proposing an exclusive list, just making the
> > > argument that there seemed to be enough to proceed with.  Any duly
> > > committed content in the master branch, at the time we create the first RC
> > > (ie, some time after METRON-623 goes in, but not before Monday) will surely
> > > be included in the RC, unless something has a bug that can’t be readily
> > > resolved.
> > >
> > > Thanks,
> > > --Matt
> > >
> > > On 4/5/17, 12:56 PM, "David Lyle" <dlyle65...@gmail.com> wrote:
> > >
> > > I'm working on METRON-826 right now. I'll have a PR up today or tomorrow at
> > > the latest. I'd like to see it go as well.
> > >
> > > https://issues.apache.org/jira/browse/METRON-826
> > >
> > > -D...
> > >
> > >
> > > On Wed, Apr 5, 2017 at 3:52 PM, Nick Allen <n...@nickallen.org> wrote:
> > >
> > > > I would like to include #509 with the Fastcapa improvements..  Already have
> > > > a +1.  I'm just letting it soak giving others some time to review if they
> > > > feel so inclined.
> > > >
> > > > https://github.com/apache/incubator-metron/pull/509
> > > >
> > > >
> > > > On Wed, Apr 5, 2017 at 3:50 PM, James Sirota <jsir...@apache.org> wrote:
> > > >
> > > > > I second this.  I want to see 623 go in in addition to the kerberos work.
> > > > > When both are in I think it makes sense to do the release
> > > > >
> > > > > 04.04.2017, 11:33, "Simon Elliston Ball" <si...@simonellistonball.com>:
> > > > > > I'd really like to see METRON-623 (the ui) get into the

Re: [DISCUSS] New Stellar Functions

2017-04-10 Thread Matt Foley

Hi Kyle,
For now at least, it seems to me that something with as much core usefulness, 
and presumably non-customer-specific, as string concat should go into 
metron-common.  That said, I suspect Mike is right that JOIN would be 
sufficient.  But if that fits your use case poorly, just say so.

--Matt

On 4/10/17, 10:13 AM, "Michael Miklavcic"  wrote:

Hey Kyle,

It probably belongs here -

https://github.com/apache/incubator-metron/blob/master/metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/functions/StringFunctions.java
There is an existing JOIN function for strings - might this suit your
needs? I didn't see a unit test for it, so it would probably be good for us
to backfill with a test here as well. I can submit a PR for it, or if
you're already in that code, you're welcome to also -

https://github.com/apache/incubator-metron/blob/master/metron-platform/metron-common/src/test/java/org/apache/metron/common/dsl/functions/StringFunctionsTest.java

e.g.
Object joined = run("JOIN(['A','B','C','D'], ':')",new HashedMap());
System.out.println(joined);
Object joined2 = run("JOIN(['A','B','C','D'], '')",new HashedMap());
System.out.println(joined2);

Output I get is:
A:B:C:D
ABCD
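
For readers without a Metron checkout, the output above matches ordinary delimiter joining. Here is a minimal Python sketch of the same semantics, assuming JOIN simply places the delimiter between the list items; the function name `stellar_join` is hypothetical and this is not Stellar's implementation.

```python
def stellar_join(items, delimiter):
    """Illustrative stand-in for JOIN(list, delim); not Metron code."""
    return delimiter.join(str(item) for item in items)


print(stellar_join(["A", "B", "C", "D"], ":"))  # A:B:C:D
print(stellar_join(["A", "B", "C", "D"], ""))   # ABCD
```

With an empty delimiter the result is plain concatenation, which is why JOIN may already cover the string-concat use case.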

AFA where to put these functions, I believe we have a number of options now
that we have the ability to add to the storm topology classpath and
sideload jars.
- https://github.com/apache/incubator-metron/pull/204
- https://github.com/apache/incubator-metron/pull/468

I think that if the functions are unique to a customer, they should
probably be built as a stand-alone Maven project. I believe Otto is working
on this ATM if I'm not mistaken. If they are universal (applicable across all
functions in the system, whether parsing, analytics or otherwise) then they
should probably go in with the dsl package in metron-common. At some point
we might want to make Stellar its own module, but there is some work there.

Best,
Mike

On Sun, Apr 9, 2017 at 2:26 PM, Kyle Richardson wrote:

> I have the need for a new Stellar function to perform string concatenation.
> I have it implemented but am curious about where new functions should live
> given the new capabilities around 3rd party Stellar function libraries.
>
> So, I guess my question is, should this function live in:
> 1) metron-common with the other string functions
> 2) another metron project
> 3) as a standalone project and not part of the metron source tree
>
> While I'm specifically asking about this case, I think it's also worthwhile
> that we think about where other new functions should live in the long term.
>
> Thanks!
>
> -Kyle
>

Re: [DISCUSS] next release proposal

2017-04-05 Thread Matt Foley

Totally agree would be good to have MPack support.  Let’s see how it goes.  
Wouldn’t want to cut it out for the sake of a day or two.

On 4/5/17, 1:14 PM, "Justin Leet" <justinjl...@gmail.com> wrote:

I've made fairly good progress on
https://issues.apache.org/jira/browse/METRON-799 (The MPack should function
in a kerberized cluster).  The PR itself might cut close to the deadline,
and in particular might be tough to get reviewed in time.

I'll do a best effort attempt to get it in to make our Kerberos story more
complete, but I'd say the release can go on without this (and we use manual
Kerberos in its absence).

Justin

On Wed, Apr 5, 2017 at 4:07 PM, Matt Foley <ma...@apache.org> wrote:

> Sure.  To be clear, I wasn’t proposing an exclusive list, just making the
> argument that there seemed to be enough to proceed with.  Any duly
> committed content in the master branch, at the time we create the first RC
> (ie, some time after METRON-623 goes in, but not before Monday) will surely
> be included in the RC, unless something has a bug that can’t be readily
> resolved.
>
> Thanks,
> --Matt
>
> On 4/5/17, 12:56 PM, "David Lyle" <dlyle65...@gmail.com> wrote:
>
> I'm working on METRON-826 right now. I'll have a PR up today or tomorrow at
> the latest. I'd like to see it go as well.
>
> https://issues.apache.org/jira/browse/METRON-826
>
> -D...
>
>
> On Wed, Apr 5, 2017 at 3:52 PM, Nick Allen <n...@nickallen.org> wrote:
>
> > I would like to include #509 with the Fastcapa improvements..  Already have
> > a +1.  I'm just letting it soak giving others some time to review if they
> > feel so inclined.
> >
> > https://github.com/apache/incubator-metron/pull/509
> >
> >
> > On Wed, Apr 5, 2017 at 3:50 PM, James Sirota <jsir...@apache.org> wrote:
> >
> > > I second this.  I want to see 623 go in in addition to the kerberos work.
> > > When both are in I think it makes sense to do the release
> > >
> > > 04.04.2017, 11:33, "Simon Elliston Ball" <si...@simonellistonball.com>:
> > > > I'd really like to see METRON-623 (the ui) get into the release. It
> > > > feels like the current PR review is getting close, and that getting it in
> > > > then focussing on follow on tasks in a separate release would work well.
> > > >
> > > > I would be all for getting a release out if only for the Kerberos work.
> > > >
> > > > Simon
> > > >
> > > >>  On 4 Apr 2017, at 20:15, zeo...@gmail.com <zeo...@gmail.com> wrote:
> > > >>
> > > >>  How far out is the management UI?
> > > >>
> > > >>  Jon
> > > >>
> > > >>>  On Tue, Apr 4, 2017, 2:09 PM Matt Foley <ma...@apache.org> wrote:
> > > >>>
> > > >>>  Hi all,
> > > >>>  Although it’s only been a few weeks since the last release was finally
> > > >>>  published, that process started in January :-)
> > > >>>  Also, the last commit in 0.3.1 was Feb 23, and there’s been a ton of
> > > >>>  really cool new stuff added since then:
> > > >>>
> > > >>>  Biggest items:
> > > >>>  - Multiple commits for REST API (base Jira: METRON-503)
> > > >>>  - Multiple commits to work with Kerberized (secure) clusters (mult. Jiras)
> > > >>>
> > > >>>  Other major new features:
> > > >>>  - METRON-690: DSL-based sparse time window specification for Profiler
> > > >>>  - METRON-733: Remove Geo db from ParserBolt
> > > >>>  - METRON-686: Record rule set that fired during Threat Triage
> > > >>>  - METRON-743: Sort files when reading results from Pcap
> > > >>>  - METRON-701: Triage metrics produced by Profiler
> > > >>>  - METRON-744: Stellar external functions loaded f

Re: [DISCUSS] The bro kafka plugin

2017-04-05 Thread Matt Foley

Browsing https://git.apache.org/ shows lots of examples.  A few are quite 
prolific.
 

On 4/5/17, 1:00 PM, "Nick Allen" <n...@nickallen.org> wrote:

Does anyone know any other Apache projects that are using multiple repos?
I'd like to see what they've done just so we don't break convention.






On Wed, Apr 5, 2017 at 3:22 PM, Nick Allen <n...@nickallen.org> wrote:

> Yes, I will open an INFRA ticket.  Just give me a little time to research
> what we need.
>
> On Wed, Apr 5, 2017 at 1:29 PM, zeo...@gmail.com <zeo...@gmail.com> wrote:
>
>> Okay great, thanks.  Would you mind throwing in an INFRA ticket for the new
>> repo?  I can take it all from there.
>>
>> Does anybody know if we have ASF resources to help answer the above legal
>> question?
>>
>> Jon
>>
>> On Wed, Apr 5, 2017 at 1:26 PM Nick Allen <n...@nickallen.org> wrote:
>>
>> > (1) I am not sure if licensing is a problem here.
>> >
>> > (2) I am OK with whatever we need to get this effort done and under ASF.
>> >
>> >
>> > On Wed, Apr 5, 2017 at 1:12 PM, zeo...@gmail.com <zeo...@gmail.com> wrote:
>> >
>> > > I'm working on this
>> > > <https://github.com/JonZeolla/incubator-metron/tree/METRON-348> in
>> > > preparation for the new repo and migration to a package.  It looks like in
>> > > bro-plugins COPYING
>> > > <https://github.com/bro/bro-plugins/blob/master/kafka/COPYING> attributes
>> > > to Nick, but in our version COPYING
>> > > <https://github.com/apache/incubator-metron/tree/master/metron-sensors/bro-plugin-kafka/COPYING>
>> > > points to the Apache License.  Same with MAINTAINER (this
>> > > <https://github.com/apache/incubator-metron/blob/master/metron-sensors/bro-plugin-kafka/MAINTAINER>
>> > > vs this <https://github.com/bro/bro-plugins/blob/master/kafka/MAINTAINER>).
>> > > I assume when we package this up and host it in Apache we need to give it
>> > > the Apache license, and point to Metron for MAINTAINER.  My questions are:
>> > >
>> > > 1.  Is there any legal/licensing concern here?  I am taking changes from
>> > > the bro-plugins version and pulling it into the Apache-hosted code.  IANAL
>> > > 2.  Nick - are you OK with these changes?
>> > >
>> > > Jon
>> > >
>> > > On Mon, Apr 3, 2017 at 3:50 PM zeo...@gmail.com <zeo...@gmail.com> wrote:
>> > >
>> > > > Can someone on the PMC submit a ticket to INFRA?  It looks like
>> > > > <https://www.apache.org/dev/infra-contact> committers aren't supposed to.
>> > > >
>> > > > Jon
>> > > >
>> > > > On Fri, Mar 31, 2017 at 4:23 PM zeo...@gmail.com <zeo...@gmail.com> wrote:
>> > > >
>> > > > I would be happy to try it again but I attempted to do that before with
>> > > > bro packages and it failed to be able to handle it.  I also tried using
>> > > > branches of a repo with bro but that similarly failed (and was a pretty bad
>> > > > idea to start with).
>> > > >
>> > > > Jon
>> > > >
>> > > > On Fri, Mar 31, 2017, 3:24 PM Matt Foley <ma...@apache.org> wrote:
>> > > >
>> > > > We should be able to request just one alternate repo from INFRA, and put a
>> > > > top hierarchical level in it that doesn’t include a maven pom.  As far as
>> > > > maven and clients are concerned, it just increases by 1 the path length to
>> > > > the root of the repo.
>> > > >
>> > > > On 3/31/17, 10:30 AM, "zeo...@gmail.com" <zeo...@gmail.com> wrote:
>> > > >
>> > > > Once we agree on a repo location to host

Re: [DISCUSS] next release proposal

2017-04-05 Thread Matt Foley

Sure.  To be clear, I wasn’t proposing an exclusive list, just making the 
argument that there seemed to be enough to proceed with.  Any duly committed 
content in the master branch, at the time we create the first RC (ie, some time 
after METRON-623 goes in, but not before Monday) will surely be included in the 
RC, unless something has a bug that can’t be readily resolved.

Thanks,
--Matt

On 4/5/17, 12:56 PM, "David Lyle" <dlyle65...@gmail.com> wrote:

I'm working on METRON-826 right now. I'll have a PR up today or tomorrow at
the latest. I'd like to see it go as well.

https://issues.apache.org/jira/browse/METRON-826

-D...


On Wed, Apr 5, 2017 at 3:52 PM, Nick Allen <n...@nickallen.org> wrote:

> I would like to include #509 with the Fastcapa improvements..  Already have
> a +1.  I'm just letting it soak giving others some time to review if they
> feel so inclined.
>
> https://github.com/apache/incubator-metron/pull/509
>
>
> On Wed, Apr 5, 2017 at 3:50 PM, James Sirota <jsir...@apache.org> wrote:
>
> > I second this.  I want to see 623 go in in addition to the kerberos work.
> > When both are in I think it makes sense to do the release
> >
> > 04.04.2017, 11:33, "Simon Elliston Ball" <si...@simonellistonball.com>:
> > > I'd really like to see METRON-623 (the ui) get into the release. It
> > > feels like the current PR review is getting close, and that getting it in
> > > then focussing on follow on tasks in a separate release would work well.
> > >
> > > I would be all for getting a release out if only for the Kerberos work.
> > >
> > > Simon
> > >
> > >>  On 4 Apr 2017, at 20:15, zeo...@gmail.com <zeo...@gmail.com> wrote:
> > >>
> > >>  How far out is the management UI?
> > >>
> > >>  Jon
> > >>
> > >>>  On Tue, Apr 4, 2017, 2:09 PM Matt Foley <ma...@apache.org> wrote:
> > >>>
> > >>>  Hi all,
> > >>>  Although it’s only been a few weeks since the last release was finally
> > >>>  published, that process started in January :-)
> > >>>  Also, the last commit in 0.3.1 was Feb 23, and there’s been a ton of
> > >>>  really cool new stuff added since then:
> > >>>
> > >>>  Biggest items:
> > >>>  - Multiple commits for REST API (base Jira: METRON-503)
> > >>>  - Multiple commits to work with Kerberized (secure) clusters (mult. Jiras)
> > >>>
> > >>>  Other major new features:
> > >>>  - METRON-690: DSL-based sparse time window specification for Profiler
> > >>>  - METRON-733: Remove Geo db from ParserBolt
> > >>>  - METRON-686: Record rule set that fired during Threat Triage
> > >>>  - METRON-743: Sort files when reading results from Pcap
> > >>>  - METRON-701: Triage metrics produced by Profiler
> > >>>  - METRON-744: Stellar external functions loaded from HDFS (and huge
> > >>>  speed-up for function resolution)
> > >>>  - METRON-694: Index errors from Topologies, and
> > >>>  - METRON-745: Create Error dashboards
> > >>>  - METRON-712: Separate eval from parse in Stellar
> > >>>  - METRON-765: Add GUID to messages
> > >>>  - METRON-793: Updated to storm-kafka-client spout
> > >>>
> > >>>  We’ve also had numerous bug fixes, docs improvements, and improvements to
> > >>>  deployment tools (docker, ansible, mpack, quickdev, and fulldev).
> > >>>
> > >>>  I think the REST API and Kerberization, by themselves, would justify a
> > >>>  release. Along with the others, I’d like to propose that we make a release
> > >>>  soon. The time frame I had in mind was at the end of this week I could cut
> > >>>  a release branch (so on-going work in master doesn’t get blocked) and start
> > >>>  the process of generating an RC.
> > >>>
> > >>>  What do you-all think?
> > >>>  Also, what additional work do you think should be included in this
> > >>>  release, and can it realistically get done by the end of this week? The
> > >>>  time frame is, of course, flexible at the pleasure of the community – but
> > >>>  also, there will be another release in another couple months or so, so no
> > >>>  need to rush stuff.
> > >>>
> > >>>  Thanks,
> > >>>  --Matt
> > >>>
> > >>>  --
> > >>
> > >>  Jon
> >
> > ---
> > Thank you,
> >
> > James Sirota
> > PPMC- Apache Metron (Incubating)
> > jsirota AT apache DOT org
> >
>

Re: [DISCUSS] next release proposal

2017-04-05 Thread Matt Foley

Ok, 820 just went in this morning, and sounds like there’s no problem with 509 
being in by maybe Monday?
Consensus seems to be to wait for METRON-623 (Management UI) also, so that’s 
what I’ll do.  Any projection about how long that will be?  Review seems to be 
active, so hopefully not too many days to go.

Thanks for everybody’s input. We’ll check status on Monday.
--Matt

On 4/5/17, 12:52 PM, "Nick Allen" <n...@nickallen.org> wrote:

I would like to include #509 with the Fastcapa improvements..  Already have
a +1.  I'm just letting it soak giving others some time to review if they
feel so inclined.

https://github.com/apache/incubator-metron/pull/509


On Wed, Apr 5, 2017 at 3:50 PM, James Sirota <jsir...@apache.org> wrote:

> I second this.  I want to see 623 go in in addition to the kerberos work.
> When both are in I think it makes sense to do the release
>
> 04.04.2017, 11:33, "Simon Elliston Ball" <si...@simonellistonball.com>:
> > I'd really like to see METRON-623 (the ui) get into the release. It
> > feels like the current PR review is getting close, and that getting it in
> > then focussing on follow on tasks in a separate release would work well.
> >
> > I would be all for getting a release out if only for the Kerberos work.
> >
> > Simon
> >
> >>  On 4 Apr 2017, at 20:15, zeo...@gmail.com <zeo...@gmail.com> wrote:
> >>
> >>  How far out is the management UI?
> >>
> >>  Jon
> >>
> >>>  On Tue, Apr 4, 2017, 2:09 PM Matt Foley <ma...@apache.org> wrote:
> >>>
> >>>  Hi all,
> >>>  Although it’s only been a few weeks since the last release was finally
> >>>  published, that process started in January :-)
> >>>  Also, the last commit in 0.3.1 was Feb 23, and there’s been a ton of
> >>>  really cool new stuff added since then:
> >>>
> >>>  Biggest items:
> >>>  - Multiple commits for REST API (base Jira: METRON-503)
> >>>  - Multiple commits to work with Kerberized (secure) clusters (mult. Jiras)
> >>>
> >>>  Other major new features:
> >>>  - METRON-690: DSL-based sparse time window specification for Profiler
> >>>  - METRON-733: Remove Geo db from ParserBolt
> >>>  - METRON-686: Record rule set that fired during Threat Triage
> >>>  - METRON-743: Sort files when reading results from Pcap
> >>>  - METRON-701: Triage metrics produced by Profiler
> >>>  - METRON-744: Stellar external functions loaded from HDFS (and huge
> >>>  speed-up for function resolution)
> >>>  - METRON-694: Index errors from Topologies, and
> >>>  - METRON-745: Create Error dashboards
> >>>  - METRON-712: Separate eval from parse in Stellar
> >>>  - METRON-765: Add GUID to messages
> >>>  - METRON-793: Updated to storm-kafka-client spout
> >>>
> >>>  We’ve also had numerous bug fixes, docs improvements, and improvements to
> >>>  deployment tools (docker, ansible, mpack, quickdev, and fulldev).
> >>>
> >>>  I think the REST API and Kerberization, by themselves, would justify a
> >>>  release. Along with the others, I’d like to propose that we make a release
> >>>  soon. The time frame I had in mind was at the end of this week I could cut
> >>>  a release branch (so on-going work in master doesn’t get blocked) and start
> >>>  the process of generating an RC.
> >>>
> >>>  What do you-all think?
> >>>  Also, what additional work do you think should be included in this
> >>>  release, and can it realistically get done by the end of this week? The
> >>>  time frame is, of course, flexible at the pleasure of the community – but
> >>>  also, there will be another release in another couple months or so, so no
> >>>  need to rush stuff.
> >>>
> >>>  Thanks,
> >>>  --Matt
> >>>
> >>>  --
> >>
> >>  Jon
>
> ---
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>

Re: [GitHub] incubator-metron pull request #512: METRON-824 site-book generation is broke...

2017-04-04 Thread Matt Foley

Heh! I submitted a similar patch one minute ago!  
Great minds must think alike :-)
--Matt

On 4/4/17, 6:22 PM, "JonZeolla"  wrote:

GitHub user JonZeolla opened a pull request:

https://github.com/apache/incubator-metron/pull/512

METRON-824 site-book generation is broken

## Contributor Comments
Currently, if you attempt to build the site-book on master you get the 
following error:
```
ERROR OR ERRORS DETECTED:
ERROR: Header specification character (#) detected with indenting.  This is presumed to be an error, since it will render as text. If intentional, put a period or other printable character before it.
on line: 16 in file: ./metron-sensors/bro-plugin-kafka/index.md
# curl -L https://github.com/edenhill/librdkafka/archive/v0.9.4.tar.gz | tar xvz
```

In order to fix this I changed 
`metron-sensors/bro-plugin-kafka/index.md` to use triple backticks instead of 
indentation to indicate the code block.  This allows it to parse properly in 
GitHub MD as well as our Doxia site-book docs.
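
To illustrate why the fenced form avoids the error, here is a minimal Python sketch of a check similar in spirit to the one that produced the message above. It is an assumption about the behaviour, not the actual site-book script: the function name `find_indented_hashes` is hypothetical, and the real check may treat fences differently.

```python
def find_indented_hashes(markdown_text):
    """Return (line_number, line) pairs for indented lines starting with '#'.

    Lines inside triple-backtick fences are skipped (an assumption about
    how such a check would treat fenced code blocks).
    """
    hits = []
    in_fence = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        stripped = line.lstrip()
        if stripped.startswith("```"):
            # Toggle fence state on every opening or closing fence line.
            in_fence = not in_fence
            continue
        if not in_fence and stripped.startswith("#") and line != stripped:
            hits.append((number, line))
    return hits
```

Run against an indented `# curl ...` line, the sketch reports it; run against the same command inside a fenced block, it reports nothing.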


## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron (Incubating).  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [X] Is there a JIRA ticket associated with this PR? If not one needs 
to be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
 
- [X] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [X] Has your PR been rebased against the latest commit within the 
target branch (typically master)?

### For documentation related changes:
- [X] Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and then verify changes via 
`site-book/target/site/index.html`:

  ```
  cd site-book
  bin/generate-md.sh
  mvn site:site
  ```

 Note:
Please ensure that once the PR is submitted, you check travis-ci for 
build issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set 
up for your personal repository such that your branches are built there before 
submitting a pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JonZeolla/incubator-metron METRON-824

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-metron/pull/512.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #512


commit 5379d8c4f44f0bb10760a13cdef4618bd83803b7
Author: Jon Zeolla 
Date:   2017-04-05T01:10:45Z

Use backticks instead of indentation for codeblocks





Re: [DISCUSS] next release proposal

2017-04-04 Thread Matt Foley

>> another incubator release?

It depends on the timing.  The last release took 3 weeks and 5 RCs to get 
approval as a release within the Metron community.  Hopefully this one won’t be 
as hard, but it’s likely it will take a couple weeks, especially since this is 
my first cycle as Release Manager. If, as hoped, the Board graduates us to TLP 
at the Apr 19 meeting, this could become our first release as a TLP.

On the other hand, if it goes really fast, we could go ahead and submit it as 
another incubator release.  And if graduation comes in the middle, we’d just 
withdraw that and re-vote as a TLP, on our own authority as delegated by the 
Board.

Basically, either way works out.
Thanks,
--Matt

From: Otto Fowler <ottobackwa...@gmail.com>
Date: Tuesday, April 4, 2017 at 11:43 AM
To: "dev@metron.incubator.apache.org" <dev@metron.incubator.apache.org>, Matt 
Foley <ma...@apache.org>
Subject: Re: [DISCUSS] next release proposal

So this would be another incubator release?


On April 4, 2017 at 14:09:58, Matt Foley (ma...@apache.org) wrote:
Hi all, 
Although it’s only been a few weeks since the last release was finally 
published, that process started in January :-) 
Also, the last commit in 0.3.1 was Feb 23, and there’s been a ton of really 
cool new stuff added since then: 

Biggest items: 
- Multiple commits for REST API (base Jira: METRON-503) 
- Multiple commits to work with Kerberized (secure) clusters (mult. Jiras) 

Other major new features: 
- METRON-690: DSL-based sparse time window specification for Profiler 
- METRON-733: Remove Geo db from ParserBolt 
- METRON-686: Record rule set that fired during Threat Triage 
- METRON-743: Sort files when reading results from Pcap 
- METRON-701: Triage metrics produced by Profiler 
- METRON-744: Stellar external functions loaded from HDFS (and huge speed-up 
for function resolution) 
- METRON-694: Index errors from Topologies, and 
- METRON-745: Create Error dashboards 
- METRON-712: Separate eval from parse in Stellar 
- METRON-765: Add GUID to messages 
- METRON-793: Updated to storm-kafka-client spout 

We’ve also had numerous bug fixes, docs improvements, and improvements to 
deployment tools (docker, ansible, mpack, quickdev, and fulldev). 

I think the REST API and Kerberization, by themselves, would justify a release. 
Along with the others, I’d like to propose that we make a release soon. The 
time frame I had in mind was at the end of this week I could cut a release 
branch (so on-going work in master doesn’t get blocked) and start the process 
of generating an RC. 

What do you-all think? 
Also, what additional work do you think should be included in this release, and 
can it realistically get done by the end of this week? The time frame is, of 
course, flexible at the pleasure of the community – but also, there will be 
another release in another couple months or so, so no need to rush stuff. 

Thanks, 
--Matt

[DISCUSS] next release proposal

2017-04-04 Thread Matt Foley

Hi all,
Although it’s only been a few weeks since the last release was finally 
published, that process started in January :-)
Also, the last commit in 0.3.1 was Feb 23, and there’s been a ton of really 
cool new stuff added since then:

Biggest items:
- Multiple commits for REST API (base Jira: METRON-503)
- Multiple commits to work with Kerberized (secure) clusters (mult. Jiras) 

Other major new features:
- METRON-690: DSL-based sparse time window specification for Profiler
- METRON-733: Remove Geo db from ParserBolt
- METRON-686: Record rule set that fired during Threat Triage
- METRON-743: Sort files when reading results from Pcap
- METRON-701: Triage metrics produced by Profiler
- METRON-744: Stellar external functions loaded from HDFS (and huge speed-up 
for function resolution)
- METRON-694: Index errors from Topologies, and
- METRON-745: Create Error dashboards
- METRON-712: Separate eval from parse in Stellar
- METRON-765: Add GUID to messages
- METRON-793: Updated to storm-kafka-client spout

We’ve also had numerous bug fixes, docs improvements, and improvements to 
deployment tools (docker, ansible, mpack, quickdev, and fulldev).

I think the REST API and Kerberization, by themselves, would justify a release. 
 Along with the others, I’d like to propose that we make a release soon.  The 
time frame I had in mind was at the end of this week I could cut a release 
branch (so on-going work in master doesn’t get blocked) and start the process 
of generating an RC.

What do you-all think?
Also, what additional work do you think should be included in this release, and 
can it realistically get done by the end of this week?  The time frame is, of 
course, flexible at the pleasure of the community – but also, there will be 
another release in another couple months or so, so no need to rush stuff.

Thanks,
--Matt

Re: [DISCUSS] The bro kafka plugin

2017-03-31 Thread Matt Foley

We should be able to request just one alternate repo from INFRA, and put a top 
hierarchical level in it that doesn’t include a maven pom.  As far as maven and 
clients are concerned, it just increases by 1 the path length to the root of 
the repo.

On 3/31/17, 10:30 AM, "zeo...@gmail.com"  wrote:

Once we agree on a repo location to host this, I would be happy to put
together the package and update our environments to use bro-pkg to install
the plugin.  I have created METRON-813 to track this and changed METRON-348
to be a sub-task.

Otto - the bro packages model doesn't allow colocation with anything else.
That said, if we have two similar situations, and given the INFRA example
Casey linked to before was requesting 9 repos, perhaps we just request two
repos.  Would someone else mind putting that request in?

Jon

On Fri, Mar 31, 2017 at 12:49 PM Otto Fowler 
wrote:

Could we create a separate repo for more than one thing?  like put … um
let’s say
a maven plugin and the bro plugin?

On March 31, 2017 at 12:30:25, Nick Allen (n...@nickallen.org) wrote:

I agree with everything that I've read.

One of the guys from Bro had contacted me a while back, letting me know
that the packaging mechanism in Bro was ready for public consumption. I
just have not had cycles to do anything with it yet. They are not wanting
to host any of the plugins.

I thought the package mechanism requires that a package live within its own
repo (which Casey confirmed). This put me in a bind on how to tackle
this. I don't want to personally host the plugin in my own Github repo. I
would prefer that we host it in a community repo; either Bro or Metron.
Since Bro is moving away from hosting their own plugins, that leaves
Metron.

It would be great if we could create a separate repo for the plugin. That
solves the challenge of using the packaging mechanism.

We do need to reconcile what is in bro/bro-plugins and what is in Metron.
There are some enhancements that I and others have made that never made it
back into Metron. They never made it back, because the original plan was
just to switch to using the plugin from bro/bro-plugins before the idea of
a packaging mechanism hit Bro. Reconciling should be fairly easy to see by
just doing a diff.

It would be great if others want to take on any of that work. I would be
glad to offer any support that you need. Thanks, Jon!

On Thu, Mar 30, 2017 at 11:20 PM, zeo...@gmail.com 
wrote:

> Ok, great.
>
> I agree, I definitely want to hear from Nick on the topic. My team is
> currently looking into enhancing the plugin as well to potentially allow
> sending to multiple clusters, investigating some issues we see when our
bro
> cluster is under load, turn it into a package, etc.
>
> The work you just did was on our to do list as well so I'm very excited
to
> see it come through.
>
> Jon
>
> On Thu, Mar 30, 2017, 11:16 PM Casey Stella  wrote:
>
> I *think* it's possible. People do ask for mirrors of directories from
> time to time (see https://issues.apache.org/jira/browse/INFRA-7060). If
> we
> think this is a good idea, we can pose it to INFRA as a request. I'd love
> to see us be able to use the bro packaging infrastructure and get more
> visibility for the plugin.
>
> I'd be particularly interested in Nick's opinion on this, though.
>
> On Thu, Mar 30, 2017 at 11:12 PM, zeo...@gmail.com 
> wrote:
>
> > You can version packages -
> > http://bro-package-manager.readthedocs.io/en/stable/package.html#package-versioning
> >
> > I agree that having a separate repo provided by Apache would be
optimal,
> I
> > just don't know the process for that or if it was even reasonable to
> > suggest.
> >
> > Jon
> >
> > On Thu, Mar 30, 2017, 11:01 PM Casey Stella  wrote:
> >
> > > Looking at the bro packages, it appears that bro is expecting things
to
> > be
> > > its own git repository. I wonder if we could either request INFRA
> > provide
> > > another repo for the bro-kafka plugin and integrate it into metron as
a
> > git
> > > submodule *or* if we could request INFRA to create a github mirror of
> the
> > > metron-sensors/bro-kafka-plugin directory. I'm not sure how viable
> > either
> > > of those options are, frankly.
> > >
> > > One thing that I didn't see is how do

Re: configuration update during runtime

2017-03-31 Thread Matt Foley

Hi Moshe,
Dynamic configuration, with parameters stored in Zookeeper, is actually quite 
complex if you follow it from end to end.  Here’s a brief description:
• The Metron configuration parameter mechanism uses Apache Curator to subscribe 
to ZK nodes of interest, and maintain a TreeCache with background, 
asynchronous, atomic updates.  Whenever someone writes new parameter values to 
ZK, ZK notifies the Curator client, which copies the new values into the 
TreeCache.  That’s the bit of code you referenced below.  The pub/sub model 
supported by ZK and used by Curator, means that this is very low cost, without 
active polling.
• For parameters that may be changed without requiring a topology restart, 
Metron always reads them from the parameter mechanism, rather than storing the 
values in local variables.  This sentence needs some commentary:
o Some parameters DO require a topology restart, in particular if they interact 
with Storm settings.  Updates to these values are ignored, or only have partial 
effect, until restart.  The documents are clear about which settings can be 
updated dynamically.
o Reading from the TreeCache is not significantly more expensive than reading 
from an ordinary Map object.  So going back to the parameter mechanism instead 
of storing the values in local variables is a small cost.
o The TreeCache uses Concurrent data structures where appropriate to assure 
that edits happen atomically and at low cost, so you don’t need to worry about 
reading inconsistent values from the parameter mechanism.
• The net result is a fully dynamic configuration capability for a broad set of 
independently updatable parameters.
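
As a rough illustration of the pattern described above (not Metron's actual code), here is a self-contained Java sketch. It swaps Curator's TreeCache for a plain ConcurrentHashMap and simulates the ZooKeeper notification with a direct method call, so it runs with the JDK alone. The class names ConfigCache and BatchWriter, the path /demo/global/batch.size, and the default of 10 are all hypothetical, invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Demo driver: simulates one ZooKeeper update and shows the reader picking it up.
class DynamicConfigSketch {
    public static void main(String[] args) {
        ConfigCache cache = new ConfigCache();
        BatchWriter writer = new BatchWriter(cache);
        System.out.println("before update: " + writer.currentBatchSize());

        // Simulated notification; in the real system the Curator client delivers this.
        cache.onNodeChanged("/demo/global/batch.size", "25".getBytes(StandardCharsets.UTF_8));
        System.out.println("after update: " + writer.currentBatchSize());
    }
}

// Hypothetical stand-in for the TreeCache-backed parameter mechanism.
class ConfigCache {
    // Concurrent map so a reader never observes a half-applied update to a key.
    private final Map<String, String> values = new ConcurrentHashMap<>();

    // Invoked when a watched node changes; empty payloads are ignored.
    void onNodeChanged(String path, byte[] data) {
        if (data == null || data.length == 0) {
            return;
        }
        values.put(path, new String(data, StandardCharsets.UTF_8));
    }

    String get(String path, String defaultValue) {
        return values.getOrDefault(path, defaultValue);
    }
}

// A consumer that re-reads the setting on every call instead of copying it into a field.
class BatchWriter {
    private final ConfigCache config;

    BatchWriter(ConfigCache config) {
        this.config = config;
    }

    int currentBatchSize() {
        return Integer.parseInt(config.get("/demo/global/batch.size", "10"));
    }
}
```

The point of the sketch is the shape of BatchWriter: because it asks the cache each time rather than storing the value at construction, an update takes effect on the next call with no restart. A setting copied into a field at startup would behave like the restart-only parameters mentioned above.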

Hope this helps,
--Matt

On 3/31/17, 5:53 AM, "moshe jarusalem"  wrote:

Hi All,
I have been looking at the code for ConfiguredBolt and its derivatives. I
realized that updateConfig is actually not doing much?

Would you describe how you manage configuration changes that might be needed
after bolts are initialized and running?


for convenience, I copied the code here

public void updateConfig(String path, byte[] data) throws IOException {
  // Ignore notifications that carry no payload.
  if (data.length != 0) {
    // Last path segment, e.g. the sensor name under the enrichment root.
    String name = path.substring(path.lastIndexOf("/") + 1);
    if (path.startsWith(ConfigurationType.ENRICHMENT.getZookeeperRoot())) {
      getConfigurations().updateSensorEnrichmentConfig(name, data);
      reloadCallback(name, ConfigurationType.ENRICHMENT);
    } else if (ConfigurationType.GLOBAL.getZookeeperRoot().equals(path)) {
      getConfigurations().updateGlobalConfig(data);
      reloadCallback(name, ConfigurationType.GLOBAL);
    }
  }
}



Thanks,

Re: [GitHub] incubator-metron issue #497: METRON-804: Create a document to describe kerbe...

2017-03-30 Thread Matt Foley

Okay, try this:
https://github.com/mattf-horton/incubator-metron/blob/METRON-804/metron-deployment/vagrant/Kerberos-setup.md

I wasn’t able to build a PR to your branch, seems there’s a non-ff in the way
the previous patch was merged. Anyway, if you just grab that file and diff
against yours, you’ll see the change is small.

Items 7, 8, and 20 needed to be fixed. The problem is that “a-b-c” paragraphs
aren’t actually list-items, as MD only knows Arabic numerals for list numbering.
Since they are paragraphs, the codeblocks and images under them should be at
the SAME indent level, and separated by an explicit blank line.

This works in both Github-MD and doxia-markdown. It looks slightly better in
doxia because in Github the “a-b-c” paragraphs are exdented a little. If you
hate it we can try a couple other things, but I thought this was close enough.

Cheers,
--Matt

On 3/30/17, 2:40 PM, "Matt Foley" <mfo...@hortonworks.com> wrote:

That’s weird. Mine looks fine:
https://github.com/mattf-horton/incubator-metron/blob/METRON-804-notes/METRON-804-mf.tiff

But the tooling was exactly that of
https://github.com/mattf-horton/incubator-metron/tree/METRON-804/site-book/bin

What additional changes did you make?

Oh, I just looked in github, and it’s broken there! How ironic.
On your side, is it broken in Github or in the site-book?
--Matt

On 3/30/17, 11:15 AM, "mmiklavc" <g...@git.apache.org> wrote:

Github user mmiklavc commented on the issue:

https://github.com/apache/incubator-metron/pull/497

@mattf-horton Thanks again for the patch! I made a couple more
minor tweaks to get the images and indentation correct for the nested lists.
I'm unable to get a nested list code block to format correctly, however. It's
not bad, but it's just not quite right. If anyone has any suggestions, please
chime in.

![image](https://cloud.githubusercontent.com/assets/658443/24519468/7f928ea6-1542-11e7-80c6-0070a1810f5e.png)

---
If your project is set up for it, you can reply to this email and have
your
reply appear on GitHub as well. If your project does not have this
feature
enabled and wishes so, or if the feature is enabled but not working,
please
contact infrastructure at infrastruct...@apache.org or file a JIRA
ticket
with INFRA.
---

Re: [GitHub] incubator-metron issue #497: METRON-804: Create a document to describe kerbe...

2017-03-30 Thread Matt Foley

That’s weird.  Mine looks fine: 
https://github.com/mattf-horton/incubator-metron/blob/METRON-804-notes/METRON-804-mf.tiff

But the tooling was exactly that of 
https://github.com/mattf-horton/incubator-metron/tree/METRON-804/site-book/bin

What additional changes did you make?

Oh, I just looked in github, and it’s broken there!  How ironic.
On your side, is it broken in Github or in the site-book? 
--Matt

On 3/30/17, 11:15 AM, "mmiklavc"  wrote:

Github user mmiklavc commented on the issue:

https://github.com/apache/incubator-metron/pull/497
  
@mattf-horton Thanks again for the patch! I made a couple more minor 
tweaks to get the images and indentation correct for the nested lists. I'm 
unable to get a nested list code block to format correctly, however. It's not 
bad, but it's just not quite right. If anyone has any suggestions, please chime 
in.

![image](https://cloud.githubusercontent.com/assets/658443/24519468/7f928ea6-1542-11e7-80c6-0070a1810f5e.png)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Re: [DISCUSS][PROPOSAL] Maven Plugin to build packages with dependencies

2017-03-28 Thread Matt Foley

Makes sense.

 

From: Otto Fowler <ottobackwa...@gmail.com>
Date: Tuesday, March 28, 2017 at 10:13 AM
To: "dev@metron.incubator.apache.org" <dev@metron.incubator.apache.org>, Matt 
Foley <ma...@apache.org>
Subject: Re: [DISCUSS][PROPOSAL] Maven Plugin to build packages with 
dependencies

 

I am going to proceed on with 2.

It is prudent to stand review and criticism before involving infrastructure.

 

 

 

On March 28, 2017 at 13:00:46, Matt Foley (ma...@apache.org) wrote:

I support option 1 if Infra will do it. Otherwise 2. But it might be easier to 
achieve the Infra request after we have exited the incubator. 
--Matt 

On 3/27/17, 6:25 PM, "zeo...@gmail.com" <zeo...@gmail.com> wrote: 

I don't have a strong opinion here, but I'm interested to see what the 
direction on this one will be.  

Jon 

On Wed, Mar 22, 2017 at 3:30 PM Otto Fowler <ottobackwa...@gmail.com> wrote: 

> As we have discussed previously, I am working on a plugin architecture for 
> parsers and stellar ( down the road ), based on Nifi’s Nar format. 
> 
> Part of using the Nar system is building the Nar itself. This is done 
> using the nifi-nar-maven-plugin. Some historical background - this plugin 
> used to live in the nifi 
> repo/tree, and they split it out to its own repo and published it to 
> apache mvn. 
> 
> For Metron’s use of the nar we want the following: 
> 
> 1. To be able to change the archive’s extension from .nar to something else 
> specific for us 
> 2. To be able to rename the manifest information generated so that it 
> doesn’t reference nar 
> 3. Possibly to be able to add more manifest entries and custom metron 
> specific metadata 
> 
> My first preference, was to use the nifi plugin as is, but with just enough 
> modification to make 
> points 1 and 2 possible. 
> 
> To that end I opened a jira and submitted a PR to the nifi project that did 
> just that, added the ability to configure the type produced by the plugin 
> and set the metadata prefix name. 
> 
> The chair of nifi has commented on that issue, suggesting that we fork or copy 
> the plugin, since he has rightly noticed that there is really no benefit 
> for 
> them in accepting these changes and capabilities. 
> 
> I wanted to have a discussion around what I see as the options we have at 
> this point.. 
> 
> 1. As Nifi did, we request a new git repo to host our version of the 
> plugin, and do a release/publish of the plugin to apache mvn. This would 
> include ‘rebranding’ the plugin 
> from nifi to something else. 
> 2. We start as they did, with the plugin with my changes and as a separate 
> build step ( copy ) 
> 3. We fork and use git submodules to track that project 
> 
> In cases 2 or 3 we will have to decide to rebrand or just use a custom 
> version scheme ( -METRON ) with the plugin. 
> 
> To me, option 1 is the best option, although it will involve me getting 
> help from others to accomplish ( INFRA request, release manager etc etc ). 
> But, it is the cleanest, best option I think for a production case. 
> Thoughts? 
> 
> 
> 
> 
> 
> 
> 
> for reference: 
> https://issues.apache.org/jira/browse/NIFI-3628 
> https://github.com/apache/nifi-maven/pull/2 
> 
> http://mail-archives.apache.org/mod_mbox/nifi-dev/201508.mbox/%3CCALJK9a7xJW%2BG-dJSXAZ640xQdy4tpRZ%3DtEuHDgq8A%3DMY%2BOkf1g%40mail.gmail.com%3E
>  
> https://issues.apache.org/jira/browse/INFRA-10119 
> 
-- 

Jon

Re: [DISCUSS][PROPOSAL] Maven Plugin to build packages with dependencies

2017-03-28 Thread Matt Foley

I support option 1 if Infra will do it.  Otherwise 2.  But it might be easier 
to achieve the Infra request after we have exited the incubator.
--Matt

On 3/27/17, 6:25 PM, "zeo...@gmail.com"  wrote:

I don't have a strong opinion here, but I'm interested to see what the
direction on this one will be. 

Jon

On Wed, Mar 22, 2017 at 3:30 PM Otto Fowler  wrote:

> As we have discussed previously, I am working on a plugin architecture for
> parsers and stellar ( down the road ), based on Nifi’s Nar format.
>
> Part of using the Nar system is building the Nar itself.  This is done
> using the nifi-nar-maven-plugin.  Some historical background - this plugin
> used to live in the nifi
> repo/tree, and they split it out to its own repo and published it to
> apache mvn.
>
> For Metron’s use of the nar we want the following:
>
> 1. To be able to change the archive’s extension from .nar to something
> else specific for us
> 2. To be able to rename the manifest information generated so that it
> doesn’t reference nar
> 3.  Possibly to be able to add more manifest entries and custom metron
> specific metadata
>
> My first preference was to use the nifi plugin as is, but with just
> enough modification to make points 1 and 2 possible.
>
> To that end I opened a jira and submitted a PR to the nifi project that
> did just that: added the ability to configure the type produced by the
> plugin and set the metadata prefix name.
>
> The chair of nifi has commented on that issue, suggesting that we fork or
> copy the plugin, since he has rightly noticed that there is really no
> benefit for them in accepting these changes and capabilities.
>
> I wanted to have a discussion around what I see as the options we have at
> this point..
>
> 1. As Nifi did, we request a new git repo to host our version of the
> plugin, and do a release/publish of the plugin to apache mvn.  This would
> include ‘rebranding’ the plugin
> from nifi to something else.
> 2. We start as they did, with the plugin with my changes and as a separate
> build step ( copy )
> 3. We fork and use git submodules to track that project
>
> In cases 2 or 3 we will have to decide to rebrand or just use a custom
> version scheme ( -METRON ) with the plugin.
>
> To me, option 1 is the best option, although it will involve me getting
> help from others to accomplish ( INFRA request, release manager etc etc ).
> But, it is the cleanest, best option I think for a production case.
> Thoughts?
>
>
>
>
>
>
>
> for reference:
> https://issues.apache.org/jira/browse/NIFI-3628
> https://github.com/apache/nifi-maven/pull/2
>
> http://mail-archives.apache.org/mod_mbox/nifi-dev/201508.mbox/%3CCALJK9a7xJW%2BG-dJSXAZ640xQdy4tpRZ%3DtEuHDgq8A%3DMY%2BOkf1g%40mail.gmail.com%3E
> https://issues.apache.org/jira/browse/INFRA-10119
>
-- 

Jon

Re: Metron Installation on an Ambari-Managed Cluster?

2017-03-24 Thread Matt Foley

https://cwiki.apache.org/confluence/display/METRON/Metron+with+HDP+2.5+bare-metal+install
is much more recent.  It should still work, but I’m not sure how RPM generation 
has changed, especially after David’s changes in METRON-671 (Refactor Ansible 
deployment to use Ambari).  Of course the two scenarios are different (your 
link is about installing Metron AFTER fully installing HDP, while this link is 
about installing it all together), but they use essentially the same steps.

The above linked article does have a lot of cruft based on working around 
previously-existing bugs that have now been fixed, so if something looks like 
it’s already been done, or doesn’t make sense, it may be okay to ignore.

The one thing both articles do that just isn’t necessary, is make you 
install/run Docker and do the RPM build on the cluster.  I routinely do the 
Docker-based RPM build on my Mac, and move only the RPMs to the install 
directory used by the Metron MPack for Ambari (/localrepo/) on the Metron nodes.

So if the Docker RPM build is the main impediment, try doing that.
--Matt


On 3/24/17, 2:03 PM, "Otto Fowler"  wrote:

I have used

https://cwiki.apache.org/confluence/display/METRON/Metron+Installation+on+an+Ambari-Managed+Cluster
as
a guide in the past, but it is out of date,
I don’t think it can possibly work with docker rpm build now.

Does anyone have any ideas what it would take to get this workflow working
again?

possibly useful info re kafka and kafka connector versions, with Storm

2017-03-23 Thread Matt Foley

Repost from user@storm:

From: Harsha Chintalapani 
Reply-To: "u...@storm.apache.org" 
Date: Thursday, March 23, 2017 at 8:51 AM
To: "u...@storm.apache.org" 
Subject: Re: Valid Kafka version for Storm 1.0.2

Hi Anis,

We've two kafka connectors now. One uses old consumer API (Simple 
Consumer) https://github.com/apache/storm/tree/master/external/storm-kafka

and there is new storm-kafka-client which uses new consumer API 
https://github.com/apache/storm/tree/master/external/storm-kafka-client

The way we build connectors is not necessarily tied to a specific version; as 
long as the client APIs are backward compatible you are free to use any version.

In case of 1, You can use kafka version from 0.8.x to 0.10.x

In case of 2, we recommend you use 0.10.0.1 onwards, as that's when the kafka 
client APIs were stabilized and received critical bug fixes.

Thanks,

Harsha

On Tue, Mar 21, 2017 at 6:45 PM Anis Nasir  wrote:

Dear all, 

Can anyone suggest a working version of Kafka for Storm 1.0.2?

Thanking you in advance.

Regards,

Anis

Re: [DISCUSS] Stepping down as release manager

2017-03-22 Thread Matt Foley

As Billie says, our bylaws don’t require a vote to assign a release manager.
However, an RM certainly needs the support of the community to be effective,
so all these +1’s are very much appreciated! :-)
Thanks,
--Matt

On 3/22/17, 8:30 AM, "Billie Rinaldi" <bil...@apache.org> wrote:

Not everything needs to be voted upon. The ASF guidelines and Metron's
bylaws specify which actions require a vote. Other types of things, like
choosing a release manager, can be approved by lazy consensus. I just
noticed that Metron's bylaws have slightly different definitions for the
approval types than the ASF typically uses. I'd recommend changing these to
the standard ASF definitions. Specifically, Lazy Consensus does not require
a vote and thus no +1s are needed
(https://www.apache.org/foundation/glossary.html#LazyConsensus) and Lazy
Majority and Lazy 2/3 Majority should be called Majority
(https://www.apache.org/foundation/glossary.html#MajorityApproval) and 2/3
Majority.

On Wed, Mar 22, 2017 at 7:23 AM, Justin Leet <justinjl...@gmail.com> wrote:

> Right now it's just support, not a vote.  I assume, based on our past
> practices, that there will be a separate [VOTE] thread.
>
> Justin
>
> On Wed, Mar 22, 2017 at 10:06 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > +1 but is this explicitly an official vote?
> >
> >
> > On March 21, 2017 at 13:51:16, Justin Leet (justinjl...@gmail.com)
> wrote:
> >
> > +1 for Matt
> >
> > On Tue, Mar 21, 2017 at 12:21 PM, zeo...@gmail.com <zeo...@gmail.com>
> > wrote:
> >
> > > +1 for mattf
> > >
> > > On Tue, Mar 21, 2017 at 11:04 AM Ryan Merriman <merrim...@gmail.com>
> > > wrote:
> > >
> > > > +1 for Matt
> > > >
> > > > On Tue, Mar 21, 2017 at 9:44 AM, Matt Foley <ma...@apache.org>
> wrote:
> > > >
> > > > > Casey, you’ve been a great release manager. I know how much detail
> > > > effort
> > > > > goes into this role.
> > > > >
> > > > > I am willing to serve as RM for the next while, if the community
> > would
> > > > > like. I was the RM for Hadoop for about a year, and in fact was RM
> > for
> > > > its
> > > > > 1.0 release. Granted that was a while ago, but overall process
> > doesn’t
> > > > seem
> > > > > to have changed much :-)
> > > > >
> > > > > Cheers,
> > > > > --Matt
> > > > >
> > > > > On 3/21/17, 7:32 AM, "Casey Stella" <ceste...@gmail.com> wrote:
> > > > >
> > > > > Right, Billie is exactly right. Working with the community to
> > > > > construct
> > > > > releases that conform to apache standards and policies is the main
> > > > > duty.
> > > > > This will (hopefully) be our first set of releases outside of the
> > > > > incubator, so if I'm allowed to be biased, I'm hoping that someone
> > > > with
> > > > > previous release management experience in other projects will
> > > > > volunteer.
> > > > > We're leaving the nest a bit and having an experienced hand at the
> > > > > tiller
> > > > > would be advantageous.
> > > > >
> > > > >
> > > > > On Tue, Mar 21, 2017 at 10:21 AM, Billie Rinaldi <
> > > bil...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > See http://www.apache.org/dev/release-publishing#release_manager
> > > > and
> > > > > > http://www.apache.org/legal/release-policy.html for information
> > > on
> > > > > the
> > > > > > tasks that a release manager performs.
> > > > > >
> > > > > > On Tue, Mar 21, 2017 at 7:10 AM, Khurram Ahmed <
> > > > > khurramah...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Casey it would be helpful if you could outline the
> > > > > responsibilities of a
> > > > > > > release manager for the Metron project.
> > > > > > >
> > > > > > > On Mar 21, 2017 6:57 PM, "Casey Stella" <ceste...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > I've been extremely honored to spend the last few months as
> > > the
> > > > > Metron
> > > > > > > > Release Manager. That being said, my watch is ended and it's
> > > > > time for
> > > > > > > > another release manager to step into my place.
> > > > > > > >
> > > > > > > > Who would like to volunteer to be release manager for the
> > > next
> > > > > release
> > > > > > of
> > > > > > > > Metron?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Casey
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > --
> > >
> > > Jon
> > >
> >
> >
>

Re: [VOTE] Final Board Resolution Draft V2

2017-03-22 Thread Matt Foley

+1 (non-binding)

On 3/22/17, 8:30 AM, "Casey Stella"  wrote:

+1 binding

On Wed, Mar 22, 2017 at 11:23 AM, David Lyle  wrote:

> +1 binding
>
> On Wed, Mar 22, 2017 at 10:49 AM, Kyle Richardson <
> kylerichards...@gmail.com
> > wrote:
>
> > +1 (binding)
> >
> > On Mon, Mar 20, 2017 at 3:05 AM, James Sirota 
> wrote:
> >
> > >
> > > - Removed affiliations
> > > - Added apache IDs where possible
> > > - Removed committers and only left PPMC members
> > >
> > > Hope this version holds up.  Please vote +1, -1, or 0 for neutral.  
The
> > > vote will be open for 72 hours
> > >
> > >
> > > The incubating Apache Metron community believes it is time to graduate
> to
> > > TLP.
> > >
> > > Apache Metron entered incubation in December of 2015. Since then, 
we've
> > > overcome technical challenges to remove Category X dependencies, and
> > made 3
> > > releases. Our most recent release contains binary convenience
> artifacts.
> > We
> > > are a very helpful and engaged community, ready to answer all 
questions
> > and
> > > feedback directed to us via the user list. Through our time in
> incubation
> > > we've added a number of committers and promoted some of them to PPMC
> > > membership. We are actively pursuing others. While we do still have
> > issues
> > > to address raised by means of the maturity model, all projects are
> > ongoing
> > > processes, and we believe we no longer need the incubator to continue
> > > addressing these issues.
> > >
> > > To inform the discussion, here is some basic project information:
> > >
> > > Project status:
> > >   http://incubator.apache.org/projects/metron.html
> > >
> > > Project website:
> > >   https://metron.incubator.apache.org/
> > >
> > > Project documentation:
> > >   https://cwiki.apache.org/confluence/display/METRON/Documentation
> > >
> > > Maturity assessment:
> > >   https://cwiki.apache.org/confluence/display/METRON/Apache+Project+Maturity+Model
> > >
> > > DRAFT of the board resolution is at the bottom of this email
> > >
> > > Proposed PMC size: 25 members
> > >
> > > Total number of committers: 6 members
> > >
> > >
> > > 516 commits on develop
> > > 34 contributors across all branches
> > >
> > > dev list averaged ~650 msgs/month for the last 3 months
> > >
> > >
> > > Resolution:
> > >
> > > Establish the Apache Metron Project
> > >
> > > WHEREAS, the Board of Directors deems it to be in the best
> > > interests of the Foundation and consistent with the
> > > Foundation's purpose to establish a Project Management
> > > Committee charged with the creation and maintenance of
> > > open-source software, for distribution at no charge to the
> > > public, related to a security analytics platform for big data use
> cases.
> > >
> > > NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> > > Committee (PMC), to be known as the "Apache Metron Project",
> > > be and hereby is established pursuant to Bylaws of the
> > > Foundation; and be it further
> > >
> > > RESOLVED, that the Apache Metron Project be and hereby is
> > > responsible for the creation and maintenance of software
> > > related to:
> > > (a) A mechanism to capture, store, and normalize any type of security
> > > telemetry at extremely high rates.
> > > (b) Real time processing and application of enrichments
> > > (c) Efficient information storage
> > > (d) An interface that gives a security investigator a centralized view
> of
> > > data and alerts passed through the system.
> > >
> > > RESOLVED, that the office of "Vice President, Apache Metron" be
> > > and hereby is created, the person holding such office to
> > > serve at the direction of the Board of Directors as the chair
> > > of the Apache Metron Project, and to have primary responsibility
> > > for management of the projects within the scope of
> > > responsibility of the Apache Metron Project; and be it further
> > >
> > > RESOLVED, that the persons listed immediately below be and
> > > hereby are appointed to serve as the initial members of the
> > > Apache Metron Project:
> > >
> > >
> > > PPMC:
> > > Mark Bittmann (mbittmann)
> > > Sheetal Dolas (sheetal_dolas)
> > > Debo Dutta (ddutta)
> > > Discovery Gerdes (discovery)
> > > Andrew Hartnett (dev_warlord)
> > > Dave Hirko (dbhirko)
> > > Paul Kehrer (reaperhulk)
> > > Brad Kolarov (bjkolly)
> > > Kiran Komaravolu (UNKNOWN)
> > > Larry McCay (lmccay)
> > > P. Taylor Goetz (ptgoetz)
> > > Ryan Merriman

Re: [DISCUSS] Stepping down as release manager

2017-03-21 Thread Matt Foley

Casey, you’ve been a great release manager.  I know how much detail effort goes 
into this role.

I am willing to serve as RM for the next while, if the community would like.  I 
was the RM for Hadoop for about a year, and in fact was RM for its 1.0 release. 
Granted that was a while ago, but overall process doesn’t seem to have changed 
much :-)

Cheers,
--Matt

On 3/21/17, 7:32 AM, "Casey Stella"  wrote:

Right, Billie is exactly right.  Working with the community to construct
releases that conform to apache standards and policies is the main duty.
This will (hopefully) be our first set of releases outside of the
incubator, so if I'm allowed to be biased, I'm hoping that someone with
previous release management experience in other projects will volunteer.
We're leaving the nest a bit and having an experienced hand at the tiller
would be advantageous.

On Tue, Mar 21, 2017 at 10:21 AM, Billie Rinaldi  wrote:

> See http://www.apache.org/dev/release-publishing#release_manager and
> http://www.apache.org/legal/release-policy.html for information on the
> tasks that a release manager performs.
>
> On Tue, Mar 21, 2017 at 7:10 AM, Khurram Ahmed 
> wrote:
>
> > Casey it would be helpful if you could outline the responsibilities of a
> > release manager for the Metron project.
> >
> > On Mar 21, 2017 6:57 PM, "Casey Stella"  wrote:
> >
> > > I've been extremely honored to spend the last few months as the Metron
> > > Release Manager.  That being said, my watch is ended and it's time for
> > > another release manager to step into my place.
> > >
> > > Who would like to volunteer to be release manager for the next release
> > > of Metron?
> > >
> > > Best,
> > >
> > > Casey
> > >
> >
>

Re: METRON-764 Daylight Savings Time bug in metron-profiler-client Unit Tests

2017-03-14 Thread Matt Foley

Fixed; see https://github.com/apache/incubator-metron/pull/476 
Thanks,
--Matt

On 3/14/17, 1:55 PM, "Matt Foley" <ma...@apache.org> wrote:

Found a fun little bug that will most likely cause any travis build 
happening in the hour after midnight to fail.
https://issues.apache.org/jira/browse/METRON-764

It’s daylight savings time based, and you can stimulate it by setting your 
system clock to 12:30AM on Tuesday 3/14/2017, at least on a Mac.  Haven’t tried 
other settings, but it’s clearly reproducible.

Will take a look to see if the cause is obvious.
--Matt

METRON-764 Daylight Savings Time bug in metron-profiler-client Unit Tests

2017-03-14 Thread Matt Foley

Found a fun little bug that will most likely cause any travis build happening 
in the hour after midnight to fail.
https://issues.apache.org/jira/browse/METRON-764

It’s daylight savings time based, and you can stimulate it by setting your 
system clock to 12:30AM on Tuesday 3/14/2017, at least on a Mac.  Haven’t tried 
other settings, but it’s clearly reproducible.

Will take a look to see if the cause is obvious.
--Matt
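
For anyone who hasn't met this class of bug before, here is a minimal sketch of how a look-back window that crosses a DST change can drift by an hour. It is not taken from METRON-764 or its fix; the time zone (America/Los_Angeles) and the 72-hour look-back are assumptions, chosen only so that the window crosses the US spring-forward of 2017-03-12.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumed zone, picked only because it observed the 2017-03-12 spring-forward.
tz = ZoneInfo("America/Los_Angeles")

# The clock setting from the report: 12:30 AM on Tuesday 3/14/2017.
now = datetime(2017, 3, 14, 0, 30, tzinfo=tz)

# Wall-clock arithmetic: "same local time, three days earlier".
wall = now - timedelta(days=3)

# Elapsed-time arithmetic: exactly 72 hours earlier, computed in UTC.
elapsed = (now.astimezone(timezone.utc) - timedelta(hours=72)).astimezone(tz)

# Compare in UTC: subtracting two datetimes that share a tzinfo object
# falls back to wall-clock arithmetic and would hide the skew.
skew = wall.astimezone(timezone.utc) - elapsed.astimezone(timezone.utc)

print("wall-clock look-back:", wall.isoformat())
print("elapsed look-back:   ", elapsed.isoformat())
print("skew:                ", skew)
```

With the assumed zone the two answers are one hour apart and land on different calendar days, which is the sort of mismatch that only shows up when the clock is within an hour of midnight.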

Re: [VOTE] Casey Stella for Metron VP

2017-03-13 Thread Matt Foley

+1 (non-binding)

On 3/13/17, 4:57 PM, "Debojyoti Dutta"  wrote:

+1

Sent from my iPhone

> On Mar 13, 2017, at 4:09 PM, Ryan Merriman  wrote:
> 
> +1 (binding)
> 
>> On Mar 13, 2017, at 6:04 PM, Justin Leet  wrote:
>> 
>> +1 (non-binding)
>> 
>>> On Mon, Mar 13, 2017 at 6:35 PM, zeo...@gmail.com wrote:
>>> 
>>> +1 (non-binding)
>>> 
>>>> On Mon, Mar 13, 2017 at 6:34 PM James Sirota wrote:
>>>> 
>>>> +1 (binding)
>>>> 
>>>> 13.03.2017, 15:34, "James Sirota" :
>>>>> This vote is to make Casey Stella our VP after graduation
>>>>> 
>>>>> ---
>>>>> Thank you,
>>>>> 
>>>>> James Sirota
>>>>> PPMC- Apache Metron (Incubating)
>>>>> jsirota AT apache DOT org
>>>> 
>>>> ---
>>>> Thank you,
>>>> 
>>>> James Sirota
>>>> PPMC- Apache Metron (Incubating)
>>>> jsirota AT apache DOT org
>>> --
>>> 
>>> Jon
>>>

Re: [VOTE] Metron to graduate to TLP

2017-03-13 Thread Matt Foley

+1 (non-binding)

On 3/13/17, 4:27 PM, "zeo...@gmail.com"  wrote:

0 (non-binding)

On Mon, Mar 13, 2017 at 7:15 PM Ryan Merriman  wrote:

> +1 (binding)
>
> > On Mar 13, 2017, at 6:05 PM, Casey Stella  wrote:
> >
> > +1 (binding)
> >
> >> On Mon, Mar 13, 2017 at 6:37 PM, James Sirota 
> wrote:
> >>
> >> +1 (binding)
> >>
> >> 13.03.2017, 15:37, "James Sirota" :
> >>> Do we feel it's time for us to exit the Apache incubator and petition
> >>> to make Metron a TLP?
> >>>
> >>> Please vote 1 for yes, -1 for no, 0 for neutral.
> >>>
> >>> The vote will be open for 72 hours
> >>>
> >>> ---
> >>> Thank you,
> >>>
> >>> James Sirota
> >>> PPMC- Apache Metron (Incubating)
> >>> jsirota AT apache DOT org
> >>
> >> ---
> >> Thank you,
> >>
> >> James Sirota
> >> PPMC- Apache Metron (Incubating)
> >> jsirota AT apache DOT org
> >>
>
-- 

Jon

Re: [Discuss] SIDELOADING PARSERS: Parsers as components

2017-03-10 Thread Matt Foley

It sounds like:
- This is a self-contained chunk of work, that can be tested, reviewed, and 
committed on its own, then the other ideas you propose can follow it.
- It crosses a lot of lines, and restructures a lot of code, so will “rot” 
fairly quickly as other people make commits, so if possible you should get a PR 
out there and we should work through it as soon as possible.
Are those both true?

How do other people feel about grouping a given sensor’s parser, enricher, 
and indexing logic all together?  It seems to have multiple advantages; are 
there also disadvantages?

On 3/10/17, 6:31 AM, "Otto Fowler"  wrote:

As previously discussed here, I have been working on side loading of
parsers.  The goals of this work are:
* Make it possible for developers to create, maintain and deploy parsers
outside of the Metron code tree and not have to fork
* Create maven archetype support for developers of parsers
* Introduce a parser ‘lifecycle’ to support multiple instances and
configurations, states of being installed, under configuration, and deployed
etc.

I would like to have some discussion based on where I am after rebasing
onto METRON-671 which revamps deployment to be totally ambari based.


Parsers as components:

I have all the parsers broken out into individual packages/rpms/jars.
What I have done is taken metron-parsers and broken it out to:

* metron-parsers-common
* This has all the base classes and interfaces, common testing components
etc
* metron-parser-base
* This has the Grok, CSV, and JsonMap parsers and support
* metron-parser-X
* A module per parser type which we currently have in the system
* Each parser has all the indexing, enrichment and parser configurations
for that parser in its package

I will go into packaging and deployment issues in another email.

I have this all working:
* the parsers are built
* the parsers are tested
* the parsers are integrated into the deployment build such that vagrant up
just works as previously in full and quick dev
* maven component of rpm docker
  * the metron.spec file
* ambari installation
* zookeeper configuration deployment
* the ambari parser service code
* the Rest interface works
* see all installed parser configurations etc


So this part of the work is, I think, ready for a PR and review/next steps
on its own.

I think that it sets up the components and is a base for building out the
rest of the functionality we want.

Re: [DISCUSS] SIDELOADING PARSERS: Packaging and Loading and Extensions [oh.my]

2017-03-10 Thread Matt Foley

I like the approach.  I think Nar constitutes a production-quality existing 
solution meeting highly similar needs to Metron’s.

Just a ‘btw’ regarding Joe’s input that I transmitted:
- Joe made clear that he was only giving his personal opinion, since of course 
no individual can speak for the community.
- Joe also felt that if Metron succeeded in re-using the Nar system without 
having to change it too much, that that would be a good supporting argument for 
later proposing that it become a separate child project.
- Whereas if we or they tried to break it out as a separate project now, we 
would have to do all the community-building work around it, as well as the 
technical work of adapting it for a different environment from NiFi.
- So he recommended to copy and appropriate it for now.
- Which I also agree with.

Thanks,
--Matt

On 3/10/17, 7:42 AM, "Otto Fowler"  wrote:

As previously discussed here, I have been working on side loading of
parsers.  The goals of this work are:
* Make it possible for developers to create, maintain and deploy parsers
outside of the Metron code tree and not have to fork
* Create maven archetype support for developers of parsers
* Introduce a parser ‘lifecycle’ to support multiple instances and
configurations, states of being installed, under configuration, and deployed
etc.

I would like to have some discussion based on where I am after rebasing
onto METRON-671 which revamps deployment to be totally ambari based.


Packaging and Loading and Extensions

I have mentioned previously, and we have discussed on list, wanting to move
away from uber jars for some things to using custom class loaders ( from hdfs
possibly ).
We also want the REST api to work with 3rd party parsers
We would like to reduce the size of having so many ubers in the build
We would benefit from tooling around this, maven building, archetypes etc
We could benefit from explicitly required metadata and information
We want a generic extension methodology
We want to be able to upgrade parsers/extensions in some way


I have also mentioned that this would look or work a lot like NiFi’s NAR
system.

Now I’m going to put it differently:

I propose that we adapt and introduce the NAR system for Metron Extensions,
starting with parsers, with that adoption
extended to allow for VFS Classloading from hdfs as we are now doing with
Stellar. And that this is done as a follow on to
the base mvp side loading work.

This provides a solution to the above issues, and would afford us a great
amount of flexibility going forward.

https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#nars

https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions

The functional concept would be:
 * the archetype and all the parser projects produce nars ( either
including the configuration and patterns or splitting between runtime ( nar
) and static ( tar.gz ) )
* these are not shaded, but have a ‘repo’ of dependencies for non-metron
jars.  Metron jars are provided and loaded through the classloading.
* possibly the adoption of the Service Provider api/pattern for parsers and
discovery
*  the nar repository/working directory structures would be implemented in
/usr/metron/version/telemetry  ( although discussion on having multiple
extension directories vs. one extension dir are welcome )
* the storm process only references metron-parsers-common
* the parser bolt uses the nar class loading system to load the parser
* the rest api uses the nar class loading system to load the parser
* etc etc
* a new version of a parser is deployed as a nar, when the service
restarts, the new nar is unpacked and replaces the old version in the
working system ( we could change the restart requirement ……)

The nar system gives us something that is:
* production quality
* small enough to grok and extend as opposed to some other solutions
* comes with a highly accessible sister project
* maven plugin tooling to build
* reference archetype for packaging
and other things

There are a few ways we could approach using Nar:

1. ‘fork’ and appropriate the components and ‘metronize’ them
* the maven plugin
* the nar-utils package
2. Ask for and participate in an effort to pull NAR out into its own
project,
* make it more generic
* usable by more than one project
* goal to replace NiFi’s use of nar too
3. Create our own generic version as a fork
* use it in metron
* submit to NiFi as a proposal

MattF was nice enough to float this by Joe Witt, who is at Hortonworks and
is the NiFi lead.
He agrees with the idea of making nar usable for multiple projects, but
does not see them

Re: [GitHub] incubator-metron issue #459: METRON-726: Clean up mvn site generation

2017-03-08 Thread Matt Foley

The tickets, btw, are METRON-759 and METRON-718.


On 3/8/17, 12:28 PM, "mattf-horton"  wrote:

Github user mattf-horton commented on the issue:

https://github.com/apache/incubator-metron/pull/459
  
@justinleet , +100 that you've added site-book and javadocs to the 
automated build.
Having site-book in the base build is fine, it only takes seconds to 
build, and it will keep new README.md files clean.  Javadoc takes a little 
longer, I think, but is also important to keep clean.  Is there something like 
"-DskipSite" that prevents all three (site, site-book, and javadocs) from being 
built, if a developer wants to skip them?

Regarding the integration, what most projects do is, as part of their 
Site's documentation area, there is one or more pull-down menus, or a landing 
page, where you can choose the VERSION you want of:
* Documentation (site-book, in our case)
* Release Notes
* Javadocs

This implies that the Site needs a *cumulative* store of past release 
Doc builds, and that part of the Release Manager's job is adding each new 
release to that store.  Since it keeps getting larger and larger, storing it in 
github master (where Site lives) may not be the best thing.  Rather, it could 
go under https://dist.apache.org/repos/dist/release/incubator/metron/ (release 
-> dev, during votes).  The cost of this is a manual step instead of automated, 
for the Release Manager.

The Site menu would then link into the many doc sets in the store.  
Given the regular naming of the paths, we could actually have each release 
contain its own doc set (only) at a standard place, which is consistent with it 
being part of the build, and then the Site's menu would have a list of links 
that differ only by a version number.  The RM would make the one-line edit to 
add each new doc set as it is released.
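
As a rough illustration of the "links that differ only by a version number" idea, here is a minimal sketch. The URL pattern and the release list are placeholders made up for the example, not the real Metron site layout; the only point is that the Release Manager's edit reduces to appending one entry.

```python
def doc_links(versions, template):
    """Return (version, url) pairs for a docs menu, newest release first."""
    def version_key(version):
        # Compare "0.3.10" and "0.3.9" numerically rather than as strings.
        return tuple(int(part) for part in version.split("."))

    ordered = sorted(versions, key=version_key, reverse=True)
    return [(v, template.format(version=v)) for v in ordered]


# Hypothetical URL pattern; the real location of each doc set is undecided.
TEMPLATE = "https://example.invalid/metron/{version}/site-book/index.html"

# The one-line edit per release: append the new version here.
RELEASES = ["0.3.0", "0.3.1"]

menu = doc_links(RELEASES, TEMPLATE)
for version, url in menu:
    print(version, url)
```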

What do you think?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Re: [GitHub] incubator-metron issue #471: METRON-755 Update GitHub PR Template

2017-03-06 Thread Matt Foley

I feel the introductory text for the PR should be at the top.  That’s the 
interesting part.  The template checklist is to make sure the necessary got 
done; it’s important, but should come second. IMHO.

On 3/6/17, 8:01 AM, "David Lyle"  wrote:

I have a weak preference for top comments.

On Mon, Mar 6, 2017 at 10:22 AM, JonZeolla  wrote:

> Github user JonZeolla commented on the issue:
>
> https://github.com/apache/incubator-metron/pull/471
>
> It seems that some people prefer to add their comments to the top
> instead of the bottom.  I have no preference.  Would anybody recommend we
> swap it?
>

Re: [DISCUSS][PROPOSAL] Acceptance Tests

2017-03-03 Thread Matt Foley

Thanks, Casey, good answers to all of the below.  I suggest the term “e2e” (end 
to end) tests for what you are proposing.

On 3/3/17, 11:57 AM, "Casey Stella" <ceste...@gmail.com> wrote:

It is absolutely not a naive question, Matt.  We don't have a lot (or any)
docs about our integration tests; it's more of a "follow the lead" type of
thing at the moment, but that should be rectified.

The integration tests spin up and down infrastructure in-process, some of
which are real and some of which are mock versions of the services.  These
are good for catching some types of bugs, but often things sneak through,
like:

   - Hbase and storm can't exist in the same JVM, so HBase is mocked in
   those cases.
   - The FileSystem that we get for Hadoop is the LocalRawFileSystem, not
   truly HDFS.  There are differences and we've run into them..hilariously at
   times. ;)
   - Things done statically in a bolt are shared across all bolts because
   they all are threads in the same process
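
The last point is easy to underestimate, so here is a minimal sketch of it. The Bolt class below is a toy written for this note, not Storm's or Metron's API; it only shows that class-level ("static") state has one copy per process, so every bolt running as a thread in that process sees whatever the first one stored.

```python
import threading


class Bolt:
    # Class-level ("static") state: one copy per process, not one per bolt.
    shared_config = {}

    def __init__(self, name):
        self.name = name

    def prepare(self):
        # The first caller wins; later bolts silently inherit its value.
        Bolt.shared_config.setdefault("owner", self.name)
        return Bolt.shared_config["owner"]


seen = {}


def run(bolt):
    seen[bolt.name] = bolt.prepare()


for bolt in (Bolt("parser"), Bolt("enrichment")):
    worker = threading.Thread(target=run, args=(bolt,))
    worker.start()
    worker.join()  # run one at a time so the outcome is deterministic

print(seen)
```

On a real cluster the two bolts would normally live in separate worker processes and each would get its own copy, which is why this kind of problem can pass in-process integration tests and still behave differently when deployed.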

It's good, it catches bugs, it lets us debug things easily, it runs with
every single build automatically via travis.
It's bad because it's awkward to get the dependencies isolated sufficiently
for all of these components to get them to play nice in the same JVM.

Acceptance tests would be run against a real cluster, so they would:

   - run against real components, not testing or mock components
   - run against multiple nodes

I can imagine a world where we can unify the two to a certain degree in
many cases if we could spin up a docker version of Metron to run as part of
the build, but I think in the meantime, we should focus on providing both.

I suspect the reference application is possibly inspiring my suggestions
here, but I think the main difference here is that the reference
application is intended to be informational from a end-user perspective:
it's detailing a use-case that users will understand.  I don't think
acceptance tests should loosely associate with real uses, but they should
be free to delve into weird non-happy-pathways.

On Fri, Mar 3, 2017 at 2:16 PM, Matt Foley <ma...@apache.org> wrote:

> Automating stuff that now has to be done manually gets a big +1.
>
> But, Casey, could you please clarify the relationship between what you
> plan to do and the current “integration test” framework?  Will this be in
> the form of additional integration tests? Or a different test framework?
> Can it be done in the integration test framework, rather than creating new
> mechanism?
>
> BTW, if that’s a naïve question, forgive me, but I could find zero
> documentation for the existing integration test capability, neither wiki
> pages nor READMEs nor Jiras.  If there are any docs, please point me at
> them.  Or even archived email threads.
>
> There is also something called the “Reference Application”
> https://cwiki.apache.org/confluence/display/METRON/Metron+Reference+Application
> which sounds remarkably like what you propose to automate.  Is there /
> can there / should there be a relationship?
>
> Thanks,
> --Matt
>
> On 3/3/17, 7:40 AM, "Otto Fowler" <ottobackwa...@gmail.com> wrote:
>
> +1
>
> I agree with Justin’s points.
>
>
> On March 3, 2017 at 08:41:37, Justin Leet (justinjl...@gmail.com) wrote:
>
> +1 to both. Having this would especially ease a lot of testing that hits
> multiple areas (which there is a fair amount of, given that we're building
> pretty quickly).
>
> I do want to point out that adding this type of thing makes the speed of
> our builds and tests more important, because they already take up a good
> amount of time. There are obviously tickets to optimize these things, but
> I would like to make sure we don't pile too much on to every testing cycle
> before a PR. Having said that, I think the testing proposed is absolutely
> valuable enough to go forward with.
>
> Justin
>
> On Fri, Mar 3, 2017 at 8:33 AM, Casey Stella <ceste...@gmail.com> wrote:
>
> > I also propose, once this is done, that we modify the developer bylaws
> > and the github PR script to ensure that PR authors:
> >
> > - Update the acceptance tests where appropriate
> > - Run the tests as a smoketest
> >
> >
> >
> > On Fri, Mar 3, 20

Re: [DISCUSS][PROPOSAL] Acceptance Tests

2017-03-03 Thread Matt Foley

Automating stuff that now has to be done manually gets a big +1.

But, Casey, could you please clarify the relationship between what you plan to 
do and the current “integration test” framework?  Will this be in the form of 
additional integration tests? Or a different test framework?  Can it be done in 
the integration test framework, rather than creating new mechanism?

BTW, if that’s a naïve question, forgive me, but I could find zero 
documentation for the existing integration test capability, neither wiki pages 
nor READMEs nor Jiras.  If there are any docs, please point me at them.  Or 
even archived email threads.

There is also something called the “Reference Application” 
https://cwiki.apache.org/confluence/display/METRON/Metron+Reference+Application 
which sounds remarkably like what you propose to automate.  Is there / can 
there / should there be a relationship?

Thanks,
--Matt

On 3/3/17, 7:40 AM, "Otto Fowler"  wrote:

+1

I agree with Justin’s points.

On March 3, 2017 at 08:41:37, Justin Leet (justinjl...@gmail.com) wrote:

+1 to both. Having this would especially ease a lot of testing that hits
multiple areas (which there is a fair amount of, given that we're building
pretty quickly).

I do want to point out that adding this type of thing makes the speed of
our builds and tests more important, because they already take up a good
amount of time. There are obviously tickets to optimize these things, but
I would like to make sure we don't pile too much on to every testing cycle
before a PR. Having said that, I think the testing proposed is absolutely
valuable enough to go forward with.

Justin

On Fri, Mar 3, 2017 at 8:33 AM, Casey Stella  wrote:

> I also propose, once this is done, that we modify the developer bylaws
> and the github PR script to ensure that PR authors:
>
> - Update the acceptance tests where appropriate
> - Run the tests as a smoketest
>
>
>
> On Fri, Mar 3, 2017 at 8:21 AM, Casey Stella  wrote:
>
> > Hi All,
> >
> > After doing METRON-744, where I had to walk through a manual test of
> > every place that Stellar touched, it occurred to me that we should script
> > this. It also occurred to me that some scripts that are run by the PR
> > author to ensure no regressions and, eventually maybe, even run on an
> > INFRA instance of Jenkins would give all of us some peace of mind.
> >
> > I am certain that this, along with a couple other manual tests from other
> > PRs, could form the basis of a really great regression acceptance-test
> > suite and I'd like to propose that we do that, as a community.
> >
> > What I'd like to see from such a suite has the following characteristics:
> >
> > - Can be run on any Metron cluster, including but not limited to
> > - Vagrant
> > - AWS
> > - An existing deployment
> > - Can be *deployed* from ansible, but must be able to be deployed
> > manually
> > - With instructions in the readme
> > - Tests should be idempotent and independent
> > - Tear down what you set up
> >
> > I think between the Stellar REPL and the fundamental scriptability of the
> > Hadoop services, we can accomplish these tests with a combination of shell
> > scripts and python.
> > I propose we break this into the following parts:
> >
> > - Acceptance Testing Framework with a small smoketest
> > - Baseline Metron Test
> > - Send squid data through the squid topology
> > - Add a threat triage alert
> > - Ensure it gets through to the other side with alerts preserved
> > - + Enrichment
> > - Add an enrichment in the enrichment pipeline to the above
> > - + Profiler
> > - Add a profile with a tick of 1 minute to count per destination
> > address
> > - Base PCap test
> > - Something like the manual test for METRON-743 (
> > https://github.com/apache/incubator-metron/pull/467#issue-210285324 )
> >
> > Thoughts?
> >
> >
> > Best,
> >
> > Casey
> >
>
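
To make the "idempotent and independent, tear down what you set up" requirement from the proposal concrete, here is a minimal sketch in Python, since the proposal mentions python as one of the scripting options. FakeCluster and the resource names are hypothetical stand-ins; a real suite would talk to Kafka, ZooKeeper and so on instead.

```python
from contextlib import ExitStack


class FakeCluster:
    """Hypothetical stand-in for a cluster client; it only tracks names."""

    def __init__(self):
        self.resources = set()

    def create(self, name):
        self.resources.add(name)

    def delete(self, name):
        self.resources.discard(name)


def run_test(cluster, needed, body):
    """Create what the test needs, run it, and always tear it down again."""
    with ExitStack() as cleanup:
        for name in needed:
            cluster.create(name)
            # Registered immediately, so teardown runs even if a later
            # create() or the test body raises.
            cleanup.callback(cluster.delete, name)
        return body(cluster)


cluster = FakeCluster()
passed = run_test(
    cluster,
    ["squid_topic", "threat_triage_rule"],  # illustrative names only
    lambda c: "squid_topic" in c.resources,
)
print(passed, cluster.resources)
```

Because teardown is registered as each resource is created, a failing test leaves the cluster in the state it found it, so the next test (or a rerun of the same one) starts clean.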

Re: [DISCUSS] System time vs. Event Time

2017-03-02 Thread Matt Foley

Before the thought becomes obsolete, I’d like to say that I agree with Nick 
about the replay scenario and threat signature databases.  I think a principal 
use case is replaying old data with new threat signatures, to detect problems 
that were undetectable at the time they happened.  The use case Casey brought 
up, where you want to reproduce the exact behavior of an earlier PiT of your 
system, including using the threat signature database versions that were 
installed at that time, would also be useful for debugging, system 
understanding, and testing, but I think it is lower priority than the former.

Another high priority use case is replaying data with new Profiler 
configurations, to answer questions that we hadn’t thought about asking before.

So, Justin, I think the minimum amount of work for a useful batch process, is 
to:
(a) Make sure event time rather than system time is usable, if not the default, 
in all components that record, manipulate, or select based on timestamps.
(b) Enable a chunk of data, defined by our shiny new time window DSL, to be 
output in chron order from sources that store whole messages (HDFS, PCAP, maybe 
Solr/ES, maybe raw data files with a time window filter), and routed into a 
kafka topic, with throttling so kafka doesn’t try to swallow several TB at once.
(c) Which can then be read by a Parser, and the result piped through the whole 
system, all the way to threat detection, profiling, and filtered re-recording.
(d) The result set (in HDFS, ES, or Profiler) needs to remain “tagged” somehow 
with a batch identifier, both so it doesn’t get mixed up with all the other 
data from that event time, and so it can be bulk-deleted if you made a mistake 
and asked for TB’s of the wrong data.
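
To illustrate point (a), here is a minimal sketch of picking an index name from the event's own timestamp rather than from the clock at write time, which is what the ElasticsearchWriter snippet quoted further down this thread does with new Date(). The function name, the hourly date pattern and the UTC choice are assumptions for the example, not Metron's actual configuration.

```python
from datetime import datetime, timezone


def index_name(sensor_type, event_epoch_millis, date_format="%Y.%m.%d.%H"):
    """Build '<sensor>_index_<date>' from the event time, not system time.

    The date pattern and the use of UTC are assumptions for illustration.
    """
    event_time = datetime.fromtimestamp(event_epoch_millis / 1000, tz=timezone.utc)
    return f"{sensor_type}_index_{event_time.strftime(date_format)}"


# A replayed message keeps its original timestamp, so it is routed to the
# historical index for that hour no matter when the replay actually runs.
replayed = index_name("bro", 1488459240000)  # 2017-03-02 12:54:00 UTC
print(replayed)
```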

An interesting part of (c) is that we don’t really want the “batch” to 
interfere with on-going real-time processing.  Ideally the mechanism would also 
deal with data analysts submitting multiple batch requests at the same time 
(altho admittedly that could be handled with a queue).

Is it sufficient to simply depend on the event time stamp to route stuff 
appropriately?  That doesn’t seem to meet (d).  We could effectively 
“virtualize” the batch job by suffixing the kafka topic names for the whole 
data flow related to a batch.  Batch id “foley3256”, being a bunch of bro 
messages, could enter the Bro Parser on topic bro_foley3256.  To carry this 
through to enrichment, etc., maybe it is sufficient to record the sensorType as 
“bro_foley3256”, or maybe it should be sensorType “bro” on kafka topic 
“enrichment_foley3256”.  Such schemes could satisfy (d) above, also.  Obviously 
there’s a lot of possible variations on this theme.  What do you think?
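
A minimal sketch of the topic-suffix variation, using the example names from the paragraph above. Both helper functions and the underscore rule are hypothetical, written only to show that the scheme is reversible: the batch id can be recovered from the topic name, which is what tagging and bulk deletion in (d) would rely on.

```python
def batch_topic(base_topic, batch_id=None):
    """Return the topic for a batch run, or the live topic if no batch id."""
    if batch_id is None:
        return base_topic
    if "_" in batch_id:
        # Keeping ids underscore-free makes the suffix easy to split off.
        raise ValueError("batch id must not contain '_'")
    return f"{base_topic}_{batch_id}"


def split_batch_topic(topic, base_topics):
    """Inverse of batch_topic: return (base_topic, batch_id or None)."""
    # Try longer base names first so a base that is a prefix of another
    # base cannot shadow it.
    for base in sorted(base_topics, key=len, reverse=True):
        if topic == base:
            return base, None
        if topic.startswith(base + "_"):
            return base, topic[len(base) + 1:]
    raise ValueError(f"unknown topic: {topic}")


BASES = {"bro", "enrichment"}
print(batch_topic("bro", "foley3256"))
print(split_batch_topic("enrichment_foley3256", BASES))
print(split_batch_topic("bro", BASES))
```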

--Matt

On 3/2/17, 12:54 PM, "Justin Leet"  wrote:

I'm just going to throw out a few of questions, that I don't have good
answers to.  Casey and Nick, given your familiarity with the systems
involved, do you have any thoughts?

   - What's the smallest unit of work we can do to enable at least a useful
   subset of a fully featured term batch process? Looking at it from another
   angle, which of the use cases (either that Nick listed, or that anyone
   else has) gives us the best value for our effort?
   - Can we also do things like limiting support for the interdependencies
   Casey mentioned? If we do approach it that way, how do we avoid setting
   ourselves up for issues parallelizing the more complicated cases?  It
   sounds like we'll need to brainstorm some of the dependency stuff anyway.
   - Are there places right now (like the elasticsearch jira) where we need
   or want to make changes to either fix, or improve, or enable some of the
   larger pictures work?

Jon, any other thoughts?  Sounded like you were waiting to see how things
played out a bit, so if you have any insight, I'd love to hear it.

Justin

On Tue, Feb 28, 2017 at 11:08 AM, Justin Leet  wrote:

> @Jon, it looks like it is based on system date.
>
> From ElasticsearchWriter.write:
> String indexPostfix = dateFormat.format(new Date());
> ...
> indexName = indexName + "_index_" + indexPostfix;
> ...
> IndexRequestBuilder indexRequestBuilder = client.prepareIndex(indexName,
> sensorType + "_doc");
>
> Justin
>
> On Tue, Feb 28, 2017 at 10:44 AM, zeo...@gmail.com wrote:
>
>> I'm actually a bit surprised to see METRON-691, because I know a while
>> back I did some experiments to ensure that data was being written to the
>> indexes that relate to the timestamp in the message, not the current
>> time, and I thought that messages were getting written to the proper
>> historical indexes, not the current one.  This was so long ago now,
>> though, that it would require another look, and I only reviewed it
>> operationally

Re: [DISCUSS] Making adding new 3rd-party Stellar functions easier

2017-02-27 Thread Matt Foley

Couple thoughts:

1. I see the Accumulo class loader allows multiple clients with potentially 
conflicting loads, via the “context” mechanism.  That’s good. NiFi also used a 
multi-classloader mechanism to support potentially conflicting side-loads of 
their Processor bundles (“nars”), but I don’t think they supported re-loading 
(altho it’s been a few months since I looked at it).

2. I like the idea of loading from a configured location in HDFS.  This gives a 
far smaller scope of filesystem to be watched and/or searched, and of course 
obviates the deploy-to-many-servers problem.  Altho it costs another 
upload/maintenance tool for the admin to fiddle with.

Thanks,
--Matt

On 2/27/17, 11:22 AM, "Casey Stella"  wrote:

Hi All,

The benefit of Stellar is that adding new functionality is as simple as
providing a Jar.  This enables people who want to integrate with Metron to
easily add enrichments or other functionality.  The snag currently with this
is that we provide a single jar, so all stellar functions that we have
available must be dependencies of the main jar that drives the topology
plus what local directories we can configure via the storm configs.  This
makes the process of adding 3rd party jars not as easy as it could be.

What I'm proposing is the following and I'd like to get some community
feedback on it:

   - Split the stellar lang into its own project which does not shade its
   dependencies from metron-common
  - this makes creating your own stellar functions easier as you only
  need depend on a small project
   - Adjust the following to additionally load classes from a location
   in HDFS /apps/metron/stellar using something like accumulo (
   https://accumulo.apache.org/blog/2014/05/03/accumulo-classloader.html)
  - Profiler topology
  - Parser topology
  - Enrichment topology
  - Enrichment Flat file loader
  - Enrichment MR loader
   - Make the classloader reload upon new files
  - This would necessitate a new Stellar FunctionResolver

I'd like to propose starting with the first two and attempting the third
after we get something stable with the first 2.

What this will give us is the following workflow to enable new stellar
functions:

   - Build your function depending on stellar-lang into a Jar
   - Drop the new jar onto HDFS in /apps/metron/stellar
   - Restart the topology in question (after the 3rd bullet point, this is
   no longer required)
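
The "reload upon new files" step comes down to noticing which jars are new or changed since the last look. Here is a minimal sketch of that check only; it polls a local directory as a stand-in for /apps/metron/stellar on HDFS, and the function names are invented for the example. It says nothing about how the Accumulo classloader or a Stellar FunctionResolver would actually do the reloading.

```python
import os


def snapshot(directory):
    """Map each jar name in the directory to a (mtime, size) fingerprint."""
    with os.scandir(directory) as entries:
        return {
            entry.name: (entry.stat().st_mtime_ns, entry.stat().st_size)
            for entry in entries
            if entry.is_file() and entry.name.endswith(".jar")
        }


def jars_to_reload(previous, current):
    """Return names of jars that are new or whose fingerprint changed."""
    return sorted(
        name for name, stamp in current.items() if previous.get(name) != stamp
    )
```

A watcher would call snapshot() periodically, hand jars_to_reload(old, new) to whatever rebuilds the classloader, and keep the new snapshot for the next round.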

Thoughts?

Re: Files modified after building

2017-02-25 Thread Matt Foley

Well, the last three are in a path with “/generated/” in it.

On 2/25/17, 7:39 PM, "Casey Stella"  wrote:

Crap, those are generated by the new profile selector dsl committed Friday.
I must've missed them on a commit on the PR. They are generated by the
build, so it was factored into the tests and such for the PR. Sorry for the
inconvenience, I'll have to make another PR on Monday to get them in.

On Sat, Feb 25, 2017 at 22:21 Kyle Richardson wrote:

> That's my guess. I noticed the same thing.
>
> -Kyle
>
> > On Feb 25, 2017, at 8:31 PM, Otto Fowler wrote:
> >
> > Changes not staged for commit:
> >
> >  (use "git add <file>..." to update what will be committed)
> >
> >  (use "git checkout -- <file>..." to discard changes in working directory)
> >
> > modified:
> > metron-analytics/metron-profiler-client/src/main/java/Window.tokens
> >
> > modified:
> > metron-analytics/metron-profiler-client/src/main/java/WindowLexer.tokens
> >
> > modified:
> > metron-analytics/metron-profiler-client/src/main/java/org/apache/metron/profiler/client/window/generated/WindowLexer.java
> >
> > modified:
> > metron-analytics/metron-profiler-client/src/main/java/org/apache/metron/profiler/client/window/generated/WindowListener.java
> >
> > modified:
> > metron-analytics/metron-profiler-client/src/main/java/org/apache/metron/profiler/client/window/generated/WindowParser.java
> >
> > Are these files changed by the build?
>

Re: [VOTE] Releasing Apache Metron (incubating) 0.3.1-RC5

2017-02-25 Thread Matt Foley

David, how much RAM do you have in the test system where you run full dev?

On 2/25/17, 8:46 AM, "David Lyle"  wrote:

+1 (binding)
Checked signatures
Tests passed
Full dev ran successfully-ish (bit more below, tl;dr- it's ugly, but much
better after METRON-671)
Data flowed into the index/dashboard/hdfs

On full-dev: Monit got into a state where it started killing topologies
because the Storm cli was non-responsive due to memory pressure. That
caused cascading failures. I let it be for about 20 minutes, and it seemed
to calm down. I then killed pcap-service and Monit to make some headroom. I
haven't seen this in my work with METRON-671 as only sensor probes and
pcap-service are managed by Monit. Once the topologies are monitored by
Ambari, we no longer use the Storm CLI for status so that frees up some
additional memory.

-D...



On Sat, Feb 25, 2017 at 10:42 AM, David Lyle  wrote:

> I'm running it up now. I don't think it should affect the release, but
> it's extremely important that full-dev work. It's how quick-dev is created.
> Any notion that it's been deprecated is incorrect.
>
> Fwiw, I've been working with it for the last few weeks as part of
> METRON-671 and it's been (knocking loudly on wood) reliable.
> I think we should leave full-dev in as the release verification step,
> particularly in the case that a release changes the Hadoop versions.
> There's a bit of a chicken-egg problem when we bump versions and quick dev
> will lag.
>
> I'll update with run results shortly.
>
> -D...
>
>
> On Sat, Feb 25, 2017 at 9:24 AM, Otto Fowler 
> wrote:
>
>> I see this with full dev on my laptop pretty much every time.
>> I think it is a resource issue.  I see not enough memory errors trying to
>> start things.
>>
>>
>>
>> On February 25, 2017 at 09:15:15, Ryan Merriman (merrim...@gmail.com)
>> wrote:
>>
>> When I go to Ambari to ensure the services are all up, HDFS is down. I
>> tried it 4 or 5 times and got the same result each time. I've seen others
>> validate with quick dev so I assumed full dev was not used anymore. I'll
>> spin it up this morning and get a more detailed error.
>>
>> Is anyone else able to validate it in full dev?
>>
>> > On Feb 25, 2017, at 7:21 AM, Casey Stella  wrote:
>> >
>> > What exactly are the errors that you saw, Ryan?
>> >> On Sat, Feb 25, 2017 at 07:31 David Lyle  wrote:
>> >>
>> >> Is there any reason full dev shouldn't be working?
>> >>
>> >>> On Fri, Feb 24, 2017 at 9:19 PM, Casey Stella 
>> wrote:
>> >>>
>> >>> Sounds like a good idea to me; thanks Ryan!
>>  On Fri, Feb 24, 2017 at 21:11 Ryan Merriman 
>> wrote:
>> 
>>  +1 binding
>> 
>>  Verified the signature
>>  Passed maven tests
>>  Started quick-dev, verified data in ES, kibana, and checked the
>> >>> topologies
>>  for errors (bro topology has parsing errors but I think a couple bad
>>  messages in bro data set is normal)
>>  Tested REPL
>>  RPMs built fine
>> 
>>  The recommended build validation wiki page
>>  (https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds)
>>  has some mistakes. This did
>>  not run successfully in full-dev-platform and the HDFS paths look
>> like
>> >>> they
>>  are old. I am happy to update the wiki page if everyone agrees these
>> >> are
>>  legitimate mistakes.
>> 
>>  On Fri, Feb 24, 2017 at 9:22 AM, Justin Leet 
>>  wrote:
>> 
>> > +1 (non-binding)
>> >
>> > Verified signature
>> > Ran build and tests in maven
>> > Ran up in quick-dev and saw data flow through topologies into the UI
>> > Ensured the REPL spun up and performed some basic tasks
>> > Built rpms
>> >
>> > Justin
>> >
>> >> On Thu, Feb 23, 2017 at 11:18 AM, Casey Stella > >
>> > wrote:
>> >
>> >> This is a call to vote on releasing Apache Metron 0.3.1-RC5 incubating
>> >>
>> >> Full list of changes in this release:
>> >> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC5-incubating/CHANGES
>> >>
>> >> The tag/commit to be voted upon is apache-metron-0.3.1-rc5-incubating:
>> >> https://git-wip-us.apache.org/repos/asf?p=incubator-metron.
>>

Re: [GitHub] incubator-metron issue #405: METRON-641: Fixed kibana_master.py Python3 miss

2017-02-24 Thread Matt Foley

Thanks David!

On 2/24/17, 5:41 PM, "David Lyle"  wrote:

It's the actual dashboard template in Python pickle format. Belongs in
github.

Re: [DISCUSS] Top domains enrichment config/extractor management

2017-02-24 Thread Matt Foley

+1 to using an Ambari view.  As for just presenting a JSON editor, that’s a lot 
better (for a first stab) than presenting a plain text editor :-)
And doing so also makes sense for an “advanced” tab, just as Ambari typically
exposes a text editor in an advanced tab for text config files.


On 2/24/17, 3:15 PM, "Ryan Merriman"  wrote:

+1 to an Ambari view over the management UI.  If we're going to go to the
trouble of exposing this feature through a UI it should be intuitive and
easy to use.  Simply exposing a json editor in Ambari gets a -1 from me.

Are we keeping track of which enrichments have been loaded?  I believe the
enrichment loader currently does this by adding a new enrichment type to
the various enrichment configs.  It's been a while since I've been in that
part of the code so please correct me if it has evolved since then.  If my
previous statement is true, then that's not ideal because a user should
have a list of available enrichments to pick from.  If we use separate
HBase tables for enrichment types then this problem goes away but if we
continue to use one HBase table then there needs to be some kind of
registry that is maintained by the enrichment loader.

On Fri, Feb 24, 2017 at 4:46 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> The reason I posed this question to the community is because I started to
> recognize some of the shortcomings of doing this solely through Ambari, as
> you and Nick have pointed out. I think an Ambari view over the management
> UI is a great idea. And I'd love to see us provide a more robust mechanism
> for loading these enrichments via the management UI. As you said, perhaps
> Ambari could be used to manage the ZK config around active
> enrichments/locations (the "USE" part of it) while the management UI is
> used for actually loading and managing the enrichments themselves?
>
>
> On Fri, Feb 24, 2017 at 8:12 AM, Casey Stella  wrote:
>
> > Late to chime in here, but I feel that we have discussed Ambari's role
> > before and I think we should probably clarify, as a community, a few
> > things with regard to Ambari vs a management UI built around the REST PR
> > currently under review.  (I promise, I will get to the topic at hand
> > eventually ;) :
> >
> >- Where functionality should live
> >- Who is responsible for what
> >
> > I will now make a couple (possibly controversial) statements (some of
> > which) we have actually discussed prior to this on the dev list:
> >
> >
> >- I view Ambari as managing the install and the static configuration
> for
> >Metron.  For us, this would include zookeeper configs as well as
> > topology
> >configuration.  This would be the persistent store of truth.
> >- I view Zookeeper to be our runtime configuration store for the
> >topologies.
> >
> >
> >- I view a management UI (and the Stellar Shell) as managing
> >functionality for interacting with the system.  Where it changes
> >configuration, it must go through Ambari.
> >- I believe the management UI should be exposed as an ambari view
> >
> > As such, I see the importation and management of enrichments, which is a
> > data task, to be squarely in the purview of the management UI, whose job
> is
> > the care and feeding of the data.  That being said, any configuration
> > changes to USE the enrichment should at least be routed through ambari,
> but
> > should be managed in the UI.
> >
> > Now the question becomes, should we have enrichment collateral (I'm
> > including both hbase as well as geo or anything else we have) loaded at
> > install-time.  I would argue that we should not.  Rather, we should
> design
> > the management UI so that the enrichments can be added easily, with a
> > wizard to enable the use of the enrichment via stellar for a sensor
> >
> > On that topic, I think we are doing too much as part of our install.  I
> > would argue that we shouldn't pre-load even the geo data or depend on it
> > for the default parsers.
> >
> > Casey
> >
> >
> >
> > On Tue, Feb 21, 2017 at 6:31 PM, Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > With the work committed in
> > > https://github.com/apache/incubator-metron/pull/445 and
> > > https://github.com/apache/incubator-metron/pull/432, we now have a
> > robust
> > > and flexible means to import enrichment sources and transform their
> > > contents as they are inserted into HBase. One of the main motivators
> for
> > > this new functionality was to add the ability to load top domain
> rankings
> > > from sources such as Alexa. The

Re: JSONMapParser Normalizer aka Flattener

2017-02-24 Thread Matt Foley

I view “flattening” as moving from a highly functional format (JSON) to a less 
functional format (flat, text-like).
Many of the less functional formats we see, that do not permit hierarchical
Maps, nevertheless permit List<String> and List<Number>. (No, not lists of
arbitrary objects, obviously.)

A typical, and workable, flattening would be comma-separated substrings (the 
below illustrates a possible escape):
"list": ["e1", "e2,foo", "e3"]
->
list: "e1,e2\,foo,e3" or " e1,e2foo,e3"

A list of numbers, with period decimal mark, also does well as a 
comma-separated string.

I would suggest that only List<String> and List<Number> be flattened this way.
This allows us to express the kind of formats that are typical in flat 
configuration parameter files used by many other systems.

Lists of other element types could be reasonably flattened with dotted
indexes, as you suggest, or left for a later implementation.
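
To make the comma-separated idea concrete, here is a minimal Python sketch. It is not Metron code: the function names flatten_list and unflatten_list are hypothetical, and the backslash escape is only the first of the two escape styles illustrated above. It assumes list elements are strings or numbers.

```python
def flatten_list(values, sep=",", esc="\\"):
    """Join a list of strings or numbers into one delimited string.

    The escape character is doubled first, then each separator inside
    an element is prefixed with the escape character.
    """
    parts = []
    for value in values:
        if isinstance(value, bool) or not isinstance(value, (str, int, float)):
            raise TypeError("only strings and numbers are flattened this way")
        text = str(value)
        text = text.replace(esc, esc + esc).replace(sep, esc + sep)
        parts.append(text)
    return sep.join(parts)


def unflatten_list(text, sep=",", esc="\\"):
    """Split a string produced by flatten_list back into string elements."""
    items, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == esc and i + 1 < len(text):
            # Escaped character: take the next character literally.
            current.append(text[i + 1])
            i += 2
        elif ch == sep:
            items.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    items.append("".join(current))
    return items


print(flatten_list(["e1", "e2,foo", "e3"]))  # e1,e2\,foo,e3
```

Note that numbers come back as strings after unflatten_list, and an empty list does not round-trip, so a real implementation would need to decide how to handle both.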

--Matt

On 2/24/17, 7:34 AM, "Nick Allen"  wrote:

And sorry this should be...

"list": ["e1", "e2", "e3"]
->
list.0: e1
list.1: e2
list.2: e3
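
For reference, the index-based unfolding shown above could be sketched in Python roughly as follows. This is only an illustration under my own assumptions, not the existing JSONMapParser logic; the function name flatten and the "." separator are hypothetical choices.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into a single-level dict.

    Map keys and list indexes are joined with dots, so
    {"list": ["e1", "e2"]} becomes {"list.0": "e1", "list.1": "e2"}.
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # Scalar leaf: emit it under the path built so far.
        return {prefix: value}

    flat = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


print(flatten({"list": ["e1", "e2", "e3"]}))
# {'list.0': 'e1', 'list.1': 'e2', 'list.2': 'e3'}
```

Empty maps and empty lists simply disappear in this sketch, which may or may not be what an indexer wants.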

On Fri, Feb 24, 2017 at 10:26 AM, Nick Allen  wrote:

> So I don't need to unfold lists, but I do maps?  I thought the commentary
> on METRON-686 was that Solr cannot handle "complex types".  I took that to
> mean both maps and lists.
>
> Yes, Otto. The only reasonable way to unfold a list would be using the
> index of the element.
>
> "list": ["e1", "e2", "e3"]
> ->
> list.0.e1
> list.1.e2
> list.2.e3
>
>
> I don't want to unfold lists or maps. :)
>
>
>
>
> On Fri, Feb 24, 2017 at 10:17 AM, Otto Fowler 
> wrote:
>
>> No, I don’t think it does.  I am not sure how you would do that, other
>> than putting a number at the end of the unwrapped array item?
>>
>>
>>
>> On February 24, 2017 at 10:12:26, Nick Allen (n...@nickallen.org) wrote:
>>
>> Per Otto's advice, I am looking to reuse the normalizer/flattener
>> mechanism
>> that currently exists in JSONMapParser. It looks like the mechanism is
>> built into the class, so I will have to extract it. It looks like landing
>> it in JSONUtils is a logical place.
>>
>> It appears that the mechanism only handles maps, not lists. Is that true?
>> I will need to add similar functionality for lists to reuse this for
>> METRON-686.
>>
>>
>

Re: [DISCUSS] 0.3.1 Release situation

2017-02-22 Thread Matt Foley

I’m +1 on pulling the current RC and re-voting with at least the METRON-734 fix 
included.
I’m 0 (zero, unsigned :-) on whether or not to roll to current HEAD.  Normally 
I wouldn’t do it, but the CEF parser is clearly a valuable addition.

Thanks,
--Matt

On 2/22/17, 8:43 AM, "Casey Stella"  wrote:

I'm in favor of moving 0.3.1 RC5 concurrent with master.  I see a number of
things there that will make the release better:

   - Better docs in the doc-book
   - The CEF parser


Casey

On Wed, Feb 22, 2017 at 7:46 AM, Kyle Richardson 
wrote:

> +1 on pulling and cutting a new RC. Would we simply patch rc4 with this one
> change or include all of the master commits too?
>
> -Kyle
>
> On Wed, Feb 22, 2017 at 10:29 AM, Nick Allen  wrote:
>
> > +1 I agree with you Casey.  I think we should re-cut the release.
> >
> > On Wed, Feb 22, 2017 at 10:27 AM, Casey Stella 
> wrote:
> >
> > > > As you are all aware by now, we have an issue with our maven build.  In
> > > > short, we tripped on https://github.com/maxmind/GeoIP2-java/issues/77
> > > >
> > > > As such, our build no longer works, but also our RC for 0.3.1 no longer
> > > > builds.  I am inclined to pull the release candidate from voting on
> > > > incubator general and re-cut a new candidate after the fix METRON-734
> > > > (https://github.com/apache/incubator-metron/pull/462) gets in later today.
> > > > My reasoning is that the current situation makes the release candidate
> > > > un-releasable because it cannot be built.
> > >
> > > I would like to bring that decision to the community and get some
> > feedback,
> > > though, before I summarily retract the candidate on incubator general.
> > >
> > > Thoughts?
> > >
> > > Best,
> > >
> > > Casey
> > >
> >
>

Re: [DISCUSS] Coding style via checkstyle

2017-02-21 Thread Matt Foley

+1, so do I.  Also like the idea of providing the necessary IntelliJ 
specification.

On 2/21/17, 1:25 PM, "Otto Fowler"  wrote:

+1.  I agree with Michael’s points.

On February 21, 2017 at 16:23:21, Michael Miklavcic (
michael.miklav...@gmail.com) wrote:

+1 to a blanket reformat, failed build for improper formatting, and
automated formatting. I strongly prefer to remove "thinking" from my code
formatting and it has worked very well for me on large projects in the
past. There is capability now in IntelliJ to work with Checkstyle as well.
https://youtrack.jetbrains.com/issue/IDEA-61520#comment=27-1292600
https://plugins.jetbrains.com/idea/plugin/1065-checkstyle-idea

A quick search didn't yield any obviously robust tools for automating the
formatting other than an older non-maintained project named Jalopy. I think
the checkstyle integration with IntelliJ and Eclipse should suffice since
the Maven plugin would give devs the ability to run checks locally and in
Github via Travis.


On Tue, Feb 21, 2017 at 12:32 PM, Nick Allen  wrote:

> I would be in favor of a blanket reformat, whether that is for the entire
> code base or one project at a time. Might be able to conquer and divide
> some of the heavy-lifting of testing, if we do a project at a time. But
> whichever way you think is easier. I'd be glad to help.
>
> On Tue, Feb 21, 2017 at 1:57 PM, Justin Leet 
> wrote:
>
> > I already tried a blanket, manual reformat the other day, through
> > IntelliJ. I did every file matching *.java in the project and it was
> > pretty quick. I didn't validate everything looked perfect afterwards,
> but I
> > did click into a few files and things looked fine. I'm not quite sure
> what
> > the lifecycle of our autogenerated stuff is, so we'd want to regen
> > afterwards, but it's a pretty trivial thing to do.
> >
> > I'm sure there's more nuance (and definitely more testing) than that,
but
> > off the top of my head I'm not sure what it would be. Either way, I
don't
> > think there's a huge amount of effort to just do the reformat, but we'd
> > still want to spin everything up and test it and so on. It's probably
> more
> > work for everybody to rebase onto the (vastly) reformatted code than
> > anything else, which will vary pretty significantly.
> >
> > For (slight) context, the changes are enough to eliminate ~5k
checkstyle
> > warnings (and there might be more if we have to tweak anything in the
> code
> > formatting).
> >
> > On Tue, Feb 21, 2017 at 10:34 AM, Casey Stella 
> wrote:
> >
> > > Any idea, with those modifications to checkstyle, how much effort it
> will
> > > take to reformat the code to conform?
> > >
> > > On Tue, Feb 21, 2017 at 8:23 AM, Justin Leet 
> > > wrote:
> > >
> > > > As part of:
> > > > https://issues.apache.org/jira/browse/METRON-726
> > > > https://github.com/apache/incubator-metron/pull/459
> > > >
> > > > I integrated checkstyle into the mvn:site command, and have
> checkstyle
> > > > reports being run as part of the mvn:site reporting. I expect to be
> > > > celebrating hitting 25k checkstyle warnings soon.
> > > >
> > > > I tested out creating a code formatting setup in IntelliJ, with a
> > couple
> > > > slight modifications of the default Sun conventions (extended the
> > > character
> > > > limit of a line past 80 and made it two space indents). Given that
> > > > checkstyle includes it as a default option, it's probably
reasonably
> > > close
> > > > to the Sun conventions. I'm thinking we probably also at least
create
> > an
> > > > Eclipse profile, to open up ease of development.
> > > >
> > > > There's probably also a discussion about how exactly we want to
> enforce
> > > it.
> > > > Is it just something we add to the PR checklist and have reviewers
> > give a
> > > > glance, do we setup a hook to autoformat code, etc?
> > > >
> > > > Justin
> > > >
> > >
> >
>

Re: [DISCUSS] Sketch Libraries

2017-02-21 Thread Matt Foley

Looks interesting.  Any indication whether it supports MAD (median absolute 
deviation) for outlier detection?
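
For anyone unfamiliar with MAD, here is a small Python sketch of what I mean. It computes the statistic exactly over an in-memory list, which is not how a sketch library would do it, and it says nothing about whether the library in question supports it. The function names are hypothetical; the 0.6745 scaling and the 3.5 cutoff are the commonly cited "modified z-score" convention, used here as an assumption.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median_absolute_deviation(values)
    if mad == 0:
        # Degenerate case: at least half the values equal the median.
        return []
    return [v for v in values if abs(0.6745 * (v - center) / mad) > threshold]


print(mad_outliers([1, 2, 3, 4, 100]))  # [100]
```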


On 2/21/17, 8:08 AM, "Nick Allen"  wrote:

We currently use the tdunning/t-digest library for generating our STATS_*
sketches and then a separate library, addthis/stream-lib, for doing the HLL
distinct count.

I ran across another library originating from Yahoo that looks quite
featureful, well documented and quite active.  On the surface it *seems* to
be able to do what we need for both the STATS_* sketches and HLL.

https://datasketches.github.io/


Has anyone evaluated this library before?  Are there deficiencies as
compared to the libraries that we currently use?

Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-17 Thread Matt Foley

Outstanding write-up, Otto!  As Casey said, don’t expect this to be a coherent 
response, but some possibly useful thoughts:

1. It’s clear that, because parsers, enrichers, and indexers are all specialized
per sensor, “adding a new sensor” is necessarily a complex operation.
You’ve thrown a lasso around it all, and suggested auto-generation of the 
generic parts.  Excellent start.

In my fuzzy computer-sciencey way, your sketch makes me view this as an 
Inversion of Control scenario ( 
https://en.wikipedia.org/wiki/Inversion_of_control ).  I know I don’t have to 
define this for our readers, but allow me to quote one paragraph, from article 
http://www.javaworld.com/article/2071914/excellent-explanation-of-dependency-injection--inversion-of-control-.html
 :

“[IoC (or DI)] delivers a key advantage: loose coupling. Objects can be 
added and tested independently of other objects, because they don't depend on 
anything other than what you pass them. When using traditional dependencies, to 
test an object you have to create an environment where all of its dependencies 
exist and are reachable before you can test it. With [IoC or] DI, it's possible 
to test the object in isolation passing it mock objects for the ones you don't 
want or need to create. Likewise, adding a class to a project is facilitated 
because the class is self-contained, so this avoids the ‘big hairball’ that 
large projects often evolve into.”

Surely part of what we want, no?  Does it make sense to use Spring or Guice to 
drive the integration (and design) of this extensibility capability?  I know 
this could be viewed as an implementation issue, but you said you’re starting 
to prototype, and these things are best integrated from the beginning.


2. Regarding configuration, consider that some (dynamic config parameters) will 
be dynamically read during runtime and some (static config parameters) will 
require restarting (or re-instantiating) the components.  Config params that 
want to be read dynamically should definitely go in ZK so they can take 
advantage of Curator notifications.  Static config params, that can only 
usefully be set at startup or instantiation, could either go in ZK or be 
handled the traditional way in Ambari as files on all configured hosts.  If you 
choose to put static params also in ZK, note that separating static and dynamic 
configs into different znodes makes the process of monitoring changes in the 
dynamic configs more efficient, and this is unrelated to the human-readable 
grouping of params the user sees in a UI.
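
As a rough illustration of the static/dynamic split, here is a Python sketch that partitions one flat parameter map into two znode payloads. Everything named here is hypothetical (the function split_config, the base path, and the parameter names in the usage line); it does not talk to ZooKeeper at all, it only shows the partitioning. The point is that a watcher would be registered on the dynamic node alone.

```python
import json


def split_config(params, dynamic_keys, base_path="/metron/sensors/example"):
    """Partition config params into dynamic and static znode payloads.

    Returns a dict mapping a (hypothetical) znode path to the JSON
    document that would be written there.
    """
    dynamic = {k: v for k, v in params.items() if k in dynamic_keys}
    static = {k: v for k, v in params.items() if k not in dynamic_keys}
    return {
        f"{base_path}/dynamic": json.dumps(dynamic, sort_keys=True),
        f"{base_path}/static": json.dumps(static, sort_keys=True),
    }


payloads = split_config(
    {"batch.size": 5, "kafka.topic": "bro"},  # made-up parameter names
    dynamic_keys={"batch.size"},
)
for path, body in sorted(payloads.items()):
    print(path, body)
```

With this layout an edit to a static param never triggers a notification on the dynamic node, while the UI remains free to group the params however it likes.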

I am talking with Ambari engineers about implementing an ability for Ambari to 
manage config parameters in ZK, at the option of the component implementor, and 
expect to be opening Apache Ambari jiras soon.  At the Ambari UI level there 
should be no difference; at the implementation level a json or other config 
file could be written once to a ZK znode instead of to filesystem files on all 
configured hosts.  The usages could be mixed, with the component implementation 
deciding which config files get written to which target.

3. Yes I read that far :-)

Again, great draft.
Thanks,
--Matt

On 2/17/17, 1:07 PM, "Otto Fowler"  wrote:

RE:
* One Module - yes, I think grouping for the base parsers is good,  I just
don’t want them to stay in -common, it should ‘live’ in the metron lib.  I
think a grouped set of the primitive parsers is correct, still its own.
* ES Templates - they don’t *have* to be there, but if they are they will
be used.  The idea that I’m having is “ someone writing a parser should be
able to produce 1 thing, in one place”.  We are talking with Simon on a
different thread about the types of indexing templates we could have.  I
think we could have from *nothing to es or solr specific to something new

As we discuss we can come up with the mv-pr.

On February 17, 2017 at 15:47:57, Casey Stella (ceste...@gmail.com) wrote:

Ok, This is a long one, so don't expect a coherent response just yet, but I
will give some initial impressions:

- I strongly agree with the premise of this idea. Making Metron
extensible is and should be among the top of our priorities and at the
moment, it's painful to develop a new parser.
- One maven module per parser may be overkill here as the shading is
costly and I think it may make some sense to group based on characteristics
in some way (e.g. json and csv may get grouped together).
- The notion of instance vs parser is a good one
- Binding ES templates and parsers may not be a good idea. You can have
non-indexed parsers (e.g. streaming enrichments).

Can we start small here and then iterate toward the complete vision? I'd
recommend

- Splitting the parsers up into some coherent organization with common
bits separated from the parser itself
- Having a maven archetype

As

Re: Cannot close JIRAs I didn't originally request

2017-02-17 Thread Matt Foley

For what it’s worth, I agree with Jon and Billie that “somebody” (ie, a PMC 
member) should post an INFRA ticket requesting “Contributors” be given the 
“Transition Issues” permission in the METRON project in Jira.

--Matt

On 2/17/17, 9:37 AM, "zeo...@gmail.com"  wrote:

Thoughts?  Just want to put this one to rest, one way or another.

Jon

On Fri, Feb 3, 2017 at 8:03 PM zeo...@gmail.com  wrote:

> Has anybody had a chance to look into this and decide whether a change
> should be made?  This specific incident is no longer an issue but I'd like
> to clean it up for next time. Thanks,
>
> Jon
>
> On Thu, Jan 26, 2017, 12:47 PM Billie Rinaldi  wrote:
>
> It appears that Contributors are missing from the Transition Issues
> permission, which seems odd since Contributors do have the Close Issues and
> Resolve Issues permissions. This may be keeping Jon from transitioning the
> issue to Done, since Metron has a nonstandard workflow which has Done
> instead of Resolved as the completed state for issues. The workflow may
> also be why there is no way to specify the reason for the resolution of the
> ticket. If we want to make any adjustments to the permissions or workflow,
> that will require an INFRA ticket.
>
> On Thu, Jan 26, 2017 at 8:30 AM, zeo...@gmail.com 
> wrote:
>
> > I assigned a JIRA (METRON-354) to me that was reported
> > by someone else, and it appears that I don't have the permissions to close
> > it.  For clarity, I didn't have access to close it before I assigned it to
> > myself either.  Would someone be willing to delegate me the appropriate
> > access?  Alternatively, if you're not comfortable providing that level of
> > access, or if it is not simple to do, I would be happy to ask that the
> > original requester close their own ticket.
> >
> > Thanks,
> >
> > Jon
> > --
> >
> > Jon
> >
> > Sent from my mobile device
> >
>
> --
>
> Jon
>
> Sent from my mobile device
>
-- 

Jon

Sent from my mobile device

Re: [VOTE] Releasing Apache Metron (incubating) 0.3.1-RC4

2017-02-14 Thread Matt Foley

I just found something in the docs.  I noticed the formatting was messed up in 
the generated html for file metron-platform/metron-data-management/README.md.  
The cause is the use of quadruple back-ticks instead of the correct
triple back-ticks to delimit codeblocks.  Doxia-markdown doesn’t like this at
all, even though Github-MD for some reason doesn’t have a problem.  This in turn
interrupted the re-write process on this file, so all the markdown dialect 
issues were unfixed and the bullets got munched into paragraphs, etc.  I have 
documented this in https://issues.apache.org/jira/browse/METRON-719 
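
In case it helps catch this before the next RC, here is a small Python sketch of a check for over-long fences. It is not part of the Metron build; the function name find_long_fences is hypothetical, and it simply flags any line that opens with four or more back-ticks.

```python
import re

# A line whose first non-blank characters are four or more back-ticks.
LONG_FENCE = re.compile(r"^\s*(`{4,})")


def find_long_fences(markdown_text):
    """Return (line_number, fence) pairs for fences longer than three back-ticks."""
    hits = []
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        match = LONG_FENCE.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits
```

One would read each README.md, pass its text to find_long_fences, and fail the check if the returned list is non-empty.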

I had previously fixed this problem in this file, but a few instances snuck 
back in during a later edit.  Guilty parties have been informed privately :-)

Because this is only a docs issue, I don’t feel it is sufficient to force a 
start-over on the vote.  However, if no one objects I would like Casey to 
substitute the correctly formatted file into the site-book at 
https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC4-incubating/book-site/metron-platform/metron-data-management/index.html
   We could then document it as a known issue in 0.3.1, and I will submit the 
patch for integration immediately after 0.3.1.

Is that acceptable?
Thanks,
--Matt

On 2/13/17, 4:53 PM, "Matt Foley" <mfo...@hortonworks.com> wrote:

+1

Compared contents of release tarball 
https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC4-incubating/apache-metron-0.3.1-rc4-incubating.tar.gz
 with contents of git tag apache-metron-0.3.1-rc4-incubating.  They match.

Confirmed build and full unit test.
Build Mpack 
Build RPMs

Install on single-node CentOS7 VM, with Ambari-2.4.2.0 and HDP-2.5.3.0 
stack (with changes from METRON-609 as known needed for single-node deployment, 
especially reduced elasticsearch.master.yml)

Ran bro data through the system and observed proportional emits from 
parser, enrichment, and indexing topologies.
Did not validate indexing due to human error during installation.

--Matt

On 2/10/17, 12:22 PM, "Casey Stella" <ceste...@gmail.com> wrote:

This is a call to vote on releasing Apache Metron 0.3.1-RC4 incubating

Full list of changes in this release:

https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC4-incubating/CHANGES

The tag/commit to be voted upon is apache-metron-0.3.1-rc4-incubating:

https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc4-incubating

The source archive being voted upon can be found here:

https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC4-incubating/apache-metron-0.3.1-rc4-incubating.tar.gz

Other release files, signatures and digests can be found here:

https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC4-incubating/

The release artifacts are signed with the following key:

https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d9c260ba55e;hb=refs/tags/apache-metron-0.3.1-rc4-incubating

The book associated with this RC is located at

https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC4-incubating/book-site/index.html

Please vote on releasing this package as Apache Metron 0.3.1-RC4 
incubating

When voting, please list the actions taken to verify the release.

Recommended build validation and verification instructions are posted 
here:
https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds


This vote will be open for at least 72 hours.

[ ] +1 Release this package as Apache Metron 0.3.1-RC4 incubating

[ ]  0 No opinion

[ ] -1 Do not release this package because...

Re: [VOTE] Releasing Apache Metron (incubating) 0.3.1-RC4

2017-02-13 Thread Matt Foley

+1

Compared contents of release tarball 
https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC4-incubating/apache-metron-0.3.1-rc4-incubating.tar.gz
 with contents of git tag apache-metron-0.3.1-rc4-incubating.  They match.
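
For anyone wanting to repeat the tarball-versus-tag comparison, here is one way it could be scripted in Python. This is a sketch of the general technique, not the exact procedure used for this vote: it assumes the tarball has been unpacked into one directory and the tag checked out into another, and the function names are hypothetical.

```python
import hashlib
import os


def tree_digest(root, ignore=(".git",)):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories (e.g. the .git folder of the checkout).
        dirnames[:] = [d for d in dirnames if d not in ignore]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def tree_differences(left, right):
    """Return sorted relative paths that are missing or differ between trees."""
    a, b = tree_digest(left), tree_digest(right)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty list from tree_differences means the two trees have the same files with the same contents.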

Confirmed build and full unit test.
Build Mpack 
Build RPMs

Install on single-node CentOS7 VM, with Ambari-2.4.2.0 and HDP-2.5.3.0 stack 
(with changes from METRON-609 as known needed for single-node deployment, 
especially reduced elasticsearch.master.yml)

Ran bro data through the system and observed proportional emits from parser, 
enrichment, and indexing topologies.
Did not validate indexing due to human error during installation.

--Matt

On 2/10/17, 12:22 PM, "Casey Stella"  wrote:

This is a call to vote on releasing Apache Metron 0.3.1-RC4 incubating

Full list of changes in this release:

https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC4-incubating/CHANGES

The tag/commit to be voted upon is apache-metron-0.3.1-rc4-incubating:

https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc4-incubating

The source archive being voted upon can be found here:

https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC4-incubating/apache-metron-0.3.1-rc4-incubating.tar.gz

Other release files, signatures and digests can be found here:

https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC4-incubating/

The release artifacts are signed with the following key:

https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d9c260ba55e;hb=refs/tags/apache-metron-0.3.1-rc4-incubating

The book associated with this RC is located at

https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC4-incubating/book-site/index.html

Please vote on releasing this package as Apache Metron 0.3.1-RC4 incubating

When voting, please list the actions taken to verify the release.

Recommended build validation and verification instructions are posted here:
https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds


This vote will be open for at least 72 hours.

[ ] +1 Release this package as Apache Metron 0.3.1-RC4 incubating

[ ]  0 No opinion

[ ] -1 Do not release this package because...

Re: Site-Book

2017-02-13 Thread Matt Foley

Okay, thanks.  I suggest grabbing the text from the PR#429 introduction.

From: Otto Fowler <ottobackwa...@gmail.com>
Date: Monday, February 13, 2017 at 11:09 AM
To: "dev@metron.incubator.apache.org" <dev@metron.incubator.apache.org>, Matt 
Foley <ma...@apache.org>
Subject: Re: Site-Book

Actually I was going to take a stab at it, but I was reviewing the error 
indexing stuff.  Sorry to be tardy.

I’ll still take a stab if you have not done it.  Assign the jira to me

On February 13, 2017 at 13:51:55, Matt Foley (ma...@apache.org) wrote:

Assuming that I should take that as a request rather than an offer :-) , I’ve 
opened https://issues.apache.org/jira/browse/METRON-716 

Thanks, 
--Matt 

On 2/13/17, 7:02 AM, "Casey Stella" <ceste...@gmail.com> wrote: 

Yes, definitely. 
On Mon, Feb 13, 2017 at 09:01 Otto Fowler <ottobackwa...@gmail.com> wrote: 

> Should Site-Book have a README.md describing the contents, how to build 
> etc? 
>

Re: Site-Book

2017-02-13 Thread Matt Foley

Assuming that I should take that as a request rather than an offer :-) , I’ve 
opened https://issues.apache.org/jira/browse/METRON-716

Thanks,
--Matt

On 2/13/17, 7:02 AM, "Casey Stella"  wrote:

Yes, definitely.
On Mon, Feb 13, 2017 at 09:01 Otto Fowler  wrote:

> Should Site-Book have a README.md describing the contents, how to build
> etc?
>

Re: Rev additional metron components?

2017-02-09 Thread Matt Foley

The only reason not to go “backwards” is if someone is going to try to use 
Ambari Upgrade to move from the 0.3.0 Mpack to this one.

I THINK it’s unlikely this is a concern, so I’m okay with 0.3.1.0, but I would 
change my opinion if someone says a real-world user in the field will want to 
use Ambari Upgrade. 

A moot point, of course, if Ambari Upgrade is known to NOT work with this (and 
the previous) Mpack.  So if someone can definitively say that, that would be 
good to know too.

Cheers,
--Matt

On 2/9/17, 11:53 AM, "David Lyle"  wrote:

I'm good with 0.3.1.0.

-D...

On Thu, Feb 9, 2017 at 2:36 PM, zeo...@gmail.com  wrote:

> I agree with Casey regarding the version itself, but I'd be fine with
> something else if someone else has a convincing argument.
>
> Jon
>
> On Thu, Feb 9, 2017 at 2:12 PM Justin Leet  wrote:
>
> I can pick this up once we have an agreement on the version number.  When
> we agree on that, I'll make a Jira and rev it.
>
> Justin
>
> On Thu, Feb 9, 2017 at 2:05 PM, Casey Stella  wrote:
>
> > I do agree that the MPack should be rev'd and a new RC should be cut.  Is
> > there a way to name the versioning of the management pack so that it
> > indicates the oldest version of Metron that can be installed with that
> > version?  So, in this case, maybe 0.3.1.0?
> >
> > Also, I'm looking for volunteers to take this renaming JIRA once we decide
> > to do it.
> >
> > Casey
> >
> > On Thu, Feb 9, 2017 at 1:56 PM, David Lyle  wrote:
> >
> > > Good looking out, Jon!
> > >
> > > I would recommend against version matching it with Metron. In the future,
> > > the MPack will need to rev much less frequently than Metron, so MPack rev
> > > x.x.x.x will install Metron y.y.y+. My read on the prior release bits is
> > > that 0.3.0 is using MPack 1.0.0.0-SNAPSHOT, which is either an error or an
> > > indication that we didn't actually release the MPack as part of 0.3.0
> > > (which is my view), so if we agree it's ready, we can call this one 1.0.0.0
> > > and cut a new RC with that change.
> > >
> > > I'd also support the following:
> > >
> > > Declare it "not ready" and leave it at 1.0.0.0-SNAPSHOT
> > > Decide 0.3.0 actually did contain MPack 1.0.0.0 and increment this to
> > > 1.0.1.0.
> > > (I'm sure there are other ways as well)
> > >
> > > My (weak) preference is to simply call this one 1.0.0.0.
> > >
> > >
> > > -D...
> > >
> > >
> > > On Thu, Feb 9, 2017 at 1:43 PM, zeo...@gmail.com wrote:
> > >
> > > > So I was spinning up the 0.3.1-RC3 candidate on my bare metal cluster today
> > > > and I noticed that when I generated the mpack it still had a version of
> > > > 1.0.0.0.  I double checked and made sure that the mpack existed in the
> > > > 0.3.0 release
> > > >  > > > 0.3.0/metron-deployment>
> > > > and
> > > > that it was modified in between releases via the changelog.  I would
> > > > normally recommend that we modify the version to match with Metron (0.3.1)
> > > > but that would be going backwards.  Thoughts?
> > > >
> > > > Jon
> > > > --
> > > >
> > > > Jon
> > > >
> > > > Sent from my mobile device
> > > >
> > >
> >
>
> --
>
> Jon
>
> Sent from my mobile device
>

Re: [VOTE] Releasing Apache Metron (incubating) 0.3.1-RC2

2017-02-07 Thread Matt Foley

Casey, the below vote call message has several inconsistencies that invalidate 
it.  Please search for “RC1” or “rc1”.  I count three, starting with the first 
line :-)  There is also an instance of “0.3.0”.
Thanks,
--Matt
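
The manual search suggested above (looking for "RC1", "rc1" and "0.3.0" in an RC2 vote call) could also be done mechanically. The following is only a sketch: the function name `find_mismatches` and the version pattern are illustrative assumptions, not part of any Apache release tooling, and the sample text is abridged from the quoted vote call below.

```python
import re

# Assumed pattern for strings such as "0.3.1-RC2" or "0.3.1-rc2" (hypothetical helper).
VERSION_RC = re.compile(r"(\d+\.\d+\.\d+)-rc(\d+)", re.IGNORECASE)


def find_mismatches(text, version, rc):
    """Return (line number, token) pairs whose version or RC number differs."""
    mismatches = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in VERSION_RC.finditer(line):
            if match.group(1) != version or int(match.group(2)) != rc:
                mismatches.append((lineno, match.group(0)))
    return mismatches


# Lines abridged from the quoted vote message.
vote = "\n".join([
    "This is a call to vote on releasing Apache Metron 0.3.1-RC1 incubating",
    "The tag/commit to be voted upon is apache-metron-0.3.0-rc1-incubating:",
    "https://dist.apache.org/repos/dist/dev/incubator/metron/"
    "0.3.1-RC2-incubating/apache-metron-0.3.1-rc1-incubating.tar.gz",
    "Please vote on releasing this package as Apache Metron 0.3.1-RC2 incubating",
])

for lineno, token in find_mismatches(vote, "0.3.1", 2):
    print(f"line {lineno}: unexpected {token}")
```

A check like this only catches tokens that follow the assumed `x.y.z-rcN` shape; a bare "0.3.0" without an RC suffix would still need a human eye.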

On 2/7/17, 8:18 AM, "Casey Stella"  wrote:

This is a call to vote on releasing Apache Metron 0.3.1-RC1 incubating


Full list of changes in this release:


https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/CHANGES


The tag/commit to be voted upon is apache-metron-0.3.0-rc1-incubating:


https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc2-incubating

The source archive being voted upon can be found here:


https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/apache-metron-0.3.1-rc1-incubating.tar.gz

Other release files, signatures and digests can be found here:


https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/

The release artifacts are signed with the following key:


https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d9c260ba55e;hb=refs/tags/apache-metron-0.3.1-rc2-incubating


Please vote on releasing this package as Apache Metron 0.3.1-RC2 incubating


When voting, please list the actions taken to verify the release.

Recommended build validation and verification instructions are posted here:

https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds


This vote will be open for at least 72 hours.


[ ] +1 Release this package as Apache Metron 0.3.1-RC2 incubating

[ ]  0 No opinion

[ ] -1 Do not release this package because...

Re: [DISCUSS] Next Release (0.3.1) Content

2017-02-02 Thread Matt Foley

Thanks, Casey.  Btw, I’m not far enough along on METRON-322 (batch timeout 
flush), so it will have to wait for next cycle.
--Matt

On 2/2/17, 2:03 PM, "Casey Stella"  wrote:

Just a quick release update.  As of now, we are waiting on

   - METRON-660 to get reviewed and make it in
   - METRON-692 to get our upgrade.md completed for this release

Mike Miklavcic, you sent an email to the legal-discuss about our kraken
dependency and it looked like we didn't have to change, but could you
comment on this thread in the dev list so I know if we need to wait for
METRON-650.

Casey

On Thu, Feb 2, 2017 at 4:58 PM, Casey Stella  wrote:

> Ok, I've created the upgrading document for 0.3.0 to 0.3.1 and included
> the things that I know about and the things Jon mentioned here.  Please, if
> you have knowledge of other breaking/non-compatible changes between the
> 0.3.0 release and master, comment on this PR
> (https://github.com/apache/incubator-metron/pull/437) and I will incorporate them.
>
> On Fri, Jan 27, 2017 at 10:04 AM, zeo...@gmail.com 
> wrote:
>
>> To start I was mostly concerned with having a per-version list of
>> non-backwards-compatible changes, so upgrades that may skip a version or
>> two can look at what all may be impacted.  We should also probably document
>> any sort of upgrade flaws as well, such as METRON-447, METRON-448, etc.
>> I do think that we should have a more rigorous document, but I wouldn't
>> push that for the 0.3.1 release.  I see that (along with the Management UI,
>> API, etc.) as key (required) components of a 1.0 release.  I'd just like to
>> see the foundation begin to be laid and iterated on.
>>
>> That said, this probably constitutes a mention in the development
>> guidelines
>> > pageId=61332235>
>> once
>> it's in master.
>>
>> Jon
>>
>> On Fri, Jan 27, 2017 at 9:05 AM Casey Stella  wrote:
>>
>> > I should add, you may be thinking something more rigorous and
>> > step-by-step.  If so, do you think you might be interested in volunteering
>> > to do a first draft as a PR that we can adjust?
>> >
>> > On Fri, Jan 27, 2017 at 9:01 AM, Casey Stella 
>> wrote:
>> >
>> > > So, I agree with the Upgrading.md and I was going to submit a PR at least
>> > > to describe the changes to indexing configurations that I made during
>> > > the 3.0.1 release.
>> > >
>> > >
>> > > On Thu, Jan 26, 2017 at 10:50 PM, zeo...@gmail.com 
>> > > wrote:
>> > >
>> > >> I haven't had a chance to look through the unresolved JIRAs but I 
did
>> > want
>> > >> to mention a few quick things.
>> > >>
>> > >> First, when we released 0.3.0 and dropped the BETA flag, one of the
>> > things
>> > >> that was discussed was putting together a method of documenting
>> upgrades
>> > >> from one version to the next.  As one of the first steps toward
>> making
>> > >> that
>> > >> a reasonable process, I think we should assemble more detailed
>> release
>> > >> notes, especially outlining non-backwards compatible changes.  In 
the
>> > >> "[DISCUSS] Next Release Name" email thread Kyle Richardson suggested
>> we
>> > >> use
>> > >> "UPGRADING.md" to do this, and I still agree with that thought, but
>> I'm
>> > >> open to alternatives.
>> > >>
>> > >> Separately, a *nice to have* would be *METRON-660*, which was
>> discussed
>> > in
>> > >> the "[PROPOSAL] up-to-date versioned documentation" thread, to give
>> us
>> > >> some
>> > >> cleaner documentation using the existing READMEs.  I'd be happy to
>> help
>> > >> with this one, I'm just not sure what the next steps are, aside from
>> the
>> > >> start that Matt has here
>> > >> .
>> > >>
>> > >> My last *nice to have* is *METRON-635*, which I have a *PR open for
>> here
>> > >> *.  If I could
>> get
>> > >> someone else to reproduce the error that I'm seeing I would be happy
>> to
>> > >> pursue additional testing, troubleshooting, etc.  I've seen others
>> > report
>> > >> the same issue
>> > >> > > >> question=search%2Fsearch=relevance=scp_if_ssh>
>> > >> on the HCC boards, so I'm fairly confident that it is not an issue
>> with
>> > my
>> > >> local

Re: [DISCUSS] Expansion of the capabilities of PROFILE_GET

2017-01-31 Thread Matt Foley

Casey, this sounds great.  

1. I think you have a model for how to do the natural-language-look DSL, that 
maybe isn’t clear to the rest of us.  Does your contemplated approach meet the 
following?
a) Can it be specified as a formal grammar, not too complex, so anyone who 
bothers can visually parse a sentence and see that it is conforming (or not)? – 
Obviously I’m concerned that we not bite off a full NLP problem for the sake of 
ease of use.
b) Can it dump a human-understandable parse tree on request, so if it does the 
unexpected (or just fails to work), the user can easily figure out what the 
program did and why?
c) And you already said it wouldn’t be much more complex to implement than 
using a Map parameter list.

Given those 3 things, I’d vote for the NL, otherwise the parameter list.

2. One more thing I’d recommend for PROFILE_LOOKBACK():  wildcarding of Groups. 

- Easy to implement for groups based on small finite sets like day-of-week.
- Also easy altho perhaps slow for large finite sets, and multi-dimensional 
group sub-keys. 
- Not clear how to do for indeterminate sets from random Stellar functions, but 
we should talk to HBase gurus and see what we can do, using HBase’s scan 
capability.
- If groups sub-keys are not already at the end of the row key structure, we 
may want to move them there, to make the latter case easier and more efficient.

3. There’s another thing I’ve been noodling, that deserves its own discussion, 
but is relevant to mention here – or if you really like it, could be included 
in PROFILE_LOOKBACK project, as it’s not a difficult thing:
Currently the Profiler config settings, at the time a Profile is run, get 
burned into the HBase row keys, such that if you don’t _a priori_ know those 
config settings, you can’t read the Profile.  I’d like to suggest we start 
saving Profile metadata, also in HBase, every time a new Profile is started 
(and, if possible, ended) so that I can say “Get that old profile from November 
14-21, whatever its metadata was” without needing to know its period, groups, 
etc.  Obviously it would be nice to be able to query the metadata itself, too, 
but just having the metadata gives us the right start.

--Matt

On 1/31/17, 10:43 AM, "Casey Stella"  wrote:

Regarding the "?" syntax:
Wouldn't that be forking cron syntax so now we have a metron cron?  If
we're constructing our own syntax, then why not do it so that it reads like
natural language?

Regarding the holiday problem:
Agreed, it's a smaller problem than constructing a DSL, but that's not
really the point, I think. The concern is that it would be unable to be
expressed using cron syntax in a natural way without modifying cron syntax,
which would be constructing a new DSL.  If quartz has a clever way of doing
that, then I'd like to see it.  From a quick search, I haven't seen a
scheduling example with a compact syntax that shows skipping holidays with
cron syntax.
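
To make the "relative lookback" in this exchange concrete ("every Tuesday at the current hour" rather than at a fixed hour), here is a minimal sketch of what such a lookback would have to compute. It is an illustration only: `weekly_lookbacks` is a hypothetical helper, not a Stellar or Quartz function, and it assumes the current hour means the start of the hour containing `now`.

```python
from datetime import datetime, timedelta


def weekly_lookbacks(now, weekday, count):
    """Return the `count` most recent instants at or before `now` that fall on
    `weekday` (0 = Monday) at the start of now's hour, newest first."""
    anchor = now.replace(minute=0, second=0, microsecond=0)
    days_back = (anchor.weekday() - weekday) % 7
    latest = anchor - timedelta(days=days_back)
    return [latest - timedelta(weeks=i) for i in range(count)]


# Example: the date of this thread, which the mail headers give as a Tuesday.
now = datetime(2017, 1, 31, 13, 29)
for instant in weekly_lookbacks(now, weekday=1, count=3):
    print(instant.isoformat())
```

The hour comes from `now` instead of from the expression, which is exactly the part that a fixed cron field such as `15` cannot express. Skipping holidays would need an additional calendar lookup that this sketch does not attempt.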

On Tue, Jan 31, 2017 at 1:29 PM, Nick Allen  wrote:

> >
> >- Cron syntax allows you to construct only absolute lookbacks (i.e.
> >"every tuesday at 3PM" not "every tuesday at the current hour")
>
>
> I think Cron would work for this.  I am no expert on cron expressions, but
> I think the following examples would work.
>
>- If you want "every Tuesday at 3 PM"
>   - 0 0 15 ? * TUE *
>- If you want "every Tuesday at current hour" then use something like
>the "?" placeholder maybe.
>   - 0 0 ? ? * TUE *
>
> - Cron syntax allows you to specify a point in time, not a duration.  We
> >could, of course, specify a duration as another argument
>
>
> Yes, a separate argument would be necessary.  We would have to allow the
> user to specify either a "start from date/time" or the "number of intervals
> to look back".
>
> Cron syntax does not allow you to skip things like holidays, etc.
>
>
> I agree, out-of-the-box Cron does not solve holiday calendars.  But this
> would be a smaller problem to solve than creating our own DSL.
>
> There is a tradition of creating shortcuts that look something like @Daily
> or @Weekdays or @Tuesdays that we could also use to make things easier for
> users.
>
> I have used Quartz with cron expressions in the past and there was some way
> to handle holidays with that.  I think you could create a custom calendar
> for the holidays and call it something; aka @USHolidays.  And then you
> would say "every Tuesday" except @USHolidays or something like that.  I'd
> have to look into this some more.
>
> And there are also nice online Cron expression "translators" that we could
> mimic in a Metron user interface.  For example, https://crontab.guru.
>
>
>
>
> On Tue, Jan 31, 2017 at 12:00 PM, Casey

Re: [PROPOSAL] up-to-date versioned documentation

2017-01-29 Thread Matt Foley

Hi all, please take a look at 
https://github.com/apache/incubator-metron/pull/429
I think all major issues are resolved, as best I can tell from an eyeball scan 
of the result.
Thanks,
--Matt

On 1/19/17, 12:06 PM, "Justin Leet" <justinjl...@gmail.com> wrote:

Yeah, this looks like a huge leap forward, and I'm thrilled that you made
such good progress.  Great job, Matt.

On Thu, Jan 19, 2017 at 1:55 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Agreed! Matt, thanks for taking this on and glad I could help.
>
> M
>
> On Thu, Jan 19, 2017 at 11:42 AM, Casey Stella <ceste...@gmail.com> wrote:
>
> > Oh wow, I really like the looks of that.  I was skeptical before, but if
> > you got that far in a couple of days, I think this is a worth-while
> > endeavor!  Thanks so much Matt!
> >
> > Casey
> >
> > On Thu, Jan 19, 2017 at 12:39 PM, Matt Foley <ma...@apache.org> wrote:
> >
> > > Thanks, Jon!  I’m working on characterizing exactly how to fix the two
> > > main issues.  I think I’ve got a script that will auto-fix the
> > > triple-backtick problem.  The bullet list problem will require
> > > hand-editing, so I want to make sure I’ve got the right 
recommendation.
> > >
> > > The larger issue is going thru and making the doc content better and
> more
> > > usable.
> > > But that can occur over time, and will be motivated by having the book
> > > there to gripe about :-)
> > >
> > > On 1/19/17, 9:05 AM, "zeo...@gmail.com" <zeo...@gmail.com> wrote:
> > >
> > > Looking at the screenshot, that would be an incredible improvement
> on
> > > what
> > > we currently have.  I'd be happy to help out with any markdown
> > > modifications and documentation cleanup, if necessary, to fix the
> > > problems
> > > you outlined above.
> > >
> > > Jon
> > >
> > > On Thu, Jan 19, 2017 at 11:22 AM Matt Foley <ma...@apache.org>
> > wrote:
> > >
> > > > Sorry, I forgot text-only messages won’t accept attachments.
> > Please
> > > see
> > > >
> > > > https://issues.apache.org/jira/secure/attachment/
> > > 12848335/Metron-book-screenshot.png
> > > >
> > > > Thanks,
> > > > --Matt
> > > >
> > > >
> > > > On 1/19/17, 6:03 AM, "Otto Fowler" <ottobackwa...@gmail.com>
> > wrote:
> > > >
> > > > Not seeing the attachment, is it attached to a jira?
> > > >
> > > >
> > > > On January 19, 2017 at 04:19:02, Matt Foley (
> ma...@apache.org)
> > > wrote:
> > > >
> > > > Here’s a screen shot, attached :-)
> > > >
> > > > On 1/19/17, 1:04 AM, "Matt Foley" <ma...@apache.org> wrote:
> > > >
> > > > Hi all,
> > > > I’ve put together a prototype doc book, along the lines we
> > > discussed,
> > > > and
> > > > it looks pretty worthwhile.
> > > > Many thanks to Mike M. who whipped the pom.xml file into
> shape,
> > > and
> > > > helped
> > > > me find the right site.xml file to imitate.
> > > >
> > > > If you’re interested, please do a single-branch clone as
> > follows:
> > > > git clone --single-branch -b METRON-660
> > > > https://github.com/mattf-horton/incubator-metron.git
> > [clonename]
> > > > (or whatever git command pleases you :-)
> > > >
> > > > In this branch, there’s a new top-level subdirectory named
> > > site-book/.
> > > > This
> > > > is not necessarily how we want to integrate stuff, it was
> just
> > > > convenient
> > > > to do it separate from the existing site/ directory for now.
> To
> > > build
> > > > the
> > > > book, do these three commands:

Re: [DISCUSS] Next Release (0.3.1) Content

2017-01-27 Thread Matt Foley

I think I have the last formatting problem resolved for METRON-660, and plan to 
submit a PR this evening.  I would like to see it added to 0.3.1.  Note that we 
still have to agree on how to integrate it.

I’ve also been working on METRON-322, providing flush timeouts for batched 
writers.  I believe that’s been discussed in previous sprint planning meetings? 
I need to rebase it, but I think I can have it ready Monday or Tuesday, if 
that’s early enough to include.

Thanks,
--Matt

On 1/26/17, 7:50 PM, "zeo...@gmail.com"  wrote:

I haven't had a chance to look through the unresolved JIRAs but I did want
to mention a few quick things.

First, when we released 0.3.0 and dropped the BETA flag, one of the things
that was discussed was putting together a method of documenting upgrades
from one version to the next.  As one of the first steps toward making that
a reasonable process, I think we should assemble more detailed release
notes, especially outlining non-backwards compatible changes.  In the
"[DISCUSS] Next Release Name" email thread Kyle Richardson suggested we use
"UPGRADING.md" to do this, and I still agree with that thought, but I'm
open to alternatives.

Separately, a *nice to have* would be *METRON-660*, which was discussed in
the "[PROPOSAL] up-to-date versioned documentation" thread, to give us some
cleaner documentation using the existing READMEs.  I'd be happy to help
with this one, I'm just not sure what the next steps are, aside from the
start that Matt has here
.

My last *nice to have* is *METRON-635*, which I have a *PR open for here
*.  If I could get
someone else to reproduce the error that I'm seeing I would be happy to
pursue additional testing, troubleshooting, etc.  I've seen others report
the same issue

on the HCC boards, so I'm fairly confident that it is not an issue with my
local environment.

Jon

On Thu, Jan 26, 2017 at 6:18 PM Casey Stella  wrote:

I took the liberty of adding the pull requests since the new year into the
in-progress list in the previous email.  If you own one of these and do not
believe that you can complete the review in the next week or so, let me
know and we can remove.  The only one of these that I see as mandatory is
METRON-650 because the reliance on the opensoc github repo might cause
issues with the release being acceptable (see discussion by Mike Miklavcic
with the mentors a couple weeks ago).

On Thu, Jan 26, 2017 at 6:06 PM, Casey Stella  wrote:

> Hello Everyone!
>
> It's been almost 2 months since the last major release and I think it
> might be time to do a minor release of 0.3.1. The purpose of this email is
> multifold:
>
>- to take stock of what we have already committed
>- figure out what (if anything) is missing to make a release that
>we're proud of
>- find volunteers for the missing items
>
> For those who get email alerts on the JIRA changes, it should be no
> surprise that I have gone through and put the committed items in the 0.3.1
> release bucket.  So, let's take stock of what we have.
>
> *What's made it so far into the next release*
>
>
>
>- METRON-283 Migrate Geo Enrichment outside of MySQL (justinleet)
>closes apache/incubator-metron#421
>- METRON-668: Remove the "tickUpdate" profile config and make the
>"init" phase not reset variables closes apache/incubator-metron#420
>- METRON-672: SolrIndexingIntegrationTest fails intermittently closes
>apache/incubator-metron#424
>- METRON-664: Make the index configuration per-writer with
>enabled/disabled closes apache/incubator-metron#419
>- METRON-600: Fix Metron Website (iraghumitra via cestella) closes
>apache/incubator-metron#399
>- METRON-666 Fix javadoc doclint errors closes
>apache/incubator-metron#418
>- METRON-659 Emulate Sensors in Development Environments (nickwallen)
>closes apache/incubator-metron#417
>- METRON-652: Extract indexing config from enrichment config closes
>apache/incubator-metron#415
>- METRON-654 Create RPM Installer for Profiler (nickwallen) closes
>apache/incubator-metron#413
>- METRON-656: Make Stellar 'in' closer to functioning like python
>closes apache/incubator-metron#416
>- METRON-532 Define Profile Period When Calling PROFILE_GET closes
>apache/incubator-metron#414
>- METRON-624: Updated Comparison/Equality Evaluations in Stellar
>closes

Re: Reporting Issues Wiki

2017-01-25 Thread Matt Foley

Actually, it is a little odd that we would ask users to email to issues@.  For 
most projects that is a broad distro list, where Apache Jira sends automatic 
notifications of every change to every ticket in the project’s (METRON) Jira 
project.  It would normally be expected to be essentially read-only, altho 
evidently ezmlm is set up to allow apache.org email addresses (meaning you’re a 
committer or PMC member) to send to it.  Being a subscriber/recipient of it is 
unrelated to the privs for sending to it (unlike users@ or dev@).

James or David, has issues@ for the metron project been set up differently than 
the Apache norm, so it is in fact an appropriate place for non-committer users 
to report problems with Jira?

Thanks,
--Matt


On 1/25/17, 7:24 PM, "zeo...@gmail.com"  wrote:

On the Reporting Issues
 wiki
page it mentions to "Please report issues related to the JIRA/Wiki to
iss...@metron.incubator.apache.org".  I have a personal permission issue
with JIRA that I was attempting to email to that list and I keep getting
the following error message:


Hi. This is the qmail-send program at apache.org.
I'm afraid I wasn't able to deliver your message to the following addresses.
This is a permanent error; I've given up. Sorry it didn't work out.

:
Must be sent from an @apache.org address or an address in LDAP.

--- Below this line is a copy of the message.
...


Note that I am sending the email to "iss...@metron.incubator.apache.org",
however it is replying with the text "iss...@metron.apache.org".  I have
also made sure to join the mailing list, as shown below:


Hi! This is the ezmlm program. I'm managing the
iss...@metron.incubator.apache.org mailing list.

Acknowledgment: I have added the address

zeo...@gmail.com

to the issues mailing list.

Welcome to iss...@metron.incubator.apache.org!


Am I missing something here?  The actual issue I'm having is not that
urgent, I think making sure the wiki has the appropriate instructions is
more important.

Jon
-- 

Jon

Sent from my mobile device

[DISCUSS] How to do Sliding Windows in Profiler

2017-01-24 Thread Matt Foley

Hi all,

Casey and I had an interesting chat yesterday, during which we agreed that the
example code for Outlier Analysis in
https://github.com/apache/incubator-metron/blob/master/metron-analytics/metron-statistics/README.md
and the revised example code in
https://issues.apache.org/jira/browse/METRON-668 (as of 23 January) both do not
correctly implement the desired Sliding Window model. This email gives the
argument for why, and proposes a couple ways to do it right. Your input and
preferences are requested.

First a couple statements about the STATS object that underlies most
interesting Profile work:

· The STATS object is a t-digest. It summarizes a set of data points,
such as those received during a sampling period, in a way that is nominally
O(1) regardless of the input number of data points, and preserves the info
about the “shape” of the statistical distribution of those points. Not only
info about averages and standard deviations, but also about medians and
percentiles (which, btw, is a very different kind of information), is preserved
and can be calculated correctly to within a given error epsilon. Since it is a
summary, however, time information is lost.

· STATS objects, these digests of sampling periods, are MERGEABLE,
meaning if you have a digest from time(1) to time(2), and another digest from
time(2) to time (3), you can merge them and get a digest that is statistically
equivalent to a digest from time(1) to time(3) continuously.

· They are sort of idempotent, in that if you take two identical
digests and merge them, you get almost the same object. However, the result
object will be scaled as summarizing twice the number of input data points.

· Which is why it DOESN’T work to try to merge overlapping sampling
periods. To give a crude example, if you have a digest from time(1) to time(3)
and another digest from time(2) to time(4), and merge them, the samples from
time(2) to time(3) will be over-represented by a factor of 2x, which should be
expected to skew the distribution (unless the distribution really is constant
for all sub-windows – which would mean we don’t need Sliding Windows because
nothing changes).
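
The mergeability and over-representation points above can be illustrated with a toy summary. This is a sketch under a loud assumption: `Summary` below keeps only a count and a sum, standing in for a real t-digest, and is not the Metron STATS object. The double-counting effect is the same for any summary whose merge adds up the underlying sample weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Toy mergeable summary (count and sum only), a stand-in for a t-digest."""
    count: int
    total: float

    @staticmethod
    def of(values):
        values = list(values)
        return Summary(len(values), float(sum(values)))

    def merge(self, other):
        return Summary(self.count + other.count, self.total + other.total)

    @property
    def mean(self):
        return self.total / self.count


# Samples per period: time(1)-time(2), time(2)-time(3), time(3)-time(4).
p1, p2, p3 = [1, 1], [9, 9], [1, 1]

# Adjacent periods: merging is equivalent to summarizing the whole span at once.
adjacent = Summary.of(p1).merge(Summary.of(p2))
print(adjacent == Summary.of(p1 + p2))

# Overlapping spans: time(1)-time(3) merged with time(2)-time(4) counts p2 twice.
overlapping = Summary.of(p1 + p2).merge(Summary.of(p2 + p3))
true_span = Summary.of(p1 + p2 + p3)
print(overlapping.count, overlapping.mean)
print(true_span.count, true_span.mean)
```

In this toy data the middle period holds the large values, so counting it twice pulls the merged mean well above the mean of the true time(1) to time(4) span.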

The Outlier Analysis profiles linked above try to implement a sliding window,
in which each profile period summarizes the Median Absolute Deviation
distribution of the last five profile periods only. An “Outlier MAD Score” can
then be determined by comparing the deviation of a new data point to the last
MAD distribution recorded in the Profile. This allows for changes over time in
the statistical distribution of inputs, but does not make the measure unduly
sensitive to just the last minute or two. This is a typical use case for
Sliding Windows.

Both example codes trip on how to do the sliding window in the context of a
Profile. At sampling period boundaries, both either compose the “result”
digest or initialize the “next” digest by reading the previous 5 result
digests. That is wrong, because it ignores the fact that those digests aren’t
just for their time periods. They too were composed with THEIR preceding 5
digests, each of which were composed with their preceding 5 digests, which in
turn… etc. The end result is sort of like the way Madeiras or some brandies
are aged via the Solera process with fractional blending. You don’t get a true
sliding window, which sharply drops the past, you get a continuous dilution of
the past. In fact, it’s wrong to assume that the “tail” of the far past is more
diluted than the near past! – It would be with some algorithms, but the alg
used in these two examples causes the far past to become an exponentially MORE
important fraction of the overall data than the near past – much worse than
simply turning on digesting at time(0) and leaving it on, with no attempt at
windowing. (Simulate it in a spreadsheet and you’ll see.)
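
The spreadsheet simulation suggested above can be sketched in a few lines. The model here is an assumption about how the two examples behave, not a transcription of them: each period's result digest is taken to be that period's raw samples merged with the previous five result digests. Under that model the sketch computes how many times one raw period is counted in a result recorded `age` periods later.

```python
def dilution_weights(max_age, window=5):
    """Weight of a raw period inside the result recorded `age` periods later,
    assuming result[t] = raw[t] merged with result[t-1] .. result[t-window]."""
    weights = []
    for age in range(max_age + 1):
        if age == 0:
            weights.append(1)  # the current period's own samples, counted once
        else:
            # Every earlier result inside the window already carries this period.
            weights.append(sum(weights[max(0, age - window):age]))
    return weights


def sliding_weights(max_age, window=5):
    """Weights of a true sliding window: recent periods once, older ones never."""
    return [1 if age < window else 0 for age in range(max_age + 1)]


for age, (blended, sliding) in enumerate(
    zip(dilution_weights(12), sliding_weights(12))
):
    print(f"age {age:2d}: blended weight {blended:5d}, true sliding weight {sliding}")
```

Under this assumed model the weight roughly doubles with every period of age, so older samples dominate the blended digest, whereas a true sliding window drops them entirely after five periods.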

We need a Profiler structure that assists in creating Sliding Window profiles.
The problem is that Profiles let you save only one thing (quantity or object)
per sampling period, and that’s typically a different “thing” (object type or
scale) than you want to use to compose the result for each windowed span. One
way to do it correctly would be with two Profiles, like this:

(SOLUTION A)

{
  "profiles": [
    {
      "profile": "sketchy_mad",
      "foreach": "'global'",
      "init" : {
        "s": "OUTLIER_MAD_INIT()"
      },
      "update": {
        "s": "OUTLIER_MAD_ADD(s, value)"
      },
      "result": "s"
    },
    {
      "profile": "windowed_mad",
      "foreach": "'global'",
      "init" : { },
      "update": { },
      "result": "OUTLIER_MAD_STATE_MERGE(PROFILE_GET('sketchy_mad', 'global', 5, 'MINUTES'))"
    }
  ]
}

This is typical. You have a fine-grain sampling period that you want to
“tumble”, and a broader window that you want to “slide” or “roll” along the

Re: [PROPOSAL] up-to-date versioned documentation

2017-01-19 Thread Matt Foley

Thanks, Jon!  I’m working on characterizing exactly how to fix the two main 
issues.  I think I’ve got a script that will auto-fix the triple-backtick 
problem.  The bullet list problem will require hand-editing, so I want to make 
sure I’ve got the right recommendation.

The larger issue is going thru and making the doc content better and more 
usable.
But that can occur over time, and will be motivated by having the book there to 
gripe about :-)
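
The blank-line fix described in the quoted message below (putting blank lines around list formats and code fences so that doxia-markdown keeps them apart from the surrounding paragraphs) could be automated roughly as follows. This is a sketch, not the script mentioned above: `pad_markdown` is a hypothetical name, it only separates fences and list runs from neighbouring text, and it does not address bullet lists nested under numbered lists.

```python
import re

BULLET = re.compile(r"^\s*([-*+]|\d+\.)\s+")  # bullet or numbered list item
FENCE = re.compile(r"^\s*`{3}")               # triple-backtick fence line


def pad_markdown(text):
    """Insert blank lines around code fences and around runs of list items."""
    out = []
    in_fence = False
    prev = "blank"  # kind of the previously emitted line
    for line in text.splitlines():
        if in_fence:
            out.append(line)  # copy fenced content verbatim
            if FENCE.match(line):
                in_fence = False
                prev = "fence_end"
            continue
        if not line.strip():
            kind = "blank"
        elif FENCE.match(line):
            kind = "fence_start"
        elif BULLET.match(line) or (prev == "list" and line[:1].isspace()):
            kind = "list"  # list item or its indented continuation line
        else:
            kind = "text"
        needs_gap = (
            kind != "blank"
            and prev != "blank"
            and (kind == "fence_start" or prev == "fence_end" or kind != prev)
        )
        if needs_gap:
            out.append("")
        out.append(line)
        if kind == "fence_start":
            in_fence = True
        prev = kind
    return "\n".join(out)


fence = "`" * 3
sample = "\n".join(["Intro:", "- a", "- b", "After", fence, "code", fence, "end"])
print(pad_markdown(sample))
```

Running the function on its own output changes nothing further, so it could be applied repeatedly to the gathered .md files without piling up blank lines.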

On 1/19/17, 9:05 AM, "zeo...@gmail.com" <zeo...@gmail.com> wrote:

Looking at the screenshot, that would be an incredible improvement on what
we currently have.  I'd be happy to help out with any markdown
modifications and documentation cleanup, if necessary, to fix the problems
you outlined above.

Jon

On Thu, Jan 19, 2017 at 11:22 AM Matt Foley <ma...@apache.org> wrote:

> Sorry, I forgot text-only messages won’t accept attachments.  Please see
>
> https://issues.apache.org/jira/secure/attachment/12848335/Metron-book-screenshot.png
>
> Thanks,
> --Matt
>
>
> On 1/19/17, 6:03 AM, "Otto Fowler" <ottobackwa...@gmail.com> wrote:
>
> Not seeing the attachment, is it attached to a jira?
>
>
> On January 19, 2017 at 04:19:02, Matt Foley (ma...@apache.org) wrote:
>
>     Here’s a screen shot, attached :-)
>
> On 1/19/17, 1:04 AM, "Matt Foley" <ma...@apache.org> wrote:
>
> Hi all,
> I’ve put together a prototype doc book, along the lines we discussed,
> and
> it looks pretty worthwhile.
> Many thanks to Mike M. who whipped the pom.xml file into shape, and
> helped
> me find the right site.xml file to imitate.
>
> If you’re interested, please do a single-branch clone as follows:
> git clone --single-branch -b METRON-660
> https://github.com/mattf-horton/incubator-metron.git [clonename]
> (or whatever git command pleases you :-)
>
> In this branch, there’s a new top-level subdirectory named site-book/.
> This
> is not necessarily how we want to integrate stuff, it was just
> convenient
> to do it separate from the existing site/ directory for now. To build
> the
> book, do these three commands:
> cd site-book/
> bin/generate-md.sh #This gathers all the *.md files into
> site-book/src/site/markdown/**, and generates the menu tree into
> site-book/src/site/site.xml
> mvn clean site:site #This builds the book with the maven site plugin
> and
> doxia-markdown plugin
>
> If both those steps are successful, you can then go to a browser and
> open
>
> file:///Users/yourname/yourpath/clonename/site-book/target/site/index.html
> and see the book, with the nav menu on the LHS.
> It’s important to note that a very usable (not perfect) nav hierarchy
> has
> been auto-generated; this is not hardwired nor hand-edited.
> I do plan to add some overrides that allow individual items in the
> menu to
> be tweaked.
>
> While it already looks fairly nice, and clearly illustrates the value
> of
> building a book, there are two glaring issues.
> • Doxia-markdown doesn’t process the triple back-tick (```) the same
> way as
> Github Markdown. It seems to color-code it as , but doesn’t
> preserve
> line breaks, which is really bad.
> • Similarly, it only processes bullet lists in isolation, and it
> doesn’t
> correctly combine bullet lists subordinate to a numbered list.
>
> The upshot is that
> • both code and bullet lists often lose their linebreaks, and get
> mushed
> into run-on paragraphs, usually combined with the preceding paragraph,
> and
> • bullet lists interrupt numbered lists and make them start over at #1.
>
> Perhaps 80-90% of these issues can be fixed by editing the markdown
> files
> to put blank lines around the list formats. I started doing this, but I
> didn’t want to obscure the proto by editing tons of .md files. As of
> this
> proto, only the half dozen actually broken files (that caused maven
> site
> build errors) have been fixed.
> The other 10-20% will just require simplification of the markdown used,
> unless we can get an updated version of the plugins.
>
> Anyway, please take a look and share your thoughts.
>
> Thanks,
> --Ma

Re: [PROPOSAL] up-to-date versioned documentation

2017-01-19 Thread Matt Foley

Hi all,
I’ve put together a prototype doc book, along the lines we discussed, and it 
looks pretty worthwhile.
Many thanks to Mike M. who whipped the pom.xml file into shape, and helped me 
find the right site.xml file to imitate.

If you’re interested, please do a single-branch clone as follows:
git clone --single-branch -b METRON-660 
https://github.com/mattf-horton/incubator-metron.git  [clonename]
(or whatever git command pleases you :-)

In this branch, there’s a new top-level subdirectory named site-book/.  This is 
not necessarily how we want to integrate stuff, it was just convenient to do it 
separate from the existing site/ directory for now.  To build the book, do 
these three commands:
cd site-book/
bin/generate-md.sh    # Gathers all the *.md files into site-book/src/site/markdown/**
                      # and generates the menu tree into site-book/src/site/site.xml
mvn clean site:site   # Builds the book with the maven site plugin and doxia-markdown plugin

If both those steps are successful, you can then go to a browser and open
file:///Users/yourname/yourpath/clonename/site-book/target/site/index.html
and see the book, with the nav menu on the LHS.
It’s important to note that a very usable (not perfect) nav hierarchy has been 
auto-generated; this is not hardwired nor hand-edited.
I do plan to add some overrides that allow individual items in the menu to be 
tweaked.

While it already looks fairly nice, and clearly illustrates the value of 
building a book, there are two glaring issues.
• Doxia-markdown doesn’t process the triple back-tick (```) the same way as 
Github Markdown.  It seems to color-code it as , but doesn’t preserve 
line breaks, which is really bad.
• Similarly, it only processes bullet lists in isolation, and it doesn’t 
correctly combine bullet lists subordinate to a numbered list.

The upshot is that 
• both code and bullet lists often lose their linebreaks, and get mushed into 
run-on paragraphs, usually combined with the preceding paragraph, and
• bullet lists interrupt numbered lists and make them start over at #1.

Perhaps 80-90% of these issues can be fixed by editing the markdown files to 
put blank lines around the list formats.  I started doing this, but I didn’t 
want to obscure the proto by editing tons of .md files.  As of this proto, only 
the half dozen actually broken files (that caused maven site build errors) have 
been fixed.
The other 10-20% will just require simplification of the markdown used, unless 
we can get an updated version of the plugins.
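
As a rough illustration of the "put blank lines around the list formats" fix, here is a minimal Python sketch. It is not part of the METRON-660 branch; the function name `pad_lists` and the regular expression are assumptions made for this example, and it does not try to skip fenced code blocks, so its output should be reviewed by hand.

```python
import re

# Matches a bullet ("-", "*", "+") or numbered ("1.") list item,
# including indented sub-items. Hypothetical helper, not from the branch.
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+")


def pad_lists(markdown: str) -> str:
    """Insert a blank line before and after each run of list items."""
    out = []
    in_list = False
    for line in markdown.splitlines():
        is_item = bool(LIST_ITEM.match(line))
        is_blank = not line.strip()
        # An indented, non-blank line inside a list continues the current item.
        is_continuation = in_list and line.startswith((" ", "\t")) and not is_blank
        if is_item:
            if not in_list and out and out[-1].strip():
                out.append("")  # blank line before the list starts
            in_list = True
        elif is_blank:
            in_list = False
        elif in_list and not is_continuation:
            out.append("")  # blank line after the list ends
            in_list = False
        out.append(line)
    return "\n".join(out)
```

Because indented sub-items are treated as part of the enclosing list, a bullet list nested under a numbered item is left attached to it rather than separated by a blank line.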

Anyway, please take a look and share your thoughts.

Thanks,
--Matt

On 1/16/17, 1:02 PM, "Michael Miklavcic" <michael.miklav...@gmail.com> wrote:

Hey Matt, feel free to ping me.

On Mon, Jan 16, 2017 at 1:39 PM, Matt Foley <ma...@apache.org> wrote:

> I looked into the Falcon website and doxia over the weekend, and I’m
> convinced that using the doxia-markdown plugin should make it dirt simple
> to do what’s been discussed in this thread, with no overhead on the part of
> people writing the README.md files.
>
> I fiddled with trying to do a POC, and unfortunately concluded (again)
> that I don’t really know maven very well :-)
> Are there any maven experts out there who would be willing to give me some
> pointers (offline) on how to make use of this apparently simple maven
> plug-in?
>
> I can do the bit of scripting needed to gather the docs.  I’ve opened
> https://issues.apache.org/jira/browse/METRON-660 with some sub-tasks for
> this work.
> --Matt
>
> On 1/13/17, 12:04 PM, "zeo...@gmail.com" <zeo...@gmail.com> wrote:
>
> +1 on any improvement to documentation and more consistency.  At this
> point, I think getting rid of or hiding some of the pages on the wiki
> (at
> least for the short term) would be better than leaving them around
> because
> there's a lot of misinformation.
>
> Jon
>
> On Fri, Jan 13, 2017 at 10:13 AM Nick Allen <n...@nickallen.org>
> wrote:
>
> > +1 I think it is sorely needed.
> >
> > If we can come up with a really slick solution like Spark, then
> great. I am
> > also not against a half-baked solution that can later evolve into
> something
> > else.  For example, create an index README.md that links together
> all the
> > existing READMEs and run Pandoc on it.  Not ideal, but way better
> than what
> > we have.
> >
> >
> >
> > On Fri, Jan 13, 2017 at 9:53 AM, Otto Fowler <
> ottobackwa...@gmail.com>
> > wrote:
    > >
> > > I t

Re: [PROPOSAL] Metron Community

2017-01-18 Thread Matt Foley

+1



On 1/17/17, 9:27 PM, "James Sirota"  wrote:

Right now we have 2 entries for Metron Community.  One on our Wiki, which I 
think is outdated and should be removed as it no longer offers value.  The 
second is on our website, which is up to date.  So I am proposing we remove the 
wiki entry.



--- 
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org

Re: [VOTE] Release Process

2017-01-18 Thread Matt Foley

+1 (non-binding)

BTW, here is a collection of small editorial changes.  Since these are 
editorial rather than substantive, most project teams accept that they can be 
made by a responsible PMC member (such as our esteemed chair :-) without 
re-voting or disrupting a vote in progress.  I suggest we let James make these 
changes without changing the vote, altho of course if anyone who already voted 
+1 feels that correcting these issues would invalidate your vote, please say so.

Step 4: 2nd bullet: Remove or change obsolete references to the github release 
tarball.

Step 6:  “compiles” --> “complies”

Step 7:  “threat” --> “thread”

Introduction, section “Initiating a New Metron Release”
This sentence is almost certainly a cut-and-paste error:
“Create the MR branch for the previous Metron release by incrementing 
the second digit of the previous release like so 0.[FR].[MR].”
I’m not entirely sure what it should read, but the most probable correction 
based on the sentence before it would, I think, be (remove the asterisks):
“Create the MR branch for the previous Metron release by incrementing 
the *third* digit of the previous release like so 0.[FR].[*MR++*].”
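
To make the intended numbering concrete, here is a small Python sketch of the 0.[FR].[MR] convention as read in the correction above: a feature release bumps the second digit and resets the third, and a maintenance release bumps the third digit. The function names are hypothetical and exist only for this illustration.

```python
def parse_version(version):
    """Split a '0.[FR].[MR]' string into three integers."""
    major, fr, mr = (int(part) for part in version.split("."))
    return major, fr, mr


def next_feature_release(version):
    """Feature release: increment the second digit, reset the third."""
    major, fr, _ = parse_version(version)
    return "{}.{}.0".format(major, fr + 1)


def next_maintenance_release(version):
    """Maintenance release: increment the third digit only."""
    major, fr, mr = parse_version(version)
    return "{}.{}.{}".format(major, fr, mr + 1)
```

Under this reading, a 0.3.0 release would branch into 0.4.0 for new features and 0.3.1 for patches.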

At the end, section “Creating a Maintenance Release”
We got clarification on the urgent voting issue from Mentors, but steps 2-5 
aren’t the steps that get waived.  The two sentences:
“Second, if a critical JIRA comes in that requires an immediate patch 
we may forego steps 2-5 and immediately cut the MR release.  By this we mean 
that 3 binding +1 votes are still required, but the 72 hour waiting period can 
be waved.”
Should be changed to:
“Second, if a critical JIRA comes in that requires an immediate patch, 
the votes with three binding +1's are still required, but Step 1 (discussion) 
and Step 2 (Jira collecting and tracking), and the 72 hour waiting periods in 
Steps 7 and 8 can be waived.”

Cheers,
--Matt

On 1/17/17, 8:17 PM, "James Sirota"  wrote:

I made the revisions based on the discuss thread

The document is available here:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66854770

And is also attached for reference to this email. 

Please vote +1, -1, or 0 for neutral.  The vote will last 72 hours. 

Thanks,
James 

-
Metron Release Types
There are two types of Metron releases:
Feature Release (FR) - this is a release that has a significant step 
forward in feature capability and is denoted by an upgrade of the second digit
Maintenance Release (MR) - this is a set of patches and fixes that are 
issued following the FR and is denoted by an upgrade of the third digit
Release Naming Convention
Metron build naming convention is as follows: 0.[FR].[MR].  We keep the 0. 
notation to signify that the project is still under active development and we 
will hold a community vote to go to 1.x at a future time
Initiating a New Metron Release
Immediately upon the release of the previous Metron release create two 
branches: FR ++ and MR.  Create the FR++ branch by incrementing the second 
digit like so 0.[FR++].0.  Create the MR branch for the previous Metron release 
by incrementing the second digit of the previous release like so 0.[FR].[MR].  
All patches to the previous Metron release will be checked in under the MR 
branch and where it makes sense also under the FR branch.  All new features 
will be checked in under the FR branch.
Creating a Feature Release
Step 1 - Initiate a discuss thread
Prior to the release The Release manager should do the following 
(preferably a month before the release):
Make sure that the list of JIRAs slated for the release accurately reflects 
the pull requests that are currently in master
Construct an email to the Metron dev board 
(dev@metron.incubator.apache.org) which discusses with the community the desire 
to do a release. This email should contain the following:
The list of JIRAs slated for the release with descriptions (use the output 
of git log and remove all the JIRAs from the last release’s changelog)
A solicitation of JIRAs that should be included with the next release. 
Users should rate them as must/need/good to have as well as volunteering.
A release email template is provided here.
Step 2 - Monitor and Verify JIRAs
Once the community votes for additional JIRAs they want included in the 
release verify that the pull requests are in before the release, close these 
JIRAs and tag them with the release name. All pull requests and JIRAs that were 
not slated for this release will go into the next releases.  The release 
manager should continue to monitor the JIRA to ensure that the timetable is on 
track until the release date.  On the release date the release manager should 
message the Metron dev board (dev@metron.incubator.apache.org) announcing the 
code freeze for the release.

Re: [DISCUSS] Release Process

2017-01-17 Thread Matt Foley

Sure, sounds fine to me.

On 1/17/17, 1:03 PM, "Casey Stella" <ceste...@gmail.com> wrote:

We haven't actually bitten off the "publishing maven artifacts" just yet,
so I can't say that I have a good idea in my head what the detailed steps
are going to be.  If we think that it's a good idea, I can release them and
figure out the steps during our next release and then vote on a
modification to this doc afterwards.  Thoughts?

Casey

On Tue, Jan 17, 2017 at 3:43 PM, Matt Foley <ma...@apache.org> wrote:

> Casey and James,
> Do we also want to include in the Release Process that we publish Maven
> artifacts?  The (detailed) procedures for Apache conformance
> are in http://www.apache.org/dev/publishing-maven-artifacts.html
>
> This probably wants to be integrated with our build tools.
>
> This is optional, so we could leave it for later.
>
> Thanks,
> --Matt
>
>
>
> On 1/17/17, 12:33 PM, "Casey Stella" <ceste...@gmail.com> wrote:
>
> Larry,
>
> Thanks for the info.  In that case, then the following passage:
>
> > Now, we must grab the release candidate binary from the github
> releases
> > page (https://github.com/apache/incubator-metron/releases). In our
> case,
> > for RC1, that would be
> > https://github.com/apache/incubator-metron/archive/
> apache-metron-0.[FR++].0-rc1-incubating.tar.gz We
> > will refer to this as the release candidate tarball.
>
>
> Should be replaced with:
>
> > Now we must create the release candidate tarball.  From the apache
> repo,
> > you should run:
> > git archive --prefix=apache-metron-0.[FR++].0-rc1-incubating/
> > apache-metron-0.[FR++].0-rc1-incubating | gzip >
> > apache-metron-0.[FR++].0-rc1-incubating.tar.gz  We will refer to
> this as the
> > release candidate tarball.
>
>
> On Tue, Jan 17, 2017 at 3:20 PM, larry mccay <lmc...@apache.org>
> wrote:
>
> > It is technically a violation of apache release policy to build
> releases in
> > such a way [1]:
> >
> > MUST RELEASES BE BUILT ON HARDWARE OWNED AND CONTROLLED BY THE
> COMMITTER?
> > <http://www.apache.org/dev/release.html#owned-controlled-hardware>
> >
> > Strictly speaking, releases must be verified
> > <https://svn.apache.org/repos/private/committers/tools/
> > releases/compare_dirs.pl>
> > on
> > hardware owned and controlled by the committer. That means hardware
> the
> > committer has physical possession and control of and exclusively full
> > administrative/superuser access to. That's because only such
> hardware is
> > qualified to hold a PGP private key, and the release should be
> verified on
> > the machine the private key lives on or on a machine as trusted as
> that.
> >
> > Practically speaking, when a release consists of anything beyond an
> archive
> > (e.g., tarball or zip file) of a source control tag, the only
> practical way
> > to validate that archive is to build it locally; manually inspecting
> > generated files (especially binary files) is not feasible. So,
> basically,
> > "Yes".
> >
> > *Note: This answer refers to the process used to produce a release
> artifact
> > from a source control tag. It does not refer to testing that
> artifact for
> > technical quality.*
> >
> >
> > Knox is still using the archive from a jenkins build and is also out
> of
> > compliance.
> >
> > We will need to eventually change this approach as well.
> >
> > The threat is that someone could compromise such a remote system in
> a way
> > that adds additional classes or alters the code in someway that the
> project
> > will then be propagating this compromised binary under the Apache
> brand.
> >
> >
> > 1. http://www.apache.org/dev/release.html#owned-controlled-hardware
> >
> > On Tue, Jan 17, 2017 at 2:43 PM, Casey Stella <ceste...@gmail.com>
> wrote:
> >
> > > Hey Matt,
> >

Re: [PROPOSAL] Reduce Reliance on Ansible for Deployment

2017-01-17 Thread Matt Foley

+1, great you’re addressing this!

On 1/17/17, 7:34 AM, "David Lyle"  wrote:

In our "Dev Guide and Committer Review Guide additions" discussion, we had
a bit of a side discussion about reducing reliance (perhaps to zero) on
Ansible for our installation.

It seemed there was consensus around that idea (if not, please let me
know), so I propose the following steps to get there:

1) Refactor existing Ansible deployment to use the Ambari MPack to install
metron-common, metron-enrichments and metron-parsers.
2) Regenerate quick-dev to leverage the change.
3) Create rpm packages for all deployed components that don't currently
have them.
 - Sensor probes
 - Sensor stubs
4) Create MPack service defs for the RPMs in (3).
5) Refactor existing Ansible deployment to use the Ambari MPack to install
all services.
6) Regenerate quick-dev to leverage the change.
7) Plan iteration 2 to see if there are other opportunities to reduce our
use of Ansible.

One note: if we decide to go this direction, it'd be helpful if, during the
transition, we stopped adding additional Ansible deployment code.

Thoughts?

Thanks,

-David...

Re: [DISCUSS] Moving GeoIP management away from MySQL

2017-01-16 Thread Matt Foley

Sounds good!  And use a versioning scheme via subdirectories in HDFS, so you 
can revert back if you want.

On 1/16/17, 4:11 PM, "Casey Stella" <ceste...@gmail.com> wrote:

I'd recommend storing the MM data location in HDFS in the global config.
When the config property changes, then you know you need to reread the
database from HDFS.  This would keep you from re-reading frequently.
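
The two suggestions above (a versioned location, plus rereading only when the configured location changes) can be sketched together in Python. This is only an illustration under stated assumptions: `GeoDbHolder`, the `load` callback, and the example paths are hypothetical, and nothing here uses the real HDFS or MaxMind client APIs.

```python
from typing import Callable, Optional


class GeoDbHolder:
    """Caches a loaded database and reloads only when the configured path changes."""

    def __init__(self, load: Callable[[str], object]) -> None:
        self._load = load                   # hypothetical loader, e.g. reads the file from HDFS
        self._path: Optional[str] = None    # path of the currently loaded database
        self._db: Optional[object] = None

    def get(self, configured_path: str) -> object:
        # Compare against the path from the global config on every call;
        # this is cheap, so the expensive read happens once per update.
        if configured_path != self._path:
            self._db = self._load(configured_path)
            self._path = configured_path
        return self._db
```

With versioned subdirectories (for example a hypothetical `/geo/2017-01-16/db.mmdb`), reverting is just a matter of pointing the config property back at an older directory, which triggers one reload.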

On Mon, Jan 16, 2017 at 18:45 Matt Foley <ma...@apache.org> wrote:

> I agree too.  I confirmed the GeoIP2 Java API is ASF2.0 licensed, as you
> all no doubt knew already.
>
> Just a couple comments and a question:
>
> First note that storing data in HDFS, while it avoids the deployment
> question, also induces a network hop to read it.
> Presumably that only happens once per update per geo bolt instance, but
> how do you avoid re-reading it frequently, to make sure you see updates?
>
> Second, I just want to comment that there is not a single point of failure
> for an enterprise db that has been properly set up for HA.  Granted that’s
> neither here nor there if we don’t need a db, but it isn’t a valid argument
> against using a db. :-)
>
> Thanks,
> --Matt
>
> On 1/16/17, 1:36 PM, "Michael Miklavcic" <michael.miklav...@gmail.com>
> wrote:
>
> I'm also in agreement on this.
>
> On Mon, Jan 16, 2017 at 2:11 PM, Nick Allen <n...@nickallen.org>
> wrote:
>
> > +1 to using the Java API with the MMDB file provided by Maxmind.
> This is
> > what I had thought we were doing when we discussed this a few months
> back.
> > I'd rather use the Maxmind tools as-provided instead of engineering
> > something on top of it.
> >
> > On Mon, Jan 16, 2017 at 3:59 PM, JJ Meyer <jjmey...@gmail.com>
> wrote:
> >
> > > Matt, I agree with your points on why we shouldn't just get rid of
> the
> > > database just to get rid of a database. But IMO, I think we may be
> > > reinventing the wheel a little bit by even putting the maxmind
> data into
> > > MySQL. Right now we are already downloading a maxmind file. To me
> it
> > seems
> > > simpler to push the file to HDFS where we can pick it up and have
> the
> > > maxmind client use that instead of importing data into a DB and
> then
> > > running a query. Also, I believe the data gets updated weekly. So
> syncing
> > > may become easier too.
> > >
> > > James, I believe it works with the paid and free versions of
> geoip. I
> > know
> > > NiFi uses this client library in their Geo enrichment processor.
> > >
> > > Also, if it is decided that using a SQL database is still the best
> > > solution, I think there is a benefit to using their library. We
> would
> > just
> > > have to implement a `DatabaseProvider` that hits some SQL db
> instead of
> > > using their standard implementation.
> > >
> > > Thanks,
> > > JJ
> > >
> > > On Mon, Jan 16, 2017 at 2:27 PM, James Sirota <jsir...@apache.org>
> > wrote:
> > >
> > > > Hi Guys, I just wanted to clarify one point that I think is lost
> in
> > this
> > > > > thread.  Geo enrichment is NOT a key-value enrichment.  It
> requires a
> > > range
> > > > scan and a join (which is why it's implemented via mySql and not
> > Hbase).
> > > > To account for this access pattern via a key-value store you
> would
> > > > inevitably have to do something funky or in case of Hbase I
> don't think
> > > > there is a way to avoid doing a range scan.
> > > >
> > > > With respect to mapdb it only has support for Maps, Sets, Lists,
> > Queues.
> > > > Are we sure it provides enough functionality for us to do this
> > > enrichment?
> > > >
> > > > With respect to the Maxmind client, are we sure we can use it on
> the
> > > > > mySql-backed version of their DB?  I thought the Maxmind database
> > itself
> > > is
>

Re: [DISCUSS] Moving GeoIP management away from MySQL

2017-01-16 Thread Matt Foley

I agree too.  I confirmed the GeoIP2 Java API is ASF2.0 licensed, as you all no 
doubt knew already.

Just a couple comments and a question:

First note that storing data in HDFS, while it avoids the deployment question, 
also induces a network hop to read it.
Presumably that only happens once per update per geo bolt instance, but how do 
you avoid re-reading it frequently, to make sure you see updates?

Second, I just want to comment that there is not a single point of failure for 
an enterprise db that has been properly set up for HA.  Granted that’s neither 
here nor there if we don’t need a db, but it isn’t a valid argument against 
using a db. :-)

Thanks,
--Matt

On 1/16/17, 1:36 PM, "Michael Miklavcic" <michael.miklav...@gmail.com> wrote:

I'm also in agreement on this.

On Mon, Jan 16, 2017 at 2:11 PM, Nick Allen <n...@nickallen.org> wrote:

> +1 to using the Java API with the MMDB file provided by Maxmind.  This is
> what I had thought we were doing when we discussed this a few months back.
> I'd rather use the Maxmind tools as-provided instead of engineering
> something on top of it.
>
> On Mon, Jan 16, 2017 at 3:59 PM, JJ Meyer <jjmey...@gmail.com> wrote:
>
> > Matt, I agree with your points on why we shouldn't just get rid of the
> > database just to get rid of a database. But IMO, I think we may be
> > reinventing the wheel a little bit by even putting the maxmind data into
> > MySQL. Right now we are already downloading a maxmind file. To me it
> seems
> > simpler to push the file to HDFS where we can pick it up and have the
> > maxmind client use that instead of importing data into a DB and then
> > running a query. Also, I believe the data gets updated weekly. So syncing
> > may become easier too.
> >
> > James, I believe it works with the paid and free versions of geoip. I
> know
> > NiFi uses this client library in their Geo enrichment processor.
> >
> > Also, if it is decided that using a SQL database is still the best
> > solution, I think there is a benefit to using their library. We would
> just
> > have to implement a `DatabaseProvider` that hits some SQL db instead of
> > using their standard implementation.
> >
> > Thanks,
> > JJ
> >
> > On Mon, Jan 16, 2017 at 2:27 PM, James Sirota <jsir...@apache.org>
> wrote:
> >
> > > Hi Guys, I just wanted to clarify one point that I think is lost in
> this
> > > thread.  Geo enrichment is NOT a key-value enrichment.  It requires a
> > range
> > > scan and a join (which is why it's implemented via mySql and not
> Hbase).
> > > To account for this access pattern via a key-value store you would
> > > inevitably have to do something funky or in case of Hbase I don't think
> > > there is a way to avoid doing a range scan.
> > >
> > > With respect to mapdb it only has support for Maps, Sets, Lists,
> Queues.
> > > Are we sure it provides enough functionality for us to do this
> > enrichment?
> > >
> > > With respect to the Maxmind client, are we sure we can use it on the
> > > mySql-backed version of their DB?  I thought the Maxmind database
> itself
> > is
> > > proprietary and is something you have to pay for.  My understanding is
> > that
> > > the client is designed for that proprietary version.
> > >
> > > I somewhat agree with Matt's point.  If mySql is a problem because of
> > > licensing, the path of least resistance to remove mySql dependencies
> > would
> > > be to simply switch to postgresql.  We will always have conventional
> sql
> > > databases in our stack because other big data tools use them. Why not
> > take
> > > advantage of them too?
> > >
> > > Thanks,
> > > James
> > >
> > > 16.01.2017, 12:27, "Matt Foley" <ma...@apache.org>:
> > > > Hi Justin, and team,
> > > > Several components of the Hadoop Stack utilize a SQL database,
> usually
> > > for metadata of some sort. Ambari knows this and arranges for them to
> > share
> > > a single database installation (on or off the cluster), unless they
> > > explicitly configure use of different databases (which is allowed for
> > sites
> > > that desire it). Ambari defaults to using Postgr

Re: [DISCUSS] Ambari Metron Configuration Management consequences and call to action

2017-01-16 Thread Matt Foley

I agree, we should get all configs into ZK, and managed by Ambari.

I think, when managed by Ambari, a change to configs would always start with 
Ambari, not Zookeeper:
a) interact with Ambari (either GUI or API) to mutate the configs (starting 
with current, or with a historical version from Ambari CM)
b) when finalized in Ambari, Ambari shall push to Zookeeper, and store the new 
HEAD in Ambari CM.
With this mechanism in place, there would be no need to “snapshot a config out 
of Zookeeper and push it into CM” – it would already be in CM.
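
The flow in (a) and (b) can be sketched in Python with an in-memory stand-in. This is not the Ambari or Zookeeper API: `ConfigHistory`, `push_live`, and the sample config keys are all hypothetical, and the sketch only shows why no separate snapshot step is needed when every change is recorded before it is published.

```python
from typing import Callable, Dict, List


class ConfigHistory:
    """In-memory stand-in for a config-management store with version history."""

    def __init__(self, push_live: Callable[[Dict[str, str]], None]) -> None:
        self._push_live = push_live              # hypothetical hook that writes the live store
        self._versions: List[Dict[str, str]] = []

    def commit(self, config: Dict[str, str]) -> int:
        """Record the config as the new HEAD, then publish it to the live store."""
        snapshot = dict(config)
        self._versions.append(snapshot)
        self._push_live(dict(snapshot))
        return len(self._versions) - 1

    def rollback(self, version: int) -> int:
        """Re-commit a historical version so that it becomes the new HEAD."""
        return self.commit(self._versions[version])

    def head(self) -> Dict[str, str]:
        return dict(self._versions[-1])
```

A rollback is modeled as a new commit of an old version, so the history stays append-only and the live store always matches HEAD.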

The interesting thing not mentioned yet, is that we have to institute security 
in ZK, so everyone with access to zkcli can’t change the ZK configs directly.
This is actually a gap that needs to be addressed regardless of our config 
management approach.

--Matt

On 1/16/17, 1:19 PM, "Casey Stella" <ceste...@gmail.com> wrote:

I presumed that the solution would involve passing kerberos authentication
tickets to the API calls.  This kind of global context (the
authorization/authentication bit) is what the Stellar Context was built
for, in part.  Regardless, I think authorized config updates with reasons
stored in Ambari is a good thing (TM) and should be the goal.

Casey

On Mon, Jan 16, 2017 at 4:13 PM, David Lyle <dlyle65...@gmail.com> wrote:

> Totally agree with step 1, step 2.
>
> That said, we'll have to figure out some method of passing authorization
> credentials with configuration change requests in order to integrate with
> Ambari. It will not allow interaction without those (and rightly so, imho).
> This doesn't affect all Stellar functions, only those that mutate configs.
> Ambari will require username/password pairs but I think, technology
> notwithstanding, it's a gotta have. I think it's a bad practise to expose an
> open endpoint to mutate configs anyway and I would want the ability to
> audit configuration changes.
>
> I don't want to drill too deeply in the implementation, but there are
> solutions- one easy one would be require authentication to the REPL for
> functions that mutate the live system.
>
> So, perhaps it's a 3 step transition with one step being noodle out how
> we're going to do access control and audit for configuration?
>
> -D...
>
>
> On Mon, Jan 16, 2017 at 3:40 PM, James Sirota <jsir...@apache.org> wrote:
>
> > In my view the live configs should live in Zookeeper.  It's basically
> what
> > it's designed for.  However, we also have a need for CM of these configs
> in
> > case you want to roll back or push a different config set into Zookeeper.
> > That's what I would use Ambari for...have the ability to take a config
> out
> > of CM and push it into Zookeeper...or snapshot a config out of Zookeeper
> > and push it into CM.  The obvious pre-requisite to having this capability
> > is to not rely on local storage or HDFS for any config.  So in my mind
> this
> > is a 2-step transition.  Step 1 - transition all current configs into
> > Zookeeper.  Step 2 - integrate config management with Ambari.
> >
> > I think passing usernames/passwords to stellar functions is not a
> feasible
> > solution at this point
> >
> > Thanks,
> > James
> >
> > 15.01.2017, 18:28, "JJ Meyer" <jjmey...@gmail.com>:
> > > Quite late to the party, but with all this great back and forth I felt
> > like
> > > I had to join in :)
> > >
> > > I believe SolrCloud uses ZooKeeper to manage most of its configuration
> > > files. When searching, I was only able to find this (
> > > https://cwiki.apache.org/confluence/display/solr/Using+
> > ZooKeeper+to+Manage+Configuration+Files).
> > > I wasn't able to find any initial discussion on their architecture. If
> we
> > > can find more we still may be able to learn from them.
> > >
> > > Also, on the idea of passing a username/password to a Stellar function
> or
> > > to some shell script. We may want to do it a bit differently or at
> least
> > > give the option to do it differently. I know supplying the
> > > username/password directly is easy when testing and playing around, but
> > it
> > > probably isn't going to be allowed for a user in production. Maybe we
> can
> > > also support a credentials file and eventually support encrypting
> > sensitive
> > > values in configs?
> > >
>

Re: [DISCUSS] Release Process

2017-01-16 Thread Matt Foley

nsing
>>>   Make sure the release compiles with the following Apache licensing
>>>   guidelines: http://www.apache.org/foundation/license-faq.html
>>>   Step 8 - Generate the changes file
>>>   Go through the JIRA to generate the changes file, which contains a
>>>   list of all JIRAs included in the upcoming release. An example of a
>>>   changes file can be found here:
>>>   https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.0-RC1-incubating/CHANGES
>>>   Step 9 - Tag the RC release
>>>   Tag the release for the RC in case we need to roll back at some
>>>   point. An example of a valid tag can be seen here:
>>>   https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=shortlog;h=refs/tags/apache-metron-0.3.0-rc1-incubating
>>>   Step 10 - Stage the release
>>>   The next thing to do is to sign and stage the release including the
>>>   DISCLAIMER, KEYS, and LICENSE files. A properly signed and staged
>>>   release can be found here:
>>>   https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.0-RC1-incubating/
>>>   * Make sure you have your correct profile and keys uploaded to
>>>   https://id.apache.org/ to properly sign the release and to get access
>>>   to dist.apache.org
>>>   Step 11 - Call for a community release vote
>>>   Next initiate a [VOTE] threat on the dev list to announce the build
>>>   vote. The vote email template can be found here: Build Vote Template.
>>>   Allow at least 72 hours for the community to vote on the release.
>>>   When you get enough votes close the vote by replying [RESULT][VOTE]
>>>   to the email thread with the tally of all the votes
>>>   Step 12 - Call for an incubator release vote
>>>   Upon successful completion of step 11, repeat, but now send the email
>>>   to the incubator general boards. The email should be identical. Again,
>>>   wait for at least 72 hours and then close the vote.
>>>   Step 13 - Stage the finished release
>>>   If the vote fails at any stage then incorporate feedback, create
>>>   another RC, and repeat. If both votes pass then stage the resulting
>>>   artifacts here:
>>>   https://dist.apache.org/repos/dist/release/incubator/metron/
>>>   Step 14 - Announce build
>>>   Send a discuss thread to the Metron dev boards announcing the new
>>>   Metron build
>>>   Creating a Maintenance Release
>>>   Creation of the Maintenance Release should follow exactly the same set
>>>   of steps as creating the Feature Release as outlined above, but with two
>>>   exceptions. First, the version incremented on the maintenance release
>>>   should be the MR++ so that the release is named 0.[FR].[MR++]. Second,
>>>   if a critical JIRA comes in that requires an immediate patch we may
>>>   forego steps 2-5 and immediately cut the MR release. A critical JIRA is
>>>   something that is either a security vulnerability or a functional show
>>>   stopper.
>>>   Ensuring Consistency between Feature and Maintenance releases
>>>   Being able to maintain the previous release train, with only critical
>>>   or important bug fixes and security fixes (generally not new features)
>>>   for users who are averse to frequent large changes is very important for
>>>   production use. They get stability, while the feature code proceeds as
>>>   fast as the community wishes. It is important to assure that all commits
>>>   to the maintenance release also get made in the feature branch (if
>>>   relevant), to avoid the appearance of regressions in the maintenance
>>>   branch. The formal process for assuring this is as follows:
>>>   Every maintenance release JIRA should have a corresponding feature
>>>   JIRA to make sure that the patch is applied consistently to both
>>>   branches. The maintenance JIRA should be cloned and appropriate fix
>>>   version for the feature release should be applied. If the fix is not
>>>   relevant to the feature or maintenance branch then the submitter must
>>>   explicitly state this. In general reviewers should refuse a patch PR
>>>   unless both feature and maintenance JIRAs have been created.

Re: [DISCUSS] Dev Guide and Committer Review Guide additions?

2017-01-16 Thread Matt Foley

+1 to both.


On 1/16/17, 12:02 PM, "James Sirota" <jsir...@apache.org> wrote:

Going back to the original intent of this thread.  Do we (a) want to make 
any concrete modifications to our Dev Guide to account for some of the 
suggestions that Otto is making? and (b) do we want a Reviewer's Guide, which 
is a document that focuses on the review process specifically.

Thanks,
James 

16.01.2017, 06:58, "David Lyle" <dlyle65...@gmail.com>:
> Speaking on dropping (or at the very least, reducing our reliance on)
> Ansible, I'm a HUGE +1 on that. @Mike - I think you propose a reasonable
> approach. I was working a branch a little bit ago that does something very
> similar, if that's something we think is valuable, I'd be happy to
> resurrect it. I think (hope) we all agree that we're far too reliant on
> Ansible and our current usage of it is a bit outside of its design
> mission. As a result, installation is very brittle wrt versions and target
> OSes.
>
> Nothing much to add on the other 2 points outside of agreement.
>
> -D...
>
> On Thu, Jan 12, 2017 at 7:08 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
>>  "Also, what would people think of dropping Ansible in favor of Ambari and
>>  Docker as the preferred deployment management approaches?"
>>
>>  Agreed about publishing via Ambari. I'm not sure about fully replacing
>>  Vagrant just yet, but we could move that direction. Docker would allow us
>>  to more easily test a realistic multi-node setup on a single machine. In
>>  the meantime, maybe a quick win could be to use Ansible to deploy and
>>  install the MPack to the quickdev environment? This way we're leveraging
>>  the rpm's as well as the MPack code and installing in nearly the same
>>  manner as most users.
>>
>>  On Thu, Jan 12, 2017 at 3:49 PM, Matt Foley <ma...@apache.org> wrote:
>>
>>  > I think I hear 3 major areas not adequately covered by our usual “code
>>  > review”:
>>  > 1. Documentation
>>  > 2. Deployment Builds
>>  > 3. Management of config parameters
>>  >
>>  > The other areas mentioned by Otto (testing, perf test, Stellar impact, and
>>  > REST api impact), are entirely valid, but fall under existing code and
>>  > architecture that seems generally adequate.
>>  >
>>  > Regarding #1, Documentation, I’d like to branch a discussion thread for a
>>  > proposal I’m about to make, to enhance our use of README files as usable
>>  > and up-to-date end-user documentation, linked from the Metron site.
>>  > Implicit in that is the idea that we’d deprecate using the cwiki for
>>  > anything but long-lived demonstrations/tutorials that are unlikely to go
>>  > obsolete.
>>  >
>>  > For #2, Deployment Builds: This is difficult, and unfortunately I’m not
>>  > an expert with these things, but we need to automate this as much as
>>  > possible. Config params will always interact heavily with deployment
>>  > issues, but let’s leave that for #3 :0)
>>  >
>>  > As far as RPMs, Ansible playbooks, or Docker images go, we’d like to
>>  > automate so that developers never have to do anything when they are
>>  > committing modifications of existing components, and even when new
>>  > components are added (like the Profiler is being added now), it should
>>  > insofar as possible be automated via maven declarations. But that takes
>>  > input from the experts in each of the areas.
>>  >
>>  > Also, what would people think of dropping Ansible in favor of Ambari and
>>  > Docker as the preferred deployment management approaches?
>>  >
>>  > #3, Management of config parameters: I’ve been thinking about this
>>  > lately, but haven’t written up a proposal yet. I’m bothered by the wide
>>  > ranging variability in the way Metron configs are managed: files,
>>  > zookeeper, environment variables, traditional Hadoop-style configs, and
>>  > roll-your-own json configs, sometimes shared, sometimes duplicated, not to
>>  > mention Ambari over it all. This has been encouraged by the huge number of
>>  > Stack components that Metron depends on, and the relative independence of
>>  > the components Metron itself

Re: [PROPOSAL] up-to-date versioned documentation

2017-01-16 Thread Matt Foley

I looked into the Falcon website and doxia over the weekend, and I’m convinced 
that using the doxia-markdown plugin should make it dirt simple to do what’s 
been discussed in this thread, with no overhead on the part of people writing 
the README.md files.

I fiddled with trying to do a POC, and unfortunately concluded (again) that I 
don’t really know maven very well :-)
Are there any maven experts out there who would be willing to give me some 
pointers (offline) on how to make use of this apparently simple maven plug-in?

I can do the bit of scripting needed to gather the docs.  I’ve opened 
https://issues.apache.org/jira/browse/METRON-660 with some sub-tasks for this 
work.
--Matt

On 1/13/17, 12:04 PM, "zeo...@gmail.com" <zeo...@gmail.com> wrote:

+1 on any improvement to documentation and more consistency.  At this
point, I think getting rid of or hiding some of the pages on the wiki (at
least for the short term) would be better than leaving them around because
there's a lot of misinformation.

Jon

On Fri, Jan 13, 2017 at 10:13 AM Nick Allen <n...@nickallen.org> wrote:

> +1 I think it is sorely needed.
>
> If we can come up with a really slick solution like Spark, then great. I am
> also not against a half-baked solution that can later evolve into something
> else.  For example, create an index README.md that links together all the
> existing READMEs and run Pandoc on it.  Not ideal, but way better than what
> we have.
>
>
>
> On Fri, Jan 13, 2017 at 9:53 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > I think something that does what you have laid out here, no matter the
> > implementation details would be ideal
> >
> >
> > On January 12, 2017 at 18:05:24, Matt Foley (ma...@apache.org) wrote:
> >
> > We currently have three forms of documentation, with the following
> > advantages and disadvantages:
> >
> > || Docs || Pro || Con ||
> > | CWiki | Easy to edit, no special tools required, don't have to be a developer to contribute, google and wiki search | Not versioned, no review process, distant from the code, obsolete content tends to accumulate |
> > | Site | Versioned and reviewed, only committers can edit, google search | Yet another arcane toolset must be learned, only web programmers feel comfortable contributing, "asf-site" branch not related to code versions, distant from the code, tends to go obsolete due to non-maintenance |
> > | README.md | Versioned and reviewed, only committers can edit, tied to code versions, highly local to the code being documented | Non-developers don't know about them, may be scared by github, poor scoring in google search, no high-level presentation |
> >
> > Various discussion threads indicate the developer community likes
> > README-based docs, and it's easy to see why from the above. I propose this
> > extension to the README-based documentation, to address their
> > disadvantages:
> >
> > 1. Produce a script that gathers the README.md files from all code
> > subdirectories into a hierarchical list. The script would have an exclusion
> > list for non-user-content, which at this point would consist of [site/*,
> > build_utils/*]. The hierarchy would be sorted depth-first. The resulting
> > hierarchical list at this time (with six added README files to complete the
> > hierarchy) would be:
> >
> > ./README.md
> > ./metron-analytics/README.md <== (need file here)
> > ./metron-analytics/metron-maas-service/README.md
> > ./metron-analytics/metron-profiler/README.md
> > ./metron-analytics/metron-profiler-client/README.md
> > ./metron-analytics/metron-statistics/README.md
> > ./metron-deployment/README.md
> > ./metron-deployment/amazon-ec2/README.md
> > ./metron-deployment/packaging/README.md <== (need file here)
> > ./metron-deployment/packaging/ambari/README.md <== (need file here)
> > ./metron-deployment/packaging/docker/ansible-docker/README.md
> > ./metron-deployment/packaging/docker/rpm-docker/README.md
> > ./metron-deployment/packer-build/README.md
> > ./metron-deployment/roles/ <== (need file here)
> > ./metron-deployment/roles/kibana/README.md
> > ./metron-deployment/roles/monit/README.md
> > ./metron-deployment/roles/opentaxii/README.md

Re: [DISCUSS] Moving GeoIP management away from MySQL

2017-01-16 Thread Matt Foley

Hi Justin, and team,
Several components of the Hadoop Stack utilize a SQL database, usually for 
metadata of some sort.  Ambari knows this and arranges for them to share a 
single database installation (on or off the cluster), unless they explicitly 
configure use of different databases (which is allowed for sites that desire 
it).  Ambari defaults to using PostgreSQL, altho it’s happy to use MySQL, 
Oracle, or Microsoft, along with whatever each component historically defined 
as their default (such as Derby).

If we want to start with a replacement of current functionality, I would 
suggest switching the default database to PostgreSQL.  Replacing fast, 
efficient, and proven db services with a file-based api library (but no 
standard way to propagate the underlying storage files) seems to me to be 
taking a step backwards.

Sticking with a SQL-based service will surely minimize the amount of code 
changes needed.  And making the SQL either dialect-independent or capable of 
switching among dialects, then enables us to do what the rest of the Hadoop 
stack does:  allow enterprise customers to substitute Oracle or Microsoft 
enterprise-class databases where they wish.  Regarding the drivers, we should 
study what the other Stack components do; I’m not an expert in those areas.

Using the same db as the rest of the stack also means administrators can be 
confident they’ve set up adequate backup and recovery processes.
All these are valuable reasons not to roll our own storage system for this 
enrichment data.  IMO, of course.

Cheers,
--Matt

On 1/16/17, 9:52 AM, "Kyle Richardson"  wrote:

+1 Agree with David's order

-Kyle

On Mon, Jan 16, 2017 at 12:41 PM, David Lyle  wrote:

> Def agree on the parity point.
>
> I'm a little worried about Supervisor relocations for non-HBase solutions,
> but having much of the work done for us by MaxMind changes my preference to
> (in order)
>
> 1) MM API
> 2) HBase Enrichment
> 3) MapDB should the others prove not feasible
>
>
> -D...
>
>
> On Mon, Jan 16, 2017 at 12:15 PM, Justin Leet 
> wrote:
>
> > I definitely agree on checking out the MaxMind API.  I'll take a look at
> > it, but at first glance it looks like it does include everything we use.
> > Great find, JJ.
> >
> > More details on various people's points:
> >
> > As a note to anyone hopping in, Simon's point on the range lookup vs a key
> > lookup is why it becomes a Scan in HBase vs a Get.  As an addendum to what
> > Simon mentioned, denormalizing is easy enough and turns it into an easy
> > range lookup.
> >
> > To David's point, the MapDB does require a network hop, but it's once per
> > refresh of the data (Got a relevant callback? Grab new data, load it, swap
> > out) instead of (up to) once per message.  I would expect the same to be
> > true of the MaxMind db files.
> >
> > I'd also argue MapDB is not really more complex than refreshing the HBase
> > table, because we potentially have to start worrying about things like
> > hashing and/or indices and even just general data representation. It's
> > definitely correct that the file processing has to occur on either path, so
> > it really boils down to handling the callback and reloading the file vs
> > handling some of the standard HBasey things.  I don't think either is an
> > enormous amount of work (and both are almost certainly more work than
> > MaxMind's API)
> >
> > Regarding extensibility, I'd argue for parity with what we have first, then
> > build what we need from there.  Does anybody have any disagreement with
> > that approach for right now?
> >
> > Justin
> >
> > On Mon, Jan 16, 2017 at 12:04 PM, David Lyle 
> wrote:
> >
> > > It is interesting- save us a ton of effort, and has the right license. I
> > > think it's worth at least checking out.
> > >
> > > -D...
> > >
> > >
> > > On Mon, Jan 16, 2017 at 12:00 PM, Simon Elliston Ball <
> > > si...@simonellistonball.com> wrote:
> > >
> > > > I like that approach even more. That way we would only have to worry about
> > > > distributing the database file in binary format to all the supervisor nodes
> > > > on update.
> > > >
> > > > It would also make it easier for people to switch to the enterprise DB
> > > > potentially if they had the license.
> > > >
> > > > One slight issue with this might be for people who wanted to extend the
> > > > database. For example, organisations may want to add geo-enrichment to
> > > > their own private network addresses based on modified versions of the geo
> > > > database. Currently we

Re: [DISCUSS] Hosting Kraken maven artifacts in incubator-metron git repo

2017-01-13 Thread Matt Foley

Perhaps it would be more appropriate to put it under 
https://dist.apache.org/repos/dist/release/incubator/metron/ , perhaps as 
https://dist.apache.org/repos/dist/release/incubator/metron/mvn-repo ?

We should not host anything with a license that isn’t compatible with inclusion 
in an Apache project.  If we post only non-source artifacts, then that would 
include packages with “Category B List” licenses (that is, ‘"WEAK COPYLEFT" 
LICENSES’) as well as “Category A List” licenses (those “SIMILAR IN TERMS TO 
THE APACHE LICENSE 2.0”) -- per  https://www.apache.org/legal/resolved .  For 
versioning, we could simply structure as a maven repo, and in fact that’s what 
I think we should do.

Hosting the source code is not, I think, something we are supposed to do for 
non-Apache projects: https://www.apache.org/legal/resolved again, this time the 
very first question:

CAN ASF PMCS HOST PROJECTS THAT ARE NOT UNDER THE APACHE LICENSE?
No. See the Apache Software Foundation licenses page for more details, and 
the Apache Software Foundation page for additional background.

On 1/13/17, 8:11 AM, "Billie Rinaldi"  wrote:

No, we can't host artifacts in a git repo, or on a website. It would be
like distributing a release that hasn't been voted upon.

Regarding message threading, in Gmail adding a [tag] to the subject does
not create a new thread. So the change is not visible in my mailbox unless
the rest of the subject is changed as well.

On Mon, Jan 9, 2017 at 1:00 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> This is a question primarily for the mentors.
>
> *Background*
> metron-common is currently depending on the openSOC github repo for hosting
> kraken artifacts. The original reason for this was that these jars are not
> hosted in Maven Central, and they were not reliably available in the Kraken
> repo. https://issues.apache.org/jira/browse/METRON-650 is tracking work
> around copying these artifacts to the Metron repo.
>
> Kraken source on openSOC - https://github.com/OpenSOC/kraken
> Kraken maven repo on openSOC -
> https://github.com/OpenSOC/kraken/tree/mvn-repo
>
> *Ask*
> Create a new branch in incubator-metron to host any necessary maven
> artifacts. This branch would simply be incubator-metron/mvn-repo. This is
> similar to how we've hosted the asf-site.
>
> *Concerns/Questions*
>
>1. Can we host these jars/artifacts in this manner?
>2. Concerns regarding licensing?
>3. Do we need to also grab and host the source code?
>

Re: [DISCUSS] Ambari Metron Configuration Management consequences and call to action

2017-01-13 Thread Matt Foley

> (imo) be to always know who and why and make sure that Ambari is aware and
> is the static backing store for Zookeeper.
>
> -D...
>
> On Fri, Jan 13, 2017 at 9:19 AM, Casey Stella <ceste...@gmail.com> wrote:
>
> > So, basically, your proposed changes, broken into tangible gobbets of
> > work:
> >
> >    - Expand ambari to manage the remaining sensor-specific configs
> >    - Refactor the push calls to zookeeper (in ConfigurationUtils, I
> >      think) to push to ambari and take a reason
> >       - Question remains about whether ambari can do the push to
> >         zookeeper or whether ConfigurationUtils has to push to zookeeper
> >         as well as update ambari.
> >    - Refactor the middleware that Ryan submitted to have the API calls
> >      take a reason
> >    - Refactor the management UI to pass in a reason
> >    - Refactor the Stellar Management functions CONFIG_PUT to accept a
> >      reason
> >
> > Just so we can evaluate it and I can ensure I haven't overlooked some
> > important point.  Please tell me if Ambari cannot do the things we're
> > suggesting it can do.
> >
> > Casey
> >
> > On Fri, Jan 13, 2017 at 9:15 AM, David Lyle <dlyle65...@gmail.com> wrote:
> >
> > > That's exactly correct, Casey. Basically, an expansion of what we're
> > > currently doing with global.json, enrichment.properties and
> > > elasticsearch.properties.

Re: [PROPOSAL] up-to-date versioned documentation

2017-01-12 Thread Matt Foley

The Spark docs sure are pretty.  I suspect there’s a lot of person-weeks of 
work behind the content.  I don’t know how hard it was to set up the 
infrastructure, but the instructions for generating the site mention an 
impressive list of tools needed.

The Falcon docs site seems much more straightforward, and reasonably pretty 
too.  I can take a little time to understand it better.

Thanks,
--Matt


On 1/12/17, 6:19 PM, "Kyle Richardson" <kylerichards...@gmail.com> wrote:

Matt, thanks for pulling this together. I completely agree that we need to
go all in on either cwiki or the README.md's. I think the wiki is poorly
updated and can cause confusion for new users and devs. My preference is
certainly for the README.md's.

I like your approach but also agree that we shouldn't need to roll our own
here. I really like the Spark documentation that Mike pointed out. Any way
we can duplicate/adapt their approach?

-Kyle

On Thu, Jan 12, 2017 at 7:19 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Casey, Matt - These guys are using doxia
> https://github.com/apache/falcon/tree/master/docs
>
> Honestly, I kind of like Spark's approach -
> https://github.com/apache/spark/tree/master/docs
>
> Mike
>
> On Thu, Jan 12, 2017 at 4:48 PM, Matt Foley <ma...@apache.org> wrote:
>
> > I’m ambivalent; I think we’d end up tied to the doxia processing pipeline,
> > which is “yet another arcane toolset” to learn.  Using .md as the input
> > format decreases the dependency, but we’d still be dependent on it.
> >
> > I had anticipated that the web page would be a write-once thing that would
> > be only a couple days for an experienced Web developer. But I was going to
> > get an estimate from some co-workers before actually trying to get it
> > implemented. And the script is a few hours of work with find and awk.
> >
> > On the other hand, doxia is certainly an expectable solution.  Is setting
> > up that infrastructure less work than developing the web page?  Or is it
> > actually just a matter of a few lines in pom.xml?
> >
> >
> > On 1/12/17, 3:24 PM, "Casey Stella" <ceste...@gmail.com> wrote:
> >
> > Just a followup thought that's a bit more constructive, maybe we could
> > migrate the README.md's into a site directory and use doxia markdown
> > (example here <https://github.com/larrycai/doxia-markdown-demo>) to
> > generate the site as part of the build to resolve 1 through 3?
> >
> > On Thu, Jan 12, 2017 at 6:19 PM, Casey Stella <ceste...@gmail.com> wrote:
> >
> > > So, I do think this would be better than what we currently do.  I like a
> > > few things in particular:
> > >
> > >    - I don't like the wiki one bit.
> > >    - We have a LOT of documentation in the README.md's and it's sometimes
> > >      poorly organized
> > >    - I like a documentation preprocessing pipeline to be present.  For
> > >      instance, a major ask is all of the stellar functions in one place.
> > >      That's solved by updating an index manually in the READMEs and keeping
> > >      it in sync with the annotation.  I'd like to make a stellar annotation
> > >      -> markdown generator as part of the build and this would be nice for
> > >      such a task.
> > >
> > > My only concern is that the html generation/viewer seems like a fair
> > > amount of engineering.  Are you sure there isn't something easier that we
> > > could conform to?  I'm sure we aren't the only project in the world that
> > > has this particular issue.  Is there something like a maven site plugin or
> > > something?  Just a thought.  I'll come back with more :)
> > > Great ideas!  Keep them coming!
> > >
> > > Casey
> > >
> > > On Thu, Jan 12, 2017 at 6:05 PM, Matt Foley <ma...@apache.org> wrote:
> > >
> > >> We currently have three forms of documentation, with the following
> > >> advantages and disadvantages:
> > >>
> > >> || Docs || Pro

Re: [DISCUSS] Ambari Metron Configuration Management consequences and call to action

2017-01-12 Thread Matt Foley

Mike, could you try again on the image, please, making sure it is a simple 
format (gif, png, or jpeg)?  It got munched, at least in my viewer.  Thanks.

Casey, responding to some of the questions you raised:

I’m going to make a rather strong statement:  We already have a service “to 
intermediate and handle config update/retrieval”.  
Furthermore, it:
- Correctly handles the problems of distributed services running on multi-node 
clusters.  (That’s a HARD problem, people, and we shouldn’t try to reinvent the 
wheel.)
- Correctly handles Kerberos security. (That’s kinda hard too, or at least a 
lot of work.)
- It does automatic versioning of configurations, and allows viewing, 
comparing, and reverting historical configs
- It has a capable REST API for all those things.
It doesn’t natively integrate Zookeeper storage of configs, but there is a 
natural place to specify copy to/from Zookeeper for the files desired.

It is Ambari.  And we should commit to it, rather than try to re-create such 
features.
Because it has a good REST API, it is perfectly feasible to implement Stellar 
functions that call it.
GUI configuration tools can also use the Ambari APIs, or better yet be 
integrated in an “Ambari View”. (Eg, see the “Yarn Capacity Scheduler 
Configuration Tool” example in the Ambari documentation, under “Using Ambari 
Views”.)

Arguments are: Parsimony, Sufficiency, Not reinventing the wheel, and Not 
spending weeks and weeks of developer time over the next year reinventing the 
wheel while getting details wrong multiple times…

Okay, off soapbox.  

Casey asked what the config update behavior of Ambari is, and how it will 
interact with changes made from outside Ambari.
The following is from my experience working with the Ambari Mpack for Metron.  
I am not otherwise an Ambari expert, so tomorrow I’ll get it reviewed by an 
Ambari development engineer.

Ambari-server runs on one node, and Ambari-agent runs on each of the nodes.
Ambari-server has a private set of py, xml, and template files, which together 
are used both to generate the Ambari configuration GUI, with defaults, and to 
generate configuration files (of any needed filetype) for the various Stack 
components.
Ambari-server also has a database where it stores the schema related to these 
files, so even if you reach in and edit Ambari’s files, it will Error out if 
the set of parameters or parameter names changes.  The historical information 
about configuration changes is also stored in the db.
For each component (and in the case of Metron, for each topology), there is a 
python file which controls the logic for these actions, among others:
- Install
- Start / stop / restart / status
- Configure

It is actually up to this python code (which we wrote for the Metron Mpack) 
what happens in each of these API calls.  But the current code, and I believe 
this is typical of Ambari-managed components, performs a “Configure” action 
whenever you press the “Save” button after changing a component config in 
Ambari, and also on each Install and Start or Restart.

The Configure action consists of approximately the following sequence (see 
disclaimer above :-)
- Recreate the generated config files, using the template files and the actual 
configuration most recently set in Ambari
o Note this is also under the control of python code that we wrote, and this is 
the appropriate place to push to ZK if desired.
- Propagate those config files to each Ambari-agent, with a command to set them 
locally
- The ambari-agents on each node receive the files and write them to the 
specified locations on local storage
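
The first and last steps of that sequence (regenerate a file from a template plus the saved parameters, write it to local storage, optionally push it to ZK) can be sketched as below. This is only an illustration using the Python standard library; it is not Ambari's own API, and the names render_config, configure, and push_hook are hypothetical.

```python
import os
from string import Template


def render_config(template_text, params):
    """Fill a config template with the most recently saved parameter values."""
    return Template(template_text).substitute(params)


def configure(template_text, params, target_path, push_hook=None):
    """Regenerate one config file and write it to its local location.

    push_hook is a hypothetical callback standing in for "push to ZK if
    desired"; it is called with the file path and the rendered content.
    """
    content = render_config(template_text, params)
    directory = os.path.dirname(target_path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(target_path, "w") as handle:
        handle.write(content)
    if push_hook is not None:
        push_hook(target_path, content)
    return content
```

Passing the ZK push in as a callback keeps the file generation testable on its own, which mirrors the point that the push is something the Mpack's python code decides to do, not something Ambari does natively.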

Ambari-server then whines that the updated services should be restarted, but 
does not initiate that action itself (unless of course the initiating action 
was a Start command from the administrator).

Make sense?  It’s all quite straightforward in concept, there’s just an awful 
lot of stuff wrapped around that to make it all go smoothly and handle the 
problems when it doesn’t.

There’s additional complexity in that the Ambari-agent also caches (on each 
node) both the template files and COMPILED forms of the python files (.pyc) 
involved in transforming them.  The pyc files incorporate some amount of 
additional info regarding parameter values, but I’m not sure of the form.  I 
don’t think that changes the above in any practical way unless you’re trying to 
cheat Ambari by reaching in and editing its files directly.  In that case, you 
also need to whack the pyc files (on each node) to force the data to be 
reloaded from Ambari-server.  Best solution is don’t cheat.

Also, there may be circumstances under which the Ambari-agent will detect 
changes and re-write the latest version it knows of the config files, even 
without a Save or Start action at the Ambari-server.  I’m not sure of this and 
need to check with Ambari developers.  It may no longer happen, altho I’m 
pretty sure change detection/reversion was a feature of early versions of

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-12 Thread Matt Foley

Ah, I see.  If overriding the default index name allows using the same name for 
multiple sensors, then the goal can be achieved.
Thanks,
--Matt


On 1/12/17, 3:30 PM, "Casey Stella" <ceste...@gmail.com> wrote:

Oh, you could!  Let's say you have a syslog parser with data from sources 1,
2 and 3.  You'd end up with one kafka queue with 3 parsers attached to that
queue, each picking apart the messages from source 1, 2 and 3.  They'd go
through separate enrichment and into the indexing topology.  In the
indexing topology, you could specify the same index name "syslog" and all
of the messages go into the same index for CEP querying if so desired.
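
A toy model of that arrangement, for illustration only: the sensor names, the "accepts" filters, and the message shape are all hypothetical, and this is plain Python rather than Metron's actual parser or indexing configuration.

```python
# Hypothetical sensor definitions: three parsers read the same syslog queue,
# each with its own filter, and all of them name the same index.
SENSORS = {
    "syslog_source1": {"accepts": lambda msg: msg["source"] == 1, "index": "syslog"},
    "syslog_source2": {"accepts": lambda msg: msg["source"] == 2, "index": "syslog"},
    "syslog_source3": {"accepts": lambda msg: msg["source"] == 3, "index": "syslog"},
}


def route(msg):
    """Return (sensor, index) for each parser whose filter accepts the message."""
    return [
        (name, cfg["index"])
        for name, cfg in SENSORS.items()
        if cfg["accepts"](msg)
    ]
```

Each message is handled by exactly one sensor because the filters are mutually exclusive, yet every sensor resolves to the one shared index name.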

On Thu, Jan 12, 2017 at 6:27 PM, Matt Foley <ma...@apache.org> wrote:

> Syslog is hell on parsers – I know, I worked at LogLogic in a previous
> life.  It makes perfect sense to route different lines from syslog through
> different appropriate parsers.  But a lot of what the parsers do is
> identify consistent subsets of metadata and annotate it – eg, src_ip_addr,
> event timestamps, etc.  Once those metadata are annotated and available
> with common field names, why doesn’t it make sense to index the messages
> together, for CEP querying?  I think Splunk has illustrated this model.
>
> On 1/12/17, 3:00 PM, "Casey Stella" <ceste...@gmail.com> wrote:
>
> yeah, I mean, honestly, I think the approach that we've taken for sources
> which aggregate different types of data is to provide filters at the parser
> level and have multiple parser topologies (with different, possibly
> mutually exclusive filters) running.  This would be a completely separate
> sensor.  Imagine a syslog data source that aggregates and you want to pick
> apart certain pieces of messages.  This is why the initial thought and
> architecture was one index per sensor.
>
> On Thu, Jan 12, 2017 at 5:55 PM, Matt Foley <ma...@apache.org> wrote:
>
> > I’m thinking that CEP (Complex Event Processing) is contrary to the idea
> > of silo-ing data per sensor.
> > Now it’s true that some of those sensors are already aggregating data from
> > multiple sources, so maybe I’m wrong here.
> > But it just seems to me that the “data lake” insights come from being able
> > to make decisions over the whole mass of data rather than just vertical
> > slices of it.
> >
> > On 1/12/17, 2:15 PM, "Casey Stella" <ceste...@gmail.com> wrote:
> >
> > Hey Matt,
> >
> > Thanks for the comment!
> > 1. At the moment, we only have one index name, the default of which is the
> > sensor name but that's entirely up to the user.  This is sensor specific,
> > so it'd be a separate config for each sensor.  If we want to build multiple
> > indices per sensor, we'd have to think carefully about how to do that and
> > would be a bigger undertaking.  I guess I can see the use, though (redirect
> > messages to one index vs another based on a predicate for a given sensor).
> > Anyway, not where I was originally thinking that this discussion would go,
> > but it's an interesting point.
> >
> > 2. I hadn't thought through the implementation quite yet, but we don't
> > actually have a splitter bolt in that topology, just a spout that goes to
> > the elasticsearch writer and also to the hdfs writer.
> >
> > On Thu, Jan 12, 2017 at 4:52 PM, Matt Foley <ma...@apache.org> wrote:
> >
> > > Casey, good to have controls like this.  Couple questions:
> > >
> > > 1. Regarding the “index” : “squid” name/value pair, is the index name
> > > expected to always be a sensor name?  Or is the given json structure
> > > subordinate to a sensor name in zookeeper?  Or can we build arbitrary
> > > indexes with this new specification, independent of sensor?  Should there
> > > actually be a list of “indexes”, ie
> > > { “indexes” : [
> >

Re: [PROPOSAL] up-to-date versioned documentation

2017-01-12 Thread Matt Foley

I’m ambivalent; I think we’d end up tied to the doxia processing pipeline, 
which is “yet another arcane toolset” to learn.  Using .md as the input format 
decreases the dependency, but we’d still be dependent on it.

I had anticipated that the web page would be a write-once thing that would be 
only a couple days for an experienced Web developer. But I was going to get an 
estimate from some co-workers before actually trying to get it implemented. And 
the script is a few hours of work with find and awk.

On the other hand, doxia is certainly an expectable solution.  Is setting up 
that infrastructure less work than developing the web page?  Or is it actually 
just a matter of a few lines in pom.xml?


On 1/12/17, 3:24 PM, "Casey Stella" <ceste...@gmail.com> wrote:

Just a followup thought that's a bit more constructive, maybe we could
migrate the README.md's into a site directory and use doxia markdown
(example here <https://github.com/larrycai/doxia-markdown-demo>) to
generate the site as part of the build to resolve 1 through 3?

On Thu, Jan 12, 2017 at 6:19 PM, Casey Stella <ceste...@gmail.com> wrote:

> So, I do think this would be better than what we currently do.  I like a
> few things in particular:
>
> - I don't like the wiki one bit.
> - We have a LOT of documentation in the README.md's and it's sometimes
>   poorly organized
> - I like a documentation preprocessing pipeline to be present.  For
>   instance, a major ask is all of the stellar functions in one place.  That's
>   solved by updating an index manually in the READMEs and keeping it in sync
>   with the annotation.  I'd like to make a stellar annotation -> markdown
>   generator as part of the build and this would be nice for such a task.
>
> My only concern is that the html generation/viewer seems like a fair
> amount of engineering.  Are you sure there isn't something easier that we
> could conform to?  I'm sure we aren't the only project in the world that
> has this particular issue.  Is there something like a maven site plugin or
> something?  Just a thought.  I'll come back with more :)
>
> Great ideas!  Keep them coming!
>
> Casey
>
> On Thu, Jan 12, 2017 at 6:05 PM, Matt Foley <ma...@apache.org> wrote:
>
>> We currently have three forms of documentation, with the following
>> advantages and disadvantages:
>>
>> || Docs || Pro || Con ||
>> | CWiki |
>>   Easy to edit, no special tools required, don't have to be a
>> developer to contribute, google and wiki search |
>> Not versioned, no review process, distant from the code, obsolete content
>> tends to accumulate |
>> | Site |
>>   Versioned and reviewed, only committers can edit, google search |
>>   Yet another arcane toolset must be learned, only web programmers
>> feel comfortable contributing, "asf-site" branch not related to code
>> versions, distant from the code, tends to go obsolete due to
>> non-maintenance |
>> | README.md |
>>   Versioned and reviewed, only committers can edit, tied to code
>> versions, highly local to the code being documented |
>>   Non-developers don't know about them, may be scared by github, poor
>> scoring in google search, no high-level presentation |
>>
>> Various discussion threads indicate the developer community likes
>> README-based docs, and it's easy to see why from the above.  I propose this
>> extension to the README-based documentation, to address their disadvantages:
>>
>> 1. Produce a script that gathers the README.md files from all code
>> subdirectories into a hierarchical list.  The script would have an
>> exclusion list for non-user-content, which at this point would consist of
>> [site/*, build_utils/*].  The hierarchy would be sorted depth-first.  The
>> resulting hierarchical list at this time (with six added README files to
>> complete the hierarchy) would be:
>>
>> ./README.md
>> ./metron-analytics/README.md  <== (need file here)
>> ./metron-analytics/metron-maas-service/README.md
>> ./metron-analytics/metron-profiler/README.md
>> ./metron-analytics/metron-profiler-client/README.md
>> ./metron-analytics/metron-statistics/README.md
>> ./metron-deployment/README.md
>> ./metron-deployment/amazon-ec2/README.md
>> ./metron-deployment/packaging/README.md  <== (need file here)
>> ./metron-deployment/packaging/ambari/README.md <== (n

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-12 Thread Matt Foley

Syslog is hell on parsers – I know, I worked at LogLogic in a previous life.  
It makes perfect sense to route different lines from syslog through different 
appropriate parsers.  But a lot of what the parsers do is identify consistent 
subsets of metadata and annotate it – eg, src_ip_addr, event timestamps, etc.  
Once those metadata are annotated and available with common field names, why 
doesn’t it make sense to index the messages together, for CEP querying?  I 
think Splunk has illustrated this model. 

On 1/12/17, 3:00 PM, "Casey Stella" <ceste...@gmail.com> wrote:

yeah, I mean, honestly, I think the approach that we've taken for sources
which aggregate different types of data is to provide filters at the parser
level and have multiple parser topologies (with different, possibly
mutually exclusive filters) running.  This would be a completely separate
sensor.  Imagine a syslog data source that aggregates and you want to pick
apart certain pieces of messages.  This is why the initial thought and
architecture was one index per sensor.

On Thu, Jan 12, 2017 at 5:55 PM, Matt Foley <ma...@apache.org> wrote:

> I’m thinking that CEP (Complex Event Processing) is contrary to the idea
> of silo-ing data per sensor.
> Now it’s true that some of those sensors are already aggregating data from
> multiple sources, so maybe I’m wrong here.
> But it just seems to me that the “data lake” insights come from being able
> to make decisions over the whole mass of data rather than just vertical
> slices of it.
>
> On 1/12/17, 2:15 PM, "Casey Stella" <ceste...@gmail.com> wrote:
>
> Hey Matt,
>
> Thanks for the comment!
> 1. At the moment, we only have one index name, the default of which is
> the
> sensor name but that's entirely up to the user.  This is sensor
> specific,
> so it'd be a separate config for each sensor.  If we want to build
> multiple
> indices per sensor, we'd have to think carefully about how to do that
> and
> would be a bigger undertaking.  I guess I can see the use, though
> (redirect
> messages to one index vs another based on a predicate for a given
> sensor).
> Anyway, not where I was originally thinking that this discussion would
> go,
> but it's an interesting point.
>
> 2. I hadn't thought through the implementation quite yet, but we don't
> actually have a splitter bolt in that topology, just a spout that goes
> to
> the elasticsearch writer and also to the hdfs writer.
>
> On Thu, Jan 12, 2017 at 4:52 PM, Matt Foley <ma...@apache.org> wrote:
>
> > Casey, good to have controls like this.  Couple questions:
> >
> > 1. Regarding the “index” : “squid” name/value pair, is the index name
> > expected to always be a sensor name?  Or is the given json structure
> > subordinate to a sensor name in zookeeper?  Or can we build arbitrary
> > indexes with this new specification, independent of sensor?  Should
> there
> > actually be a list of “indexes”, ie
> > { “indexes” : [
> > {“index” : “name1”,
> > …
> > },
> > {“index” : “name2”,
> > …
> > } ]
> > }
> >
> > 2. Would the filtering / writer selection logic take place in the
> indexing
> > topology splitter bolt?  Seems like that would have the smallest
> impact on
> > current implementation, no?
> >
> > Sorry if these are already answered in PR-415, I haven’t had time to
> > review that one yet.
> > Thanks,
> > --Matt
> >
> >
> > On 1/12/17, 12:55 PM, "Michael Miklavcic" <
> michael.miklav...@gmail.com>
> > wrote:
> >
> > I like the flexibility and expressibility of the first option
> with
> > Stellar
> > filters.
> >
> > M
> >
> > On Thu, Jan 12, 2017 at 1:51 PM, Casey Stella <
> ceste...@gmail.com>
> > wrote:
> >
> > > As of METRON-652 <https://github.com/apache/
> > incubator-metron/pull/415>, we
> > > will have decoupled the indexing configuration from the
> enrichment
> > > configuration.  As an im

[PROPOSAL] up-to-date versioned documentation

2017-01-12 Thread Matt Foley

We currently have three forms of documentation, with the following advantages 
and disadvantages:

|| Docs || Pro || Con ||
| CWiki | Easy to edit, no special tools required, don't have to be a developer to contribute, google and wiki search | Not versioned, no review process, distant from the code, obsolete content tends to accumulate |
| Site | Versioned and reviewed, only committers can edit, google search | Yet another arcane toolset must be learned, only web programmers feel comfortable contributing, "asf-site" branch not related to code versions, distant from the code, tends to go obsolete due to non-maintenance |
| README.md | Versioned and reviewed, only committers can edit, tied to code versions, highly local to the code being documented | Non-developers don't know about them, may be scared by github, poor scoring in google search, no high-level presentation |

Various discussion threads indicate the developer community likes README-based 
docs, and it's easy to see why from the above.  I propose this extension to the 
README-based documentation, to address their disadvantages:

1. Produce a script that gathers the README.md files from all code 
subdirectories into a hierarchical list.  The script would have an exclusion 
list for non-user-content, which at this point would consist of [site/*, 
build_utils/*].  The hierarchy would be sorted depth-first.  The resulting 
hierarchical list at this time (with six added README files to complete the 
hierarchy) would be:

./README.md
./metron-analytics/README.md  <== (need file here)
./metron-analytics/metron-maas-service/README.md
./metron-analytics/metron-profiler/README.md
./metron-analytics/metron-profiler-client/README.md
./metron-analytics/metron-statistics/README.md
./metron-deployment/README.md
./metron-deployment/amazon-ec2/README.md
./metron-deployment/packaging/README.md  <== (need file here)
./metron-deployment/packaging/ambari/README.md <== (need file here)
./metron-deployment/packaging/docker/ansible-docker/README.md
./metron-deployment/packaging/docker/rpm-docker/README.md
./metron-deployment/packer-build/README.md
./metron-deployment/roles/  <== (need file here)
./metron-deployment/roles/kibana/README.md
./metron-deployment/roles/monit/README.md
./metron-deployment/roles/opentaxii/README.md
./metron-deployment/roles/pcap_replay/README.md
./metron-deployment/roles/sensor-test-mode/README.md
./metron-deployment/vagrant/README.md  <== (need file here)
./metron-deployment/vagrant/codelab-platform/README.md
./metron-deployment/vagrant/fastcapa-test-platform/README.md
./metron-deployment/vagrant/full-dev-platform/README.md
./metron-deployment/vagrant/quick-dev-platform/README.md
./metron-platform/README.md
./metron-platform/metron-api/README.md
./metron-platform/metron-common/README.md
./metron-platform/metron-data-management/README.md
./metron-platform/metron-enrichment/README.md
./metron-platform/metron-indexing/README.md
./metron-platform/metron-management/README.md
./metron-platform/metron-parsers/README.md
./metron-platform/metron-pcap-backend/README.md
./metron-sensors/README.md  <== (need file here)
./metron-sensors/fastcapa/README.md
./metron-sensors/pycapa/README.md
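
For what it's worth, here is a rough sketch of what the gathering script in item 1 could look like.  I mentioned find and awk elsewhere, but Python is used here just for readability.  The function name `gather_readmes`, the hidden-directory skip, and the alphabetical ordering within each directory are my assumptions for illustration; only the exclusion list [site/*, build_utils/*] and the depth-first ordering come from the proposal.

```python
import fnmatch
import os

# Exclusion patterns from the proposal, matched against paths relative to the root.
EXCLUDES = ["site/*", "build_utils/*"]


def gather_readmes(root=".", excludes=EXCLUDES):
    """Return README.md paths under root, depth-first, parent before children."""
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place makes os.walk descend alphabetically.
        # Skipping hidden directories (e.g. .git) is an assumption, not in the proposal.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        if "README.md" in filenames:
            rel = os.path.relpath(os.path.join(dirpath, "README.md"), root)
            rel = rel.replace(os.sep, "/")
            if not any(fnmatch.fnmatch(rel, pat) for pat in excludes):
                results.append("./" + rel)
    return results


if __name__ == "__main__":
    for path in gather_readmes("."):
        print(path)
```

Note that this only lists README.md files that exist; it does not flag the directories marked "need file here" above, which would have to be added by hand or by a separate check.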

2. Arrange to run this script as part of the build process, and commit the 
resulting hierarchy list to someplace in the versioned and branched ./site/ 
subdirectory.

3. Produce a "doc reader" web page that takes in this hierarchy of .md pages, 
and presents a LHS doc tree of links, and a main display area for a currently 
selected file.  If we want to get fancy, this page would also provide: (a) 
telescoping (collapse/expand) of the doc tree; (b) floating next/prev/up/home 
buttons in the display area.

4. Add to this web page a pull-down menu that selects among all the release 
versions of Metron, and (if not running in the Apache site) a SNAPSHOT version 
for the current filesystem version (for developer preview).  Let it re-write 
the file paths per release version to the proper release tag in github.  This 
web page will therefore be version-independent.  Put it in the asf-site branch 
of the Apache site, as the new "docs" sub-site from the home web page.  Update 
the list of releases at each release, or if we want to get fancy, teach it to 
read the release tags from github.
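
To make the path re-writing in item 4 concrete, here is a minimal sketch, assuming the docs are served straight from the github repo via its usual blob/<tag>/<path> URL layout.  The function name `doc_link`, the `local_root` parameter, and the use of the literal string "SNAPSHOT" as the selector are hypothetical; the release tag naming convention is still to be decided, so the tag is just passed in.

```python
import posixpath

# Repo location as referenced elsewhere on this list; treat it as an assumption here.
REPO_URL = "https://github.com/apache/incubator-metron"


def doc_link(readme_path, version, local_root="."):
    """Map one entry of the README hierarchy to a link for the selected version."""
    rel = posixpath.normpath(readme_path)  # "./a/README.md" -> "a/README.md"
    if version == "SNAPSHOT":
        # Developer preview: point at the current filesystem copy.
        return posixpath.join(local_root, rel)
    # Released version: point at the file under that release tag in github.
    return f"{REPO_URL}/blob/{version}/{rel}"
```

The same hierarchy list can then be rendered once per selected version, which is what would keep the web page itself version-independent.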

5. As part of the release process, the release manager (a) assures the release 
is tagged in github with a consistent naming convention, and (b) submits the 
new hierarchy of links to google search (there's an api for that).

6. Deprecate the use of cwiki for anything but long-lived 
demonstrations/tutorials that are unlikely to go obsolete.


Do folks feel this would be a good contribution to the visibility, timeliness, 
and usability of our docs?
Is this an adequate solution for the current problems?

Thanks, 
--Matt

Re: [DISCUSS] Dev Guide and Committer Review Guide additions?

2017-01-12 Thread Matt Foley

Casey, great, we crossed messages!  Thanks for starting that thread, I’ll 
participate there.
--Matt

On 1/12/17, 2:51 PM, "Casey Stella" <ceste...@gmail.com> wrote:

Regarding 3, Matt, I just started a dev list discussion about configs and
the various components that manage them and how they interact.  Hopefully
we end up in a coherent approach, but in the lead of that, I'd say yes,
valid need for such an architecture.  Please chime in on that thread or
even in reply to this thread (I'll take anything I can get ;) with thoughts.

On Thu, Jan 12, 2017 at 5:49 PM, Matt Foley <ma...@apache.org> wrote:

> I think I hear 3 major areas not adequately covered by our usual “code
> review”:
> 1. Documentation
> 2. Deployment Builds
> 3. Management of config parameters
>
> The other areas mentioned by Otto (testing, perf test, Stellar impact, and
> REST api impact), are entirely valid, but fall under existing code and
> architecture that seems generally adequate.
>
> Regarding #1, Documentation, I’d like to branch a discussion thread for a
> proposal I’m about to make, to enhance our use of README files as usable
> and up-to-date end-user documentation, linked from the Metron site.
> Implicit in that is the idea that we’d deprecate using the cwiki for
> anything but long-lived demonstrations/tutorials that are unlikely to go
> obsolete.
>
> For #2, Deployment Builds:  This is difficult, and unfortunately I’m not
> an expert with these things, but we need to automate this as much as
> possible.  Config params will always interact heavily with deployment
> issues, but let’s leave that for #3 :0)
>
> As far as RPMs, Ansible playbooks, or Docker images go, we’d like to
> automate so that developers never have to do anything when they are
> committing modifications of existing components, and even when new
> components are added (like the Profiler is being added now), it should
> insofar as possible be automated via maven declarations.  But that takes
> input from the experts in each of the areas.
>
> Also, what would people think of dropping Ansible in favor of Ambari and
> Docker as the preferred deployment management approaches?
>
> #3, Management of config parameters:  I’ve been thinking about this
> lately, but haven’t written up a proposal yet.  I’m bothered by the wide
> ranging variability in the way Metron configs are managed: files,
> zookeeper, environment variables, traditional Hadoop-style configs, and
> roll-your-own json configs, sometimes shared, sometimes duplicated, not to
> mention Ambari over it all.  This has been encouraged by the huge number of
> Stack components that Metron depends on, and the relative independence of
> the components Metron itself is composed of.
>
> But I think as Otto points out, as we grow the number of components and
> mature out of the incubator, we have to get this under control.  We need an
> architecture for management of configuration parameters of the Metron
> topologies.  (We can’t do much about the Stack components, but Ambari is
> establishing a culture around managing those.)  The architecture needs to
> include update methodology for semantic changes in parameter sets.
>
> I’m mulling such an architecture, but what do other people think?  Is this
> a valid need?
>
> Thanks,
> --Matt
>
> On 1/12/17, 8:23 AM, "Michael Miklavcic" <michael.miklav...@gmail.com>
> wrote:
>
> Hi Otto,
>
> You make a great point.
>
> AFA RPM/MPack, we do have some work in the pipeline for streamlining
> things
> a bit with the RPM's and MPack code such that they will be used for
> performing the Metron install in the sandbox VM's rather than Ansible.
> (I'd
> search for the public Jiras and post them here, but Jira is down for
> maintenance currently.) This should help make it obvious that a change
> or
> new feature requires modifications because they will be in the critical
> path to testing.
>
> Documentation is still tricky because we have README files, javadoc,
> and
> the wiki. But in general I think the current approach is to put
> concrete
> functionality docs in the READMEs as much as possible because they can
> be
> tracked and versioned with Git. I think the community has actually been
> doing a pretty good job here. The wiki is a little more tricky because
> there is typically only one

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-12 Thread Matt Foley

I’m thinking that CEP (Complex Event Processing) is contrary to the idea of 
silo-ing data per sensor.
Now it’s true that some of those sensors are already aggregating data from 
multiple sources, so maybe I’m wrong here.
But it just seems to me that the “data lake” insights come from being able to 
make decisions over the whole mass of data rather than just vertical slices of 
it.

On 1/12/17, 2:15 PM, "Casey Stella" <ceste...@gmail.com> wrote:

Hey Matt,

Thanks for the comment!
1. At the moment, we only have one index name, the default of which is the
sensor name but that's entirely up to the user.  This is sensor specific,
so it'd be a separate config for each sensor.  If we want to build multiple
indices per sensor, we'd have to think carefully about how to do that and
would be a bigger undertaking.  I guess I can see the use, though (redirect
messages to one index vs another based on a predicate for a given sensor).
Anyway, not where I was originally thinking that this discussion would go,
but it's an interesting point.

2. I hadn't thought through the implementation quite yet, but we don't
actually have a splitter bolt in that topology, just a spout that goes to
the elasticsearch writer and also to the hdfs writer.

On Thu, Jan 12, 2017 at 4:52 PM, Matt Foley <ma...@apache.org> wrote:

> Casey, good to have controls like this.  Couple questions:
>
> 1. Regarding the “index” : “squid” name/value pair, is the index name
> expected to always be a sensor name?  Or is the given json structure
> subordinate to a sensor name in zookeeper?  Or can we build arbitrary
> indexes with this new specification, independent of sensor?  Should there
> actually be a list of “indexes”, ie
> { “indexes” : [
> {“index” : “name1”,
> …
> },
> {“index” : “name2”,
> …
> } ]
> }
>
> 2. Would the filtering / writer selection logic take place in the indexing
> topology splitter bolt?  Seems like that would have the smallest impact on
> current implementation, no?
>
> Sorry if these are already answered in PR-415, I haven’t had time to
> review that one yet.
> Thanks,
> --Matt
>
>
> On 1/12/17, 12:55 PM, "Michael Miklavcic" <michael.miklav...@gmail.com>
> wrote:
>
> I like the flexibility and expressibility of the first option with
> Stellar
> filters.
>
> M
>
> On Thu, Jan 12, 2017 at 1:51 PM, Casey Stella <ceste...@gmail.com>
> wrote:
>
> > As of METRON-652 <https://github.com/apache/
> incubator-metron/pull/415>, we
> > will have decoupled the indexing configuration from the enrichment
> > configuration.  As an immediate follow-up to that, I'd like to
> provide the
> > ability to turn off and on writers via the configs.  I'd like to get
> some
> > community feedback on how the functionality should work, if y'all are
> > amenable. :)
> >
> >
> > As of now, we have 3 possible writers which can be used in the
> indexing
> > topology:
> >
> >- Solr
> >- Elasticsearch
> >- HDFS
> >
> > HDFS is always used, elasticsearch or solr is used depending on how
> you
> > start the indexing topology.
> >
> > A couple of proposals come to mind immediately:
> >
> > *Index Filtering*
> >
> > You would be able to specify a filter as defined by a stellar
> statement
> > (likely a reuse of the StellarFilter that exists in the Parsers)
> which
> > would allow you to indicate on a message-by-message basis whether or
> not to
> > write the message.
> >
> > The semantics of this would be as follows:
> >
> >- Default (i.e. unspecified) is to pass everything through (hence
> >backwards compatible with the current default config).
> >- Messages which have the associated stellar statement evaluate
> to true
> >for the writer type will be written, otherwise not.
> >
> >
> > Sample indexing config which would write out no messages to HDFS and
> write
> > out only messages containing a field called "field1":
> > {
> >"index" : &quo

Re: [DISCUSS] Dev Guide and Committer Review Guide additions?

2017-01-12 Thread Matt Foley

I think I hear 3 major areas not adequately covered by our usual “code review”:
1. Documentation
2. Deployment Builds
3. Management of config parameters

The other areas mentioned by Otto (testing, perf test, Stellar impact, and REST 
api impact), are entirely valid, but fall under existing code and architecture 
that seems generally adequate.

Regarding #1, Documentation, I’d like to branch a discussion thread for a 
proposal I’m about to make, to enhance our use of README files as usable and 
up-to-date end-user documentation, linked from the Metron site.  Implicit in 
that is the idea that we’d deprecate using the cwiki for anything but 
long-lived demonstrations/tutorials that are unlikely to go obsolete.

For #2, Deployment Builds:  This is difficult, and unfortunately I’m not an 
expert with these things, but we need to automate this as much as possible.  
Config params will always interact heavily with deployment issues, but let’s 
leave that for #3 :0)

As far as RPMs, Ansible playbooks, or Docker images go, we’d like to automate 
so that developers never have to do anything when they are committing 
modifications of existing components, and even when new components are added 
(like the Profiler is being added now), it should insofar as possible be 
automated via maven declarations.  But that takes input from the experts in 
each of the areas.  

Also, what would people think of dropping Ansible in favor of Ambari and Docker 
as the preferred deployment management approaches?

#3, Management of config parameters:  I’ve been thinking about this lately, but 
haven’t written up a proposal yet.  I’m bothered by the wide ranging 
variability in the way Metron configs are managed: files, zookeeper, 
environment variables, traditional Hadoop-style configs, and roll-your-own json 
configs, sometimes shared, sometimes duplicated, not to mention Ambari over it 
all.  This has been encouraged by the huge number of Stack components that 
Metron depends on, and the relative independence of the components Metron 
itself is composed of.

But I think as Otto points out, as we grow the number of components and mature 
out of the incubator, we have to get this under control.  We need an 
architecture for management of configuration parameters of the Metron 
topologies.  (We can’t do much about the Stack components, but Ambari is 
establishing a culture around managing those.)  The architecture needs to 
include update methodology for semantic changes in parameter sets.

I’m mulling such an architecture, but what do other people think?  Is this a 
valid need?

Thanks,
--Matt

On 1/12/17, 8:23 AM, "Michael Miklavcic"  wrote:

Hi Otto,

You make a great point.

AFA RPM/MPack, we do have some work in the pipeline for streamlining things
a bit with the RPM's and MPack code such that they will be used for
performing the Metron install in the sandbox VM's rather than Ansible. (I'd
search for the public Jiras and post them here, but Jira is down for
maintenance currently.) This should help make it obvious that a change or
new feature requires modifications because they will be in the critical
path to testing.

Documentation is still tricky because we have README files, javadoc, and
the wiki. But in general I think the current approach is to put concrete
functionality docs in the READMEs as much as possible because they can be
tracked and versioned with Git. I think the community has actually been
doing a pretty good job here. The wiki is a little more tricky because
there is typically only one version, which tracks master, not necessarily
the latest stable release.

Mike

On Thu, Jan 12, 2017 at 8:42 AM, Otto Fowler 
wrote:

> As Metron evolves to include new deployment options, features, and
> configurations it is hard and only getting harder for contributors,
> committers, and reviewers to understand what the required changes are
> across the different areas of the system to correctly and completely
> introduce a change or new feature in the system.
>
> We have talked some about the requirements or expectations for submitters
> with regards to tests and coverage, coding style, and documentation  but I
> don’t think we have enough guidance on deployment or other changes that
> need to be considered.  For committers it is pretty much the same, with the
> extra stuff around that process.
>
> Right now it seems as a committer I’m counting on others like Nick or Casey
> to understand anything that may be missing from a submission when I review
> it.  Should there be an Ambari/RPM change?   Does this change the RestAPI?
> Does this affect STELLAR Lang/SHELL?  Does it need custom Docker Compose
> work?  etc etc.
>
> I think as we grow the community and try to get out of incubation it will

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-12 Thread Matt Foley

Casey, good to have controls like this.  Couple questions:

1. Regarding the “index” : “squid” name/value pair, is the index name expected 
to always be a sensor name?  Or is the given json structure subordinate to a 
sensor name in zookeeper?  Or can we build arbitrary indexes with this new 
specification, independent of sensor?  Should there actually be a list of 
“indexes”, ie
{ “indexes” : [
{“index” : “name1”,
…
},
{“index” : “name2”,
…
} ]
}

2. Would the filtering / writer selection logic take place in the indexing 
topology splitter bolt?  Seems like that would have the smallest impact on 
current implementation, no?

Sorry if these are already answered in PR-415, I haven’t had time to review 
that one yet.
Thanks,
--Matt


On 1/12/17, 12:55 PM, "Michael Miklavcic"  wrote:

I like the flexibility and expressibility of the first option with Stellar
filters.

M

On Thu, Jan 12, 2017 at 1:51 PM, Casey Stella  wrote:

> As of METRON-652 , we
> will have decoupled the indexing configuration from the enrichment
> configuration.  As an immediate follow-up to that, I'd like to provide the
> ability to turn off and on writers via the configs.  I'd like to get some
> community feedback on how the functionality should work, if y'all are
> amenable. :)
>
>
> As of now, we have 3 possible writers which can be used in the indexing
> topology:
>
>- Solr
>- Elasticsearch
>- HDFS
>
> HDFS is always used, elasticsearch or solr is used depending on how you
> start the indexing topology.
>
> A couple of proposals come to mind immediately:
>
> *Index Filtering*
>
> You would be able to specify a filter as defined by a stellar statement
> (likely a reuse of the StellarFilter that exists in the Parsers) which
> would allow you to indicate on a message-by-message basis whether or not to
> write the message.
>
> The semantics of this would be as follows:
>
>- Default (i.e. unspecified) is to pass everything through (hence
>backwards compatible with the current default config).
>- Messages which have the associated stellar statement evaluate to true
>for the writer type will be written, otherwise not.
>
>
> Sample indexing config which would write out no messages to HDFS and write
> out only messages containing a field called "field1":
> {
>"index" : "squid"
>   ,"batchSize" : 100
>   ,"filters" : {
>   "HDFS" : "false"
>  ,"ES" : "exists(field1)"
>  }
> }
>
> *Index On/Off Switch*
>
> A simpler solution would be to just provide a list of writers to write
> messages.  The semantics would be as follows:
>
>- If the list is unspecified, then the default is to write all messages
>for every writer in the indexing topology
>- If the list is specified, then a writer will write all messages if and
>only if it is named in the list.
>
> Sample indexing config which turns off HDFS and keeps on Elasticsearch:
> {
>"index" : "squid"
>   ,"batchSize" : 100
>   ,"writers" : [ "ES" ]
> }
>
> Thanks in advance for the feedback!  Also, if you have any other, better
> ideas than the ones presented here, let me know too.
>
> Best,
>
> Casey
>
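
To compare the two proposals quoted above side by side, here is a small sketch of their stated semantics.  This is not Metron code: `should_write_filtered`, `should_write_switch`, and `toy_evaluate` are hypothetical names, and `toy_evaluate` is only a stand-in for a real Stellar evaluator that understands just the expressions appearing in the sample configs.

```python
def should_write_filtered(writer, message, config, evaluate):
    """Index Filtering: no filter for a writer means pass everything through."""
    expr = config.get("filters", {}).get(writer)
    if expr is None:
        return True  # unspecified: backwards compatible default
    return bool(evaluate(expr, message))


def should_write_switch(writer, config):
    """Index On/Off Switch: no list means every writer writes all messages."""
    writers = config.get("writers")
    return writers is None or writer in writers


def toy_evaluate(expr, message):
    """Hypothetical stand-in for Stellar; handles only 'true', 'false', exists(field)."""
    if expr == "false":
        return False
    if expr == "true":
        return True
    if expr.startswith("exists(") and expr.endswith(")"):
        return expr[len("exists("):-1] in message
    raise ValueError("unsupported expression in this sketch: " + expr)


filter_config = {
    "index": "squid",
    "batchSize": 100,
    "filters": {"HDFS": "false", "ES": "exists(field1)"},
}
switch_config = {"index": "squid", "batchSize": 100, "writers": ["ES"]}

print(should_write_filtered("ES", {"field1": "x"}, filter_config, toy_evaluate))
print(should_write_switch("HDFS", switch_config))
```

The filter form decides per message and per writer, while the switch form decides only per writer, which is the trade-off between expressiveness and simplicity under discussion.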

Re: [VOTE] Reporting Issues Wiki

2017-01-05 Thread Matt Foley

+1 (non-binding)

One typo is still in there:
 >>After discussion of the issue on the JIRA if it is clear that you found 
 >> a bug then you should file a JIRA
should be
>>After discussion of the issue on the mailing list if it is clear that you 
>> found a bug then you should file a JIRA

Cheers,
--Matt

On 1/5/17, 2:41 PM, "James Sirota"  wrote:

Based on feedback from the discuss thread.

Please vote +1, -1, or 0.  The vote will be open for 72 hours

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67635199

All “I have found a bug” issues are considered developer-level issues.  
Please report all developer-level issues to dev@metron.incubator.apache.org.  
Examples of developer issues would be:
Project fails to compile or is failing unit or integration tests
Project or individual components fail to install 
There are error messages or failures in my logs
I need help with coding or extending a specific component
etc...
After discussion of the issue on the JIRA if it is clear that you found a 
bug then you should file a JIRA (unless you found a security vulnerability).  
Follow up on the mailing lists if you want advice with respect to workaround or 
a local fix.  Our JIRA is located here. 

All  “I have a problem” or "How do you use x" issues are usability issues.  
If you are an end-user of the product and have a comment or question then use 
u...@metron.incubator.apache.org.  If you have a problem and a strong suspicion 
that you might have found a bug, please cross-reference 
dev@metron.incubator.apache.org as well
I don't understand the UI, what does button x do?
What should the output of function x be?
It would be nice if I had feature x along with feature y
etc...

If you found a security-related issue, please report immediately to 
secur...@metron.incubator.apache.org. Please adhere to the following Apache 
policy found here. DO NOT FILE A JIRA, DO NOT POST ON ANY OTHER BOARD
I can get access to data I should not have access to
I have privileges to do things I should not be allowed to do
I found that this project is susceptible to an exploit 
etc...

Please report issues related to the JIRA/Wiki to 
iss...@metron.incubator.apache.org
I don't have access to create/assign JIRAs to myself
I don't have visibility/access to certain JIRA features
I can't create or view a wiki entry
etc

--- 
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org

Re: [DISCUSS] Release Process

2017-01-04 Thread Matt Foley

finished release
If the vote fails at any stage then incorporate feedback, create another 
RC, and repeat.  If both votes pass then stage the resulting artifacts here:  
https://dist.apache.org/repos/dist/release/incubator/metron/
Step 14 - Announce build
Send a discuss thread to the Metron dev boards announcing the new Metron 
build
Creating a Maintenance Release
Creation of the Maintenance Release should follow exactly the same set of 
steps as creating the Feature Release as outlined above, but with two 
exceptions.  First, the version incremented on the maintenance release should be 
the MR++ so that the release is named 0.[FR].[MR++].  Second, if a critical 
JIRA comes in that requires an immediate patch we may forego steps 2-5 and 
immediately cut the MR release.  A critical JIRA is something that is either a 
security vulnerability or a functional show stopper.
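
The naming rule for maintenance releases (0.[FR].[MR++]) can be sketched in a few lines. This is only an illustration, assuming versions are plain strings of the form 0.FR.MR; the class and method names are hypothetical and are not part of the Metron build or release tooling.

```java
// Hypothetical helper, not part of Metron: illustrates the 0.[FR].[MR++] rule.
final class ReleaseVersion {
    private ReleaseVersion() {}

    /** Returns the next maintenance version by incrementing only the MR field. */
    static String nextMaintenance(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 0.FR.MR but got: " + version);
        }
        int mr = Integer.parseInt(parts[2]);
        // The feature release number (FR) is left untouched on the maintenance line.
        return parts[0] + "." + parts[1] + "." + (mr + 1);
    }
}
```

Under these assumptions, calling ReleaseVersion.nextMaintenance on the current maintenance version yields the name of the next MR release, and the FR field never changes on the maintenance line.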
Ensuring Consistency between Feature and Maintenance releases
Being able to maintain the previous release train, with only critical or 
important bug fixes and security fixes (generally not new features) for users 
who are averse to frequent large changes is very important for production use.  
They get stability, while the feature code proceeds as fast as the community 
wishes.  It is important to assure that all commits to the maintenance release 
also get made in the feature branch (if relevant), to avoid the appearance of 
regressions in the maintenance branch.  The formal process for assuring this is 
as follows:
Every maintenance release JIRA should have a corresponding feature JIRA to 
make sure that the patch is applied consistently to both branches.  The 
maintenance JIRA should be cloned and appropriate fix version for the feature 
release should be applied.  If the fix is not relevant to the feature or 
maintenance branch then the submitter must explicitly state this.  In general 
reviewers should refuse a patch PR unless both feature and maintenance JIRAs 
have been created.
The release manager has a responsibility to review all commits to the 
maintenance line since last release, and make sure they were duplicated to the 
feature branch (unless not relevant, which must also be determined).
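
The release manager's review described above amounts to a set difference: everything committed to the maintenance line, minus what is already in the feature branch, minus what was explicitly declared not relevant. A minimal sketch, assuming each commit can be mapped to a JIRA ID; the class name is hypothetical and how the ID sets are gathered (for example from commit messages) is left out.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Metron tooling: finds maintenance fixes that
// still need a twin commit in the feature branch.
final class BranchConsistency {
    private BranchConsistency() {}

    /**
     * @param maintenance JIRA IDs committed to the maintenance line since the last release
     * @param feature     JIRA IDs committed to the feature branch
     * @param notRelevant JIRA IDs the submitter explicitly declared not relevant to the feature branch
     * @return JIRA IDs that still have to be applied to the feature branch, sorted
     */
    static Set<String> missingFromFeature(Set<String> maintenance,
                                          Set<String> feature,
                                          Set<String> notRelevant) {
        Set<String> missing = new TreeSet<>(maintenance);
        missing.removeAll(feature);
        missing.removeAll(notRelevant);
        return missing;
    }
}
```

An empty result means the maintenance line introduces no fix that the feature branch lacks; anything left in the set is a candidate for a cloned JIRA and a matching PR.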

20.12.2016, 11:45, "Matt Foley" <ma...@apache.org>:
> 1. Agree. Being able to maintain the previous release train, with only 
critical or important bug fixes and security fixes (generally not new features) 
for users who are averse to frequent large changes, is very important for 
production use. They get stability, while the mainline code proceeds as fast as 
the community wishes.
> a. As Kyle points out, it is important to assure that all commits to the 
maintenance line also get made in the mainline (if relevant), to avoid the 
appearance of regressions in the mainline. There should be a formal process for 
assuring this. Possibilities are:
> i. The release manager has a responsibility to review all commits to the 
maint line since last release, and make sure they were duplicated to the 
mainline (unless not relevant, which must also be determined).
> ii. Reviewers refuse to accept PRs for the maint line unless they are 
twinned with PRs for corresponding changes in the mainline (unless not 
relevant, which must be stated by the submitter). This should be reflected in 
Jira practices as well as PR practices. Note Jira is poor at tracking multiple 
“Fix Version/s” values (due to the ambiguous use of “Fix version” to mean both 
“target version” and “done version”). Most teams just clone jira tickets for 
multiple target releases.
> 2. Agree. Being a release manager is a significant commitment of both 
time and care, and should be rotated around; both for the benefit of the 
individuals involved and so that at least 2 or 3 people are deeply familiar 
with the process at any given time.
> --Matt
>
> On 12/20/16, 8:15 AM, "James Sirota" <jsir...@apache.org> wrote:
>
> You are correct. This thread is about the release process:
> 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66854770
>
> Does anyone have additional opinions on this?
>
> 1. Maintenance release would just contain patches to the existing 
release. Feature release would contain everything, including patches and new 
features.
> 2. The intention is to rotate the build manager. I did it for the 
first few releases, then Casey did it for the next few releases, someone else 
will probably do it for the next few releases, etc...
>
> Does this seem reasonable to everyone?
>
> Thanks,
> James
>
> 18.12.2016, 18:15, "Kyle Richardson" <kylerichards...@gmail.com>:
> > I think this thread got commingled with the discussion on Coding
> > Guidelines. The wiki page on

Re: Tests failing due to new year

2017-01-03 Thread Matt Foley

Heh, darn it, crossed in Jira – see METRON-648.  You win :-)

On 1/3/17, 1:13 PM, "Nick Allen"  wrote:

Thanks, Kyle.  I am seeing the same issue.  Happy Y2K... I mean 2017.

On Tue, Jan 3, 2017 at 4:09 PM, Kyle Richardson 
wrote:

> Created METRON-647 for tracking.
>
> -Kyle
>
> On Tue, Jan 3, 2017 at 3:49 PM, Kyle Richardson  >
> wrote:
>
> > ** This is causing all new PRs to fail Travis CI **
> >
> > The rollover to the new year is causing unit test failures for some of
> our
> > parser classes. It looks like the issue is the same in all cases... We
> have
> > hard coded a timestamp assertion but the original message does not
> contain
> > the year and is now parsing as 2017 instead of 2016.
> >
> > I'm currently investigating the failure for BasicAsaParserTest.
> > testIp6Addr:151.
> >
> > Other failures from the Travis CI log I'm looking at are:
> > GrokWebSphereParserTest.testParseLoginLine:60
> > GrokWebSphereParserTest.testParseMalformedLoginLine:151
> > GrokWebSphereParserTest.tetsParseLogoutLine:84
> > GrokWebSphereParserTest.tetsParseMalformedLogoutLine:175
> > GrokWebSphereParserTest.tetsParseMalformedOtherLine:220
> > GrokWebSphereParserTest.tetsParseMalformedRBMLine:198
> > GrokWebSphereParserTest.tetsParseOtherLine:129
> > GrokWebSphereParserTest.tetsParseRBMLine:107
> >
> > -Kyle
> >
> >
>



-- 
Nick Allen
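
The failure mode Kyle describes can be reproduced in isolation. The sketch below is not the Metron parser code; it assumes a syslog-style timestamp with no year, fills the year in from a clock, and uses java.time for parsing. The class name, the pattern, and the injected Clock are all assumptions made for illustration.

```java
import java.time.Clock;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

// Hypothetical sketch, not Metron code: parses a timestamp that carries no year.
final class YearlessTimestamp {
    private YearlessTimestamp() {}

    /** Parses e.g. "Jan 3 13:09:00" as UTC, taking the missing year from the clock. */
    static long toEpochMillis(String text, Clock clock) {
        int year = LocalDateTime.now(clock).getYear();
        DateTimeFormatter format = new DateTimeFormatterBuilder()
            .appendPattern("MMM d HH:mm:ss")
            // The message has no year field, so supply one as a default.
            .parseDefaulting(ChronoField.YEAR, year)
            .toFormatter(Locale.ENGLISH);
        return LocalDateTime.parse(text, format).toInstant(ZoneOffset.UTC).toEpochMilli();
    }
}
```

With the system clock, the same input string produces a different epoch value after each new year, which is why an assertion against a hard-coded timestamp starts failing. Passing a fixed clock in the test (Clock.fixed) pins the year and keeps the assertion stable.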

Re: Custom Storm Topologies

2017-01-03 Thread Matt Foley

Well, yes :-)  
And clearly it should always be more efficient to write a custom bolt in Java 
than to invoke a script and manage it.

--Matt

From: Otto Fowler <ottobackwa...@gmail.com>
Date: Tuesday, January 3, 2017 at 7:08 AM
To: "dev@metron.incubator.apache.org" <dev@metron.incubator.apache.org>, Matt 
Foley <ma...@apache.org>
Subject: Re: Custom Storm Topologies

Wouldn’t that be a bolt?


On January 2, 2017 at 14:39:34, Matt Foley (ma...@apache.org) wrote:
Should we consider a script calling capability that can launch a streaming 
script and keep it alive and fed, long-term, rather than launching the script 
anew every time the Stellar function is invoked? I’m thinking two basic rules: 
Write a line, read a line; and always have a timeout. Prob need a UID of some 
sort for a cache of running process objects. 

--Matt 

On 1/2/17, 8:50 AM, "Carolyn Duby" <cd...@hortonworks.com> wrote: 


Inserting a script inline is ok for low throughput and prototyping but once you 
get higher throughput (millions of events per second), it’s probably going to 
be a bottleneck. 


For Metron-571 you might want to consider a java based extension plugin similar 
to Eclipse plugins. 

Thanks 
Carolyn 

On 12/31/16, 5:22 PM, "Tyler Moore" <tmo...@goflyball.com> wrote: 

>Thanks Jon, 
> 
>I'll look over the tutorial and put something together for the SHELL_EXEC 
>stellar function. 
>I don't believe I have permissions to assign in Jira if you want to assign 
>to me my username is devopsec. 
>I'll post back details and we can review security issues 
> 
>Regards, 
> 
>Tyler Moore 
>Software Engineer 
>Phone: 248-909-2769 
>Email: moore.ty...@goflyball.com 
> 
> 
>On Sat, Dec 31, 2016 at 9:46 AM, zeo...@gmail.com <zeo...@gmail.com> wrote: 
> 
>> Casey did a tutorial on how to add your own Stellar function here 
>> <https://www.youtube.com/watch?v=VAEU4JjbS1o> - there is not an existing 
>> function that does this (current functions are listed here 
>> <https://github.com/apache/incubator-metron/tree/master/ 
>> metron-platform/metron-common#stellar-core-functions>). 
>> I noticed that some of the Stellar function documentation was a bit dated 
>> so I've opened a PR to update it here 
>> <https://github.com/apache/incubator-metron/pull/407>. 
>> 
>> As this is something I need as well, I'd be happy to assist you where I 
>> can. Perhaps you want to self-assign METRON-571 
>> <https://issues.apache.org/jira/browse/METRON-571>? I do have some 
>> security concerns with a SHELL_EXEC function because it could result in RCE 
>> - if that's the route you go I could probably help with a thorough secure 
>> code review. 
>> 
>> Jon 
>> 
>> On Fri, Dec 30, 2016 at 10:43 PM Tyler Moore <tmo...@goflyball.com> wrote: 
>> 
>> Thank you everyone for your suggestions, 
>> 
>> I believe that kicking off the function via stellar would be the optimal 
>> solution. If anyone has an example of calling external code via stellar 
>> that would be very helpful. Thanks! 
>> 
>> Regards, 
>> 
>> Tyler Moore 
>> IT Specialist 
>> tyler.math...@yahoo.com 
>> 248-909-2769 <(248)%20909-2769> 
>> 
>> > On Dec 30, 2016, at 17:54, Otto Fowler <ottobackwa...@gmail.com> wrote: 
>> > 
>> > They are all extension points. 
>> > 
>> >> On December 30, 2016 at 16:34:58, zeo...@gmail.com (zeo...@gmail.com) 
>> wrote: 
>> >> 
>> >> Right but unless I'm missing something, both of those options are more 
>> >> rigid and the MaaS service would have an unnecessary delay as opposed to 
>> >> doing it entirely in Stellar. Unless there's a reason to do otherwise 
>> that 
>> >> I'm missing, I would think doing this in Stellar gives you a more timely 
>> >> and (re)configurable end result. 
>> >> 
>> >> Jon 
>> >> 
>> >>> On Fri, Dec 30, 2016, 16:22 Otto Fowler <ottobackwa...@gmail.com> 
>> wrote: 
>> >>> 
>> >>> I think there are a couple of things you can do here. The way to get 
>> >>> something else into the split is to have another adapter to split to, 
>> which 
>> >>> is what I think you mean. You can also integrate with MaaS and create 
>> a 
>> >>> service that you can call via STELLAR. 
>> >>> 
>> >>> 
>> >>> 
>> >>> On December 30, 2016 at 15:08:48, Otto Fowler (ottobackwa...@gmail.com 
>> ) 
>> >>> wrote: 
>> >>> 

Re: Confluence write access to a space

2017-01-03 Thread Matt Foley

Hi Dima,

I think it would make a lot of sense to make 642 a sub-task of 634.  Certainly 
it’s a needed improvement and there’s no need to wait for the laundry list of 
other stuff in 634.  In fact, shorter is better.  But gathering the pieces as 
subtasks will make it easier to track and avoid duplicating work.

Regarding 641, as I commented in the PR, I think the proposed change is only 
needed if the user is using Python 2.6.  But there’s no harm in making the 
change anyway.  Since it isn’t mentioned in 634, just go ahead and pursue it 
separately.

Thanks for asking.

Regarding elasticsearch_config_path in 
metron-deployment/roles/metron_streaming/defaults/main.yml , I don’t really 
know.  I think it is probably a reference to the location of elastic-env.sh, 
which is itself not actually used for anything as far as I can tell.  In 
METRON-634 I actually propose removing elastic-env.sh and its progenitor file, 
elastic-env.xml.  But other more knowledgeable people haven’t had the chance to 
chime in on that yet :-)

--Matt

On 1/2/17, 11:01 PM, "Dima Kovalyov" <dima.koval...@sstech.us> wrote:

Thank you Matt,

I haven't seen 634 before. Should I merge my tickets as sub-tasks
addressing some of the points you bring up there?

Also, I have a note about elasticsearch_config_path which is set in
metron-deployment/roles/metron_streaming/defaults/main.yml. It seems
like it is not used anywhere in the code base.

Please let me know how I should proceed with two tickets I have created
that are relevant to yours.

- Dima

On 01/01/2017 04:43 AM, Matt Foley wrote:
> Hi Dima,
> Great to have the how-to doc in the wiki where it belongs.  Now we have a 
doc to edit as we improve the install process :-)
>
> Did you look at https://issues.apache.org/jira/browse/METRON-634 before 
opening METRON-642?  Please see my comments in the Jira for METRON-642.
> Thanks,
> --Matt
>
>
>
> On 12/30/16, 11:44 PM, "Dima Kovalyov" <dima.koval...@sstech.us> wrote:
>
> Hey,
> 
> I wanted to finish what I've started with document for Metron with HDP
> 2.5, so I have migrated document (with minor text fixes and
> clarifications) to here:
> 
https://cwiki.apache.org/confluence/display/METRON/Metron+with+HDP+2.5+bare-metal+install
> Old google doc was replaced with the link to this article.
> 
> I also created a number of pull requests to fix minor bugs here and 
there
> and created these two tickets: METRON-641 and METRON-642.
> Please let me know if I did something out of proper procedure.
> 
> Also, I agree that we should eventually strip HDP related steps from 
the
> document, so in the end it will be like:
> 1. Build Mpack
> 2. Add to Ambari
> 3. Assigned Masters and Slave
> 4. PROFIT
> But since we are where we are, let's leave it like that and fix all 
the
> bugs first.
>     
> p.s. have a happy holidays everyone
> 
> - Dima
> 
> On 12/16/2016 04:21 AM, Matt Foley wrote:
> > I seem to have found the difficulty.  It will NOT show up on any 
system that has /bin/java defined, which may account for why other folks with 
Centos7 test systems aren’t seeing the behavior.
> >
> > On my Centos7 test system, it so happens that /bin/java is not 
defined, even though $JAVA_HOME is correctly defined, and “$JAVA_HOME/bin” is 
in the PATH.  In Centos7, when services launch through the (new in 7) systemctl 
process, it drops all inherited environment variables and starts over fresh.  
Although the systemd launch script 
/usr/lib/systemd/system/elasticsearch.service does read in the 
/etc/sysconfig/elasticsearch as an “EnvironmentFile”, it does not include 
JAVA_HOME.
> >
> > When, eventually, the user-level launcher script at 
/usr/share/elasticsearch/bin/elasticsearch gets invoked, JAVA_HOME is still 
undefined.  But it looks for $JAVA_HOME/bin/java, so if “/bin/java” is linked 
in the file system, then it’s good!  But if not, the launcher script dies.  
Regrettably that launcher script, even though it is fairly complex, does not 
write to any log file, and its stdout was closed long ago by the service-level 
launcher.  So I had to hack it to see what it was doing.
> >
> > The solution is to simply write JAVA_HOME={{java64_home}} into the 
elastic-sysconfig template.
> >
> > BTW, while munging thru code I reached the conclusion that 
elastic-env.sh is basically orphaned.  Does anyone know of scripts that source 
it? (Of course elastic-env.xml is still importa

Re: Custom Storm Topologies

2017-01-02 Thread Matt Foley

Should we consider a script calling capability that can launch a streaming 
script and keep it alive and fed, long-term, rather than launching the script 
anew every time the Stellar function is invoked?  I’m thinking two basic rules: 
 Write a line, read a line; and always have a timeout.  Prob need a UID of some 
sort for a cache of running process objects.

--Matt
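
The two rules above (write a line, read a line; always have a timeout) plus a cache of running processes keyed by an ID could look roughly like this. It is a sketch under stated assumptions, not a proposed Stellar API: the class name, the forId/call methods, and the choice to kill the script after a timeout are all hypothetical, and it assumes the script answers every input line with exactly one output line.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not Metron code: one long-lived child process per ID.
final class StreamingScript {
    // Cache of running process objects, keyed by a caller-chosen unique ID.
    private static final Map<String, StreamingScript> CACHE = new ConcurrentHashMap<>();

    private final String id;
    private final Process process;
    private final BufferedWriter toScript;
    private final BufferedReader fromScript;
    // Reads happen on a daemon thread so the caller can give up after a timeout.
    private final ExecutorService readerThread = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "streaming-script-reader");
        t.setDaemon(true);
        return t;
    });

    private StreamingScript(String id, List<String> command) throws IOException {
        this.id = id;
        // stderr is inherited so a chatty script cannot block on a full pipe.
        this.process = new ProcessBuilder(command)
            .redirectError(ProcessBuilder.Redirect.INHERIT)
            .start();
        this.toScript = new BufferedWriter(
            new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8));
        this.fromScript = new BufferedReader(
            new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8));
    }

    /** Returns the running script for this ID, launching it only on first use. */
    static StreamingScript forId(String id, List<String> command) {
        return CACHE.computeIfAbsent(id, key -> {
            try {
                return new StreamingScript(key, command);
            } catch (IOException e) {
                throw new IllegalStateException("Could not launch " + command, e);
            }
        });
    }

    /** Write a line, read a line; kills the script if no reply arrives in time. */
    synchronized String call(String line, long timeoutMillis) {
        try {
            toScript.write(line);
            toScript.newLine();
            toScript.flush();
            Callable<String> readReply = fromScript::readLine;
            Future<String> reply = readerThread.submit(readReply);
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (IOException | ExecutionException | TimeoutException e) {
            close(); // the script is in an unknown state, so do not reuse it
            throw new IllegalStateException("Script " + id + " failed or timed out", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            close();
            throw new IllegalStateException("Interrupted while waiting for script " + id, e);
        }
    }

    /** Stops the script and drops it from the cache so the next use relaunches it. */
    void close() {
        CACHE.remove(id, this);
        process.destroyForcibly();
        readerThread.shutdownNow();
    }
}
```

On a Unix system, StreamingScript.forId("echo", Arrays.asList("cat")) gives a trivial script to try this against, since cat returns each line it receives. Killing the process on timeout is a deliberate simplification: once a reply is late there is no way to tell which request a later line belongs to.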

On 1/2/17, 8:50 AM, "Carolyn Duby"  wrote:


Inserting a script inline is ok for low throughput and prototyping but once 
you get higher throughput (millions of events per second), it’s probably going 
to be a bottleneck.


For Metron-571 you might want to consider a java based extension plugin 
similar to Eclipse plugins.

Thanks
Carolyn

On 12/31/16, 5:22 PM, "Tyler Moore"  wrote:

>Thanks Jon,
>
>I'll look over the tutorial and put something together for the SHELL_EXEC
>stellar function.
>I don't believe I have permissions to assign in Jira if you want to assign
>to me my username is devopsec.
>I'll post back details and we can review security issues
>
>Regards,
>
>Tyler Moore
>Software Engineer
>Phone: 248-909-2769
>Email: moore.ty...@goflyball.com
>
>
>On Sat, Dec 31, 2016 at 9:46 AM, zeo...@gmail.com  wrote:
>
>> Casey did a tutorial on how to add your own Stellar function here
>> <https://www.youtube.com/watch?v=VAEU4JjbS1o> - there is not an existing
>> function that does this (current functions are listed here
>> <https://github.com/apache/incubator-metron/tree/master/
>> metron-platform/metron-common#stellar-core-functions>).
>> I noticed that some of the Stellar function documentation was a bit dated
>> so I've opened a PR to update it here
>> <https://github.com/apache/incubator-metron/pull/407>.
>>
>> As this is something I need as well, I'd be happy to assist you where I
>> can.  Perhaps you want to self-assign METRON-571
>> <https://issues.apache.org/jira/browse/METRON-571>?  I do have some
>> security concerns with a SHELL_EXEC function because it could result in 
RCE
>> - if that's the route you go I could probably help with a thorough secure
>> code review.
>>
>> Jon
>>
>> On Fri, Dec 30, 2016 at 10:43 PM Tyler Moore  
wrote:
>>
>> Thank you everyone for your suggestions,
>>
>> I believe that kicking off the function via stellar would be the optimal
>> solution. If anyone has an example of calling external code via stellar
>> that would be very helpful. Thanks!
>>
>> Regards,
>>
>> Tyler Moore
>> IT Specialist
>> tyler.math...@yahoo.com
>> 248-909-2769 <(248)%20909-2769>
>>
>> > On Dec 30, 2016, at 17:54, Otto Fowler  wrote:
>> >
>> > They are all extension points.
>> >
>> >> On December 30, 2016 at 16:34:58, zeo...@gmail.com (zeo...@gmail.com)
>> wrote:
>> >>
>> >> Right but unless I'm missing something, both of those options are more
>> >> rigid and the MaaS service would have an unnecessary delay as opposed 
to
>> >> doing it entirely in Stellar.  Unless there's a reason to do otherwise
>> that
>> >> I'm missing, I would think doing this in Stellar gives you a more 
timely
>> >> and (re)configurable end result.
>> >>
>> >> Jon
>> >>
>> >>> On Fri, Dec 30, 2016, 16:22 Otto Fowler 
>> wrote:
>> >>>
>> >>> I think there are a couple of things you can do here.  The way to 
get
>> >>> something else into the split is to have another adapter to split to,
>> which
>> >>> is what I think you mean.  You can also integrate with MaaS and 
create
>> a
>> >>> service that you can call via STELLAR.
>> >>>
>> >>>
>> >>>
>> >>> On December 30, 2016 at 15:08:48, Otto Fowler 
(ottobackwa...@gmail.com
>> )
>> >>> wrote:
>> >>>
>> >>> Or a Maas service?
>> >>>
>> >>>
>> >>> On December 30, 2016 at 13:52:06, zeo...@gmail.com (zeo...@gmail.com)
>> >>> wrote:
>> >>>
>> >>> Depending on the details it sounds like a much simpler solution would
>> be
>> >>> to
>> >>> handle this in a Stellar function.
>> >>>
>> >>> Jon
>> >>>
>>  On Fri, Dec 30, 2016, 13:27 Tyler Moore  
wrote:
>> 
>>  Happy Holidays Metron Devs!
>> 
>>  Could anyone lend me some guidance on customizing the storm 
topologies
>> >>> in
>>  metron? What I am trying to accomplish:
>> 
>>  1) Add a method to the threat intel joiner bolt that sends an http
>> post
>>  with the score of the threat to a remote rest api. This will
>> >>> conditionally
>>  trigger notifications based on user settings in another database

Re: [DISCUSS] Coding Guidelines

2016-12-21 Thread Matt Foley

Works for me, thanks.

On 12/21/16, 11:21 AM, "Casey Stella" <ceste...@gmail.com> wrote:

Sure, how about making it generic to "a deployed cluster"?

On Wed, Dec 21, 2016 at 2:20 PM, Matt Foley <ma...@apache.org> wrote:

> +1 on Casey’s first edit.  However, wrt the second, can we please not
> require vagrant?  Any of our single-node test deployments, including
> vagrant, ansible, mpack, or (soon :-) docker, should be acceptable.
>
> Thanks,
> --Matt (who can’t run vagrant workably on the systems available to me)
>
>
> On 12/21/16, 8:52 AM, "Michael Miklavcic" <michael.miklav...@gmail.com>
> wrote:
>
> Agreed on Casey's addition to 2.5. What do you think about saying the
> pla
> should be stated on the PR, since that will be replicated to Jira
> automatically?
>
> On Wed, Dec 21, 2016 at 7:49 AM, Casey Stella <ceste...@gmail.com>
> wrote:
>
> > Oh, one more, I propose the following addition to 2.5:
> > >
> > > JIRAs will have a description of how to exercise the functionality
> in a
> > > step-by-step manner on a Quickdev vagrant instance to aid review
> and
> > > validation.
> >
> >
> > When Mike, Otto and I moved the system to the current version of
> Storm, we
> > needed a broader smoke test than just running data through that
> exercised a
> > variety of the features. We pulled those smoke tests from the 
various
> > discussions in the JIRAs.
> >
> >
> >
> > On Wed, Dec 21, 2016 at 9:38 AM, Casey Stella <ceste...@gmail.com>
> wrote:
> >
> > > We have been having a lively discussion on METRON-590 (see
> > > https://github.com/apache/incubator-metron/pull/395) around
> creating
> > > multiple abstractions to do the same (or very nearly the same)
> thing.
> > >
> > > I'd like to propose an addition to section 2.3 which reads:
> > >
> > >> Contributions which provide abstractions which are either very
> similar
> > to
> > >> or a subset of existing abstractions should use and extend
> existing
> > >> abstractions rather than provide competing abstractions unless
> > engineering
> > >> exigencies (e.g. performance ) make such an operation impossible
> without
> > >> compromising core functionality of the platform.
> > >
> > >
> > > I'd like to suggest the following anecdote from the early years of
> the
> > > codebase to justify the above:
> > >
> > > Stellar started as a predicate language only for threat triage
> rules. As
> > > such, when the task of creating Field Transformations came to me, 
I
> > needed
> > > something like Stellar except I needed it to return arbitrary
> objects,
> > > rather than just booleans. In my infinite wisdom, I chose to fork
> the
> > > language, create a second, more specific DSL for field
> transformations,
> > > thereby creating "Metron Query Language" and "Metron 
Transformation
> > > Language."
> > >
> > > I felt a nagging feeling at the time that I should just expand the
> query
> > > language, but I convinced myself that it would require too much
> testing
> > and
> > > it would be a change that was too broad in scope. It took 3 months
> for me
> > > to get around to unifying those languages and if we had more
> people using
> > > it, it would have been an absolute nightmare.
> > >
> > > On Wed, Dec 21, 2016 at 9:31 AM, Casey Stella <ceste...@gmail.com>
> > wrote:
> > >
> > >> Yeah, I +1 the notion of thorough automated tests.
> > >>
> > >> On Tue, Dec 20, 2016 at 4:36 PM, Matt Foley <ma...@apache.org>
> wrote:
> > >>
> > >>> Hard to mark diffs in text-only mode :-)  My proposed change is:
> > >>>
> > >>> >> All merged patches will be reviewed with the expectation that
>

Re: [DISCUSS] Coding Guidelines

2016-12-21 Thread Matt Foley

+1 on Casey’s first edit.  However, wrt the second, can we please not require 
vagrant?  Any of our single-node test deployments, including vagrant, ansible, 
mpack, or (soon :-) docker, should be acceptable.

Thanks,
--Matt (who can’t run vagrant workably on the systems available to me)


On 12/21/16, 8:52 AM, "Michael Miklavcic" <michael.miklav...@gmail.com> wrote:

Agreed on Casey's addition to 2.5. What do you think about saying the pla
should be stated on the PR, since that will be replicated to Jira
automatically?

On Wed, Dec 21, 2016 at 7:49 AM, Casey Stella <ceste...@gmail.com> wrote:

> Oh, one more, I propose the following addition to 2.5:
> >
> > JIRAs will have a description of how to exercise the functionality in a
> > step-by-step manner on a Quickdev vagrant instance to aid review and
> > validation.
>
>
> When Mike, Otto and I moved the system to the current version of Storm, we
> needed a broader smoke test than just running data through that exercised 
a
> variety of the features. We pulled those smoke tests from the various
> discussions in the JIRAs.
>
>
>
> On Wed, Dec 21, 2016 at 9:38 AM, Casey Stella <ceste...@gmail.com> wrote:
>
> > We have been having a lively discussion on METRON-590 (see
> > https://github.com/apache/incubator-metron/pull/395) around creating
> > multiple abstractions to do the same (or very nearly the same) thing.
> >
> > I'd like to propose an addition to section 2.3 which reads:
> >
> >> Contributions which provide abstractions which are either very similar
> to
> >> or a subset of existing abstractions should use and extend existing
> >> abstractions rather than provide competing abstractions unless
> engineering
> >> exigencies (e.g. performance ) make such an operation impossible 
without
> >> compromising core functionality of the platform.
> >
> >
> > I'd like to suggest the following anecdote from the early years of the
> > codebase to justify the above:
> >
> > Stellar started as a predicate language only for threat triage rules. As
> > such, when the task of creating Field Transformations came to me, I
> needed
> > something like Stellar except I needed it to return arbitrary objects,
> > rather than just booleans. In my infinite wisdom, I chose to fork the
> > language, create a second, more specific DSL for field transformations,
> > thereby creating "Metron Query Language" and "Metron Transformation
> > Language."
> >
> > I felt a nagging feeling at the time that I should just expand the query
> > language, but I convinced myself that it would require too much testing
> and
> > it would be a change that was too broad in scope. It took 3 months for 
me
> > to get around to unifying those languages and if we had more people 
using
> > it, it would have been an absolute nightmare.
> >
> > On Wed, Dec 21, 2016 at 9:31 AM, Casey Stella <ceste...@gmail.com>
> wrote:
> >
> >> Yeah, I +1 the notion of thorough automated tests.
> >>
> >> On Tue, Dec 20, 2016 at 4:36 PM, Matt Foley <ma...@apache.org> wrote:
> >>
> >>> Hard to mark diffs in text-only mode :-)  My proposed change is:
> >>>
> >>> >> All merged patches will be reviewed with the expectation that
> >>> thorough automated tests shall be provided and are consistent with …
> >>>
> >>>
> >>>  ^^
> >>> Added word “thorough” and changed “exist” to “shall be provided”.
> >>> Thanks,
> >>> --Matt
> >>>
> >>> On 12/20/16, 1:22 PM, "James Sirota" <jsir...@apache.org> wrote:
> >>>
> >>> Hi Matt, that's already in there. See last bullet point of 2.6
> >>>
> >>> 20.12.2016, 14:14, "Matt Foley" <ma...@apache.org>:
> >>> > If we aren't going to require 100% test coverage for new code,
> >>> then we should at least say "thorough" automated tests, in the last
> bullet
> >>> of 2.6. And it should be a mandate not an assumption:
> >>> >
> >>> > All merged patches will be reviewed with the expectation that
> >>>

Re: [DISCUSS] Coding Guidelines

2016-12-20 Thread Matt Foley

Hard to mark diffs in text-only mode :-)  My proposed change is:

>> All merged patches will be reviewed with the expectation that thorough 
>> automated tests shall be provided and are consistent with … 

      ^^
Added word “thorough” and changed “exist” to “shall be provided”.
Thanks,
--Matt

On 12/20/16, 1:22 PM, "James Sirota" <jsir...@apache.org> wrote:

Hi Matt, that's already in there. See last bullet point of 2.6
    
    20.12.2016, 14:14, "Matt Foley" <ma...@apache.org>:
> If we aren't going to require 100% test coverage for new code, then we 
should at least say "thorough" automated tests, in the last bullet of 2.6. And 
it should be a mandate not an assumption:
>
> All merged patches will be reviewed with the expectation that thorough 
automated tests shall be provided and are consistent with project testing 
methodology and practices, and cover the appropriate cases ( see reviewers 
guide )
>
> IMO,
> --Matt
>
> On 12/20/16, 12:51 PM, "James Sirota" <jsir...@apache.org> wrote:
>
> Good feedback. Here is the next iteration that accounts for your 
suggestions:
> 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235
>
> 1. How To Contribute
> We are always very happy to have contributions, whether for trivial 
cleanups, little additions or big new features.
> If you don't know Java or Scala you can still contribute to the 
project. We strongly value documentation and gladly accept improvements to the 
documentation.
> 1.1 Contributing A Code Change
> To submit a change for inclusion, please do the following:
> If there is not already a JIRA associated with your pull request, 
create it, assign it to yourself, and start progress
> If there is a JIRA already created for your change then assign it to 
yourself and start progress
> If you don't have access to JIRA or can't assign an issue to 
yourself, please message dev@metron.incubator.apache.org and someone will 
either give you permission or assign a JIRA to you
> If you are introducing a completely new feature or API it is a good 
idea to start a discussion and get consensus on the basic design first. Larger 
changes should be discussed on the dev boards before submission.
> New features and significant bug fixes should be documented in the 
JIRA and appropriate architecture diagrams should be attached. Major features 
may require a vote.
> Note that if the change is related to user-facing protocols / 
interface / configs, etc, you need to make the corresponding change on the 
documentation as well.
> Craft a pull request following the guidelines in Section 2 of this 
document
> Pull requests should be small to facilitate easier review. Studies 
have shown that review quality falls off as patch size grows. Sometimes this 
will result in many small PRs to land a single large feature.
> People will review and comment on your pull request. It is our job to 
follow up on pull requests in a timely fashion.
> Once the pull request is merged the person doing the merge 
(committer) should manually close the corresponding JIRA.
> 1.2 Reviewing and merging patches
> Everyone is encouraged to review open pull requests. We only ask that 
you try and think carefully, ask questions and are excellent to one another. 
Code review is our opportunity to share knowledge, design ideas and make 
friends.
> When reviewing a patch try to keep each of these concepts in mind:
>
> Is the proposed change being made in the correct place? Is it a fix 
in a backend when it should be in the primitives? In Kafka vs Storm?
> What is the change being proposed? Is it based on Community 
recognized issues?
> Do we want this feature or is the bug they’re fixing really a bug?
> Does the change do what the author claims?
> Are there sufficient tests?
> Has it been documented?
> Will this change introduce new bugs?
>
> 2. Implementation
>
> 2.1 Grammar and style
> These are small things that are not caught by the automated style 
checkers.
> Does a variable need a better name?
> Should this be a keyword argument?
> In a PR, maintain the existing style of the file.
> Don’t combine code changes with lots of edits of whitespace or 
comments; it makes code review too difficult. It’s okay to fix an occasional 
comment or indenting, but if wholesale comment or whitespace changes are

Re: [DISCUSS] Coding Guidelines

2016-12-20 Thread Matt Foley

th
>>  great results.
>>
>>  Jon
>>
>>  On Tue, Dec 20, 2016 at 11:29 AM Michael Miklavcic <
>>  michael.miklav...@gmail.com> wrote:
>>
>>>   Were you thinking javadoc or something more? I wouldn't mind seeing us
>>>   produce a javadoc site, if we aren't already doing so.
>>>
>>>   On Dec 20, 2016 9:25 AM, "zeo...@gmail.com" <zeo...@gmail.com> wrote:
>>>
>>>   > Regarding documentation - while I'm not a huge fan of that approach
>
> (I
>>>   > would prefer to see documentation generated from the code), I think
>
> it
>>>   > could work in the short term. Having that outlined both in the 
coding
>>>   > guidelines and on the wiki would be important.
>>>   >
>>>   > I agree with the comments about author != committer, and 100% code
>>>   > coverage.
>>>   >
>>>   > Jon
>>>   >
>>>   > On Tue, Dec 20, 2016 at 11:10 AM James Sirota <jsir...@apache.org>
>>>   wrote:
>>>   >
>>>   > > In my view the lower-level documentation that should be source
>>>   controlled
>>>   > > with the code belongs on github and then use case documentation 
and
>>>   > > top-level architecture diagrams belong on the wiki. What do you
>
> think?
>>>   > >
>>>   > > I think if the author is not a committer and can't merge then the
>>>   > reviewer
>>>   > > should probably merge or the PR originator should ping the dev
>
> board to
>>>   > get
>>>   > > someone to merge the PR in. Does that seem reasonable to everyone?
>>>   > >
>>>   > > 18.12.2016, 13:10, "Kyle Richardson" <kylerichards...@gmail.com>:
>>>   > > > Couple of questions/comments:
>>>   > > >
>>>   > > > In 2.4, we talk about Javadoc and code comments but not too much
>>>   about
>>>   > > the
>>>   > > > user documentation. Should we, possibly in a section 4, give 
some
>>>   > > > recommendations on what should go into the README files versus 
on
>
> the
>>>   > > wiki?
>>>   > > > This could also help the reviewer know if the change is
>
> documented
>>>   > > > sufficiently.
>>>   > > >
>>>   > > > In 2.6, we say that 1 qualified reviewer (Apache committer or
>
> PPMC
>>>   > > member)
>>>   > > > other than the author of the PR must have given it a +1. In the
>
> case
>>>   > > where
>>>   > > > the author is not a committer (who could merge their own PR),
>
> should
>>>   we
>>>   > > > state that the reviewer will be responsible for the merge?
>>>   > > >
>>>   > > > -Kyle
>>>   > > >
>>>   > > > On Fri, Dec 16, 2016 at 6:39 PM, James Sirota 
<jsir...@apache.org>
>
>>>   > > wrote:
>>>   > > >
>>>   > > >> Let's move this back to the discuss thread since it's still
>>>   generating
>>>   > > that
>>>   > > >> many comments. Please post all your feedback and I will
>
> incorporate
>>>   > it
>>>   > > and
>>>   > > >> put it back to a vote.
>>>   > > >>
>>>   > > >> Thanks,
>>>   > > >> James
>>>   > > >>
>>>   > > >> 16.12.2016, 16:12, "Matt Foley" <ma...@apache.org>:
>>>   > > >> > +1
>>>   > > >> >
>>>   > > >> > In 2.2 (follow Sun guidelines), do you want to add the
>
> notation
>>>   > > “except
>>>   > > >> that indents are 2 spaces instead of 4”, as Hadoop does? Or 
does
>>>   the
>>>   > > Metron
>>>   > > >> community like 4-space indents? I see both in the Metron code.
>>>   > > >> >
>>>   > > >> > My +1 holds in either case.
>>>   > > >> > --Matt
>>>   > > >> >
>>>   > > >> > On 12/16/16, 9:34 AM, "James Sirota" <jsir...@apache.org>
>
> wrote:
>>>   > > >> >
>>>   > > >> > I incorporated the changes to the coding guidelines from our
>>>   > discuss
>>>   > > >> thread. I'd like to get them voted on to make them official.
>>>   > > >> >
>>>   > > >> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235
>>>   > > >> >
>>>   > > >> > Please vote +1, -1, 0
>>>   > > >> >
>>>   > > >> > The vote will be open for 72 hours.
>>>   > > >> >
>>>   > > >> > ---
>>>   > > >> > Thank you,
>>>   > > >> >
>>>   > > >> > James Sirota
>>>   > > >> > PPMC- Apache Metron (Incubating)
>>>   > > >> > jsirota AT apache DOT org
>>>   > > >>
>>>   > > >> ---
>>>   > > >> Thank you,
>>>   > > >>
>>>   > > >> James Sirota
>>>   > > >> PPMC- Apache Metron (Incubating)
>>>   > > >> jsirota AT apache DOT org
>>>   > >
>>>   > > ---
>>>   > > Thank you,
>>>   > >
>>>   > > James Sirota
>>>   > > PPMC- Apache Metron (Incubating)
>>>   > > jsirota AT apache DOT org
>>>   > >
>>>   > --
>>>   >
>>>   > Jon
>>>   >
>>>   > Sent from my mobile device
>>>   >
>>  --
>>
>>  Jon
>>
>>  Sent from my mobile device
>
> ---
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org

--- 
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org

Re: [DISCUSS] Release Process

2016-12-20 Thread Matt Foley

1. Agree.  Being able to maintain the previous release train, with only 
critical or important bug fixes and security fixes (generally not new features) 
for users who are averse to frequent large changes, is very important for 
production use.  They get stability, while the mainline code proceeds as fast 
as the community wishes.
a. As Kyle points out, it is important to assure that all commits to the 
maintenance line also get made in the mainline (if relevant), to avoid the 
appearance of regressions in the mainline.  There should be a formal process 
for assuring this.  Possibilities are:
i. The release manager has a responsibility to review all commits to the maint 
line since last release, and make sure they were duplicated to the mainline 
(unless not relevant, which must also be determined).
ii. Reviewers refuse to accept PRs for the maint line unless they are twinned 
with PRs for corresponding changes in the mainline (unless not relevant, which 
must be stated by the submitter).  This should be reflected in Jira practices 
as well as PR practices.  Note Jira is poor at tracking multiple “Fix 
Version/s” values (due to the ambiguous use of “Fix version” to mean both 
“target version” and “done version”).  Most teams just clone jira tickets for 
multiple target releases.
2. Agree.  Being a release manager is a significant commitment of both time and 
care, and should be rotated around; both for the benefit of the individuals 
involved and so that at least 2 or 3 people are deeply familiar with the 
process at any given time.
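
As a rough illustration of option 1.a.i, the release manager's check could be scripted along these lines. This is only a sketch: it assumes every commit subject carries its Jira key (for example METRON-123), and the function name, the sample subjects, and the not_relevant set are hypothetical, not part of any existing Metron tooling.

```python
import re

# Assumption: commit subjects contain their Jira key, e.g. "METRON-608 Fix ...".
JIRA_KEY = re.compile(r"\bMETRON-\d+\b")


def missing_from_mainline(maint_subjects, mainline_subjects, not_relevant=()):
    """Return Jira keys committed to the maint line but absent from the mainline.

    Keys listed in not_relevant (fixes that only make sense on the maint line)
    are excluded from the report.
    """
    def keys(subjects):
        found = set()
        for subject in subjects:
            found.update(JIRA_KEY.findall(subject))
        return found

    missing = keys(maint_subjects) - keys(mainline_subjects) - set(not_relevant)
    # Sort numerically so the report is stable and easy to scan.
    return sorted(missing, key=lambda key: int(key.split("-")[1]))


# Hypothetical sample data, for illustration only.
maint = [
    "METRON-101 Fix parser crash",
    "METRON-102 Security fix",
    "METRON-103 Maint-only packaging fix",
]
mainline = ["METRON-101 Fix parser crash", "METRON-200 New feature"]

print(missing_from_mainline(maint, mainline, not_relevant={"METRON-103"}))
```

The subject lists could come from something like "git log --format=%s" run against each branch since the last release. Option 1.a.ii moves the same check to review time instead of release time.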
--Matt

On 12/20/16, 8:15 AM, "James Sirota"  wrote:

You are correct.  This thread is about the release process:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66854770

Does anyone have additional opinions on this?

1. Maintenance release would just contain patches to the existing release.  
Feature release would contain everything, including patches and new features. 
2. The intention is to rotate the build manager.  I did it for the first 
few releases, then Casey did it for the next few releases, someone else will 
probably do it for the next few releases, etc...

Does this seem reasonable to everyone?

Thanks,
James 

18.12.2016, 18:15, "Kyle Richardson" :
> I think this thread got commingled with the discussion on Coding
> Guidelines. The wiki page on the Release Process is at
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66854770.
>
> Overall, a really informative document. Thanks for pulling this together.
> Two questions:
>
> 1) I'm a little confused about how the feature release and maintenance
> release branches are going to work. Is the idea that all PRs will be 
merged
> into master and then also be committed to a FR++ or a MR++ branch (or 
maybe
> even both)?
>
> 2) Are these steps to be taken by a release manager only or is the
> intention that other committers or PMC members rotate through this
> responsibility? Just curious. I actually kind of like the idea of shuffling
> the duty every now and then to avoid burnout by one person.
>
> -Kyle
>
> On Fri, Dec 16, 2016 at 1:31 PM, James Sirota  wrote:
>
>>  fixed the link and made one addition that a qualified reviewer is a
>>  committer or PPMC member
>>
>>  16.12.2016, 11:07, "zeo...@gmail.com" :
>>  > Right, I agree. That change looks good to me.
>>  >
>>  > Looks like the Log4j levels link is broken too.
>>  >
>>  > For a broken travis - how about "If somehow the tests get into a 
failing
>>  > state on master (such as by a backwards incompatible release of a
>>  > dependency) only pull requests intended to rectify master may be 
merged,
>>  > and the removal or disabling of any tests must be +1'd by two 
reviewers."
>>  >
>>  > Also, reading through this, should there be a delineation 
between
>>  a
>>  > "reviewer" and somebody who has the ability to vote/+1 a PR? Unless 
I'm
>>  > missing something, right now it looks open to anybody.
>>  >
>>  > Jon
>>  >
>>  > On Fri, Dec 16, 2016 at 12:48 PM Nick Allen  
wrote:
>>  >
>>  > Personally, I don't think it matters who merges the pull request. As 
long
>>  > as you meet the requirements for code review, then anyone should be 
able
>>  to
>>  > merge it. In fact, I'd rather have the person who knows most about the
>>  > change actually merge it into master to ensure that it goes smoothly.
>>  >
>>  > On Fri, Dec 16, 2016 at 12:15 PM, James Sirota 
>>  wrote:
>>  >
>>  >> Jon, for #2 I changed it to: A committer may merge their own pull
>>  request,
>>  >> but only after a second reviewer has given it a +1.
>>  >>

Re: [VOTE] Modify Bylaws

2016-12-16 Thread Matt Foley

Um, should have stated “non-binding”, on both recent votes.

On 12/16/16, 3:17 PM, "Matt Foley" <mfo...@hortonworks.com> wrote:

+1

On 12/16/16, 10:30 AM, "Nick Allen" <n...@nickallen.org> wrote:

I am reading the aggregate effect of these changes as a veto only exists
for a code commit.  For all other votes, there is no such thing as a 
veto.

+1

On Fri, Dec 16, 2016 at 1:13 PM, James Sirota <jsir...@apache.org> 
wrote:

> Sorry, cut and paste error. Of course the original text currently 
says the
> following:
>
> -1 – This is a negative vote. On issues where consensus is required, 
this
> vote counts as a veto. All vetoes must contain an explanation of why 
the
> veto is appropriate. Vetoes with no explanation are void. It may also 
be
> appropriate for a -1 vote to include an alternative course of action.
>
> 16.12.2016, 10:54, "Nick Allen" <n...@nickallen.org>:
> > I don't see any changes in your "Change 1". Am I missing it? What
> changed?
> >
> > On Fri, Dec 16, 2016 at 12:01 PM, James Sirota <jsir...@apache.org>
> wrote:
> >
> >>  Based on the discuss thread I propose the following changes:
> >>
> >>  Change 1 - Replace:
> >>
> >>  -1 – This is a negative vote. On issues where consensus is 
required,
> this
> >>  vote counts as a veto. Vetoes are only valid for code commits and 
must
> >>  include a technical explanation of why the veto is appropriate. 
Vetoes
> with
> >>  no or non-technical explanation are void. On issues where a 
majority is
> >>  required, -1 is simply a vote against. In either case, it may 
also be
> >>  appropriate for a -1 vote to include a proposed alternative 
course of
> >>  action.
> >>
> >>  With
> >>
> >>  -1 – This is a negative vote. On issues where consensus is 
required,
> this
> >>  vote counts as a veto. Vetoes are only valid for code commits and 
must
> >>  include a technical explanation of why the veto is appropriate. 
Vetoes
> with
> >>  no or non-technical explanation are void. On issues where a 
majority is
> >>  required, -1 is simply a vote against. In either case, it may 
also be
> >>  appropriate for a -1 vote to include a proposed alternative 
course of
> >>  action.
> >>
> >>  Change 2 - Replace:
> >>
> >>  A valid, binding veto cannot be overruled. If a veto is cast, it 
must
> be
> >>  accompanied by a valid reason explaining the reasons for the 
veto. The
> >>  validity of a veto, if challenged, can be confirmed by anyone who 
has a
> >>  binding vote. This does not necessarily signify agreement with the
> veto -
> >>  merely that the veto is valid. If you disagree with a valid veto, 
you
> must
> >>  lobby the person casting the veto to withdraw their veto. If a 
veto is
> not
> >>  withdrawn, any action that has already been taken must be 
reversed in a
> >>  timely manner.
> >>
> >>  With:
> >>
> >>  A valid, binding veto regarding a code commit cannot be 
overruled. If a
> >>  veto is cast, it must be accompanied by a valid technical 
explanation
> >>  giving the reasons for the veto. The technical validity of a 
veto, if
> >>  challenged, can be confirmed by anyone who has a binding vote. 
This
> does
> >>  not necessarily signify agreement with the veto - merely that the 
veto
> is
> >>  valid. If you disagree with a valid veto, you must lobby the 
person
> casting
> >>  the veto to withdraw their veto. If a veto is not withdrawn, any 
action
> >>  that has already been taken must be reversed in a timely manner.
> >>
> >>  Please vote +1, -1, 0
> >>
> >>  The vote will be open for 72 hours
> >>
> >>  ---
> >>  Thank you,
> >>
> >>  James Sirota
> >>  PPMC- Apache Metron (Incubating)
> >>  jsirota AT apache DOT org
> >
> > --
> > Nick Allen <n...@nickallen.org>
>
> ---
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>



-- 
Nick Allen <n...@nickallen.org>

Re: [VOTE] Modify Bylaws

2016-12-16 Thread Matt Foley

+1

On 12/16/16, 10:30 AM, "Nick Allen"  wrote:

I am reading the aggregate effect of these changes as a veto only exists
for a code commit.  For all other votes, there is no such thing as a veto.
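
To make that reading concrete, here is a minimal sketch of the veto rule as the proposed wording states it. The Vote class, its field names, and the sample votes are hypothetical illustrations, not anything defined in the bylaws, and the sketch says nothing about how other consensus votes should treat a -1.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str
    value: int                   # +1, 0, or -1
    binding: bool = True
    technical_reason: str = ""   # explanation supplied with a -1, if any


def is_valid_veto(vote, is_code_commit):
    """A -1 counts as a veto only on a code commit, from a binding voter,
    and only when it comes with a technical explanation."""
    return (
        vote.value == -1
        and vote.binding
        and is_code_commit
        and bool(vote.technical_reason.strip())
    )


def blocked_by_veto(votes, is_code_commit):
    """True if at least one vote is a valid veto under the rule above."""
    return any(is_valid_veto(v, is_code_commit) for v in votes)


# Hypothetical votes, for illustration only.
votes = [
    Vote("alice", +1),
    Vote("bob", -1, technical_reason="Breaks the parser integration tests"),
    Vote("carol", -1),  # no explanation, so void as a veto
]

print(blocked_by_veto(votes, is_code_commit=True))
print(blocked_by_veto(votes, is_code_commit=False))
```

On a non-code vote the same -1 is simply counted as a vote against.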

+1

On Fri, Dec 16, 2016 at 1:13 PM, James Sirota  wrote:

> Sorry, cut and paste error. Of course the original text currently says the
> following:
>
> -1 – This is a negative vote. On issues where consensus is required, this
> vote counts as a veto. All vetoes must contain an explanation of why the
> veto is appropriate. Vetoes with no explanation are void. It may also be
> appropriate for a -1 vote to include an alternative course of action.
>
> 16.12.2016, 10:54, "Nick Allen" :
> > I don't see any changes in your "Change 1". Am I missing it? What
> changed?
> >
> > On Fri, Dec 16, 2016 at 12:01 PM, James Sirota 
> wrote:
> >
> >>  Based on the discuss thread I propose the following changes:
> >>
> >>  Change 1 - Replace:
> >>
> >>  -1 – This is a negative vote. On issues where consensus is required,
> this
> >>  vote counts as a veto. Vetoes are only valid for code commits and must
> >>  include a technical explanation of why the veto is appropriate. Vetoes
> with
> >>  no or non-technical explanation are void. On issues where a majority 
is
> >>  required, -1 is simply a vote against. In either case, it may also be
> >>  appropriate for a -1 vote to include a proposed alternative course of
> >>  action.
> >>
> >>  With
> >>
> >>  -1 – This is a negative vote. On issues where consensus is required,
> this
> >>  vote counts as a veto. Vetoes are only valid for code commits and must
> >>  include a technical explanation of why the veto is appropriate. Vetoes
> with
> >>  no or non-technical explanation are void. On issues where a majority 
is
> >>  required, -1 is simply a vote against. In either case, it may also be
> >>  appropriate for a -1 vote to include a proposed alternative course of
> >>  action.
> >>
> >>  Change 2 - Replace:
> >>
> >>  A valid, binding veto cannot be overruled. If a veto is cast, it must
> be
> >>  accompanied by a valid reason explaining the reasons for the veto. The
> >>  validity of a veto, if challenged, can be confirmed by anyone who has 
a
> >>  binding vote. This does not necessarily signify agreement with the
> veto -
> >>  merely that the veto is valid. If you disagree with a valid veto, you
> must
> >>  lobby the person casting the veto to withdraw their veto. If a veto is
> not
> >>  withdrawn, any action that has already been taken must be reversed in 
a
> >>  timely manner.
> >>
> >>  With:
> >>
> >>  A valid, binding veto regarding a code commit cannot be overruled. If 
a
> >>  veto is cast, it must be accompanied by a valid technical explanation
> >>  giving the reasons for the veto. The technical validity of a veto, if
> >>  challenged, can be confirmed by anyone who has a binding vote. This
> does
> >>  not necessarily signify agreement with the veto - merely that the veto
> is
> >>  valid. If you disagree with a valid veto, you must lobby the person
> casting
> >>  the veto to withdraw their veto. If a veto is not withdrawn, any 
action
> >>  that has already been taken must be reversed in a timely manner.
> >>
> >>  Please vote +1, -1, 0
> >>
> >>  The vote will be open for 72 hours
> >>
> >>  ---
> >>  Thank you,
> >>
> >>  James Sirota
> >>  PPMC- Apache Metron (Incubating)
> >>  jsirota AT apache DOT org
> >
> > --
> > Nick Allen 
>
> ---
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>



-- 
Nick Allen

Re: [VOTE] Coding Guidelines

2016-12-16 Thread Matt Foley

+1

In 2.2 (follow Sun guidelines), do you want to add the notation “except that 
indents are 2 spaces instead of 4”, as Hadoop does?  Or does the Metron 
community like 4-space indents?  I see both in the Metron code.

My +1 holds in either case.
--Matt

On 12/16/16, 9:34 AM, "James Sirota"  wrote:

I incorporated the changes to the coding guidelines from our discuss 
thread.  I'd like to get them voted on to make them official.

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235

Please vote +1, -1, 0

The vote will be open for 72 hours.

--- 
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org

Re: [DISCUSS] Modify Bylaws for veto clarification

2016-12-16 Thread Matt Foley

Great, agreed. --Matt

On 12/16/16, 9:00 AM, "James Sirota" <jsir...@apache.org> wrote:

Matt, I modified the requirement for 2 committers in our coding guidelines 
to a single review to be consistent with our bylaws. Thank you for pointing 
that out.

29.11.2016, 17:09, "Matt Foley" <ma...@apache.org>:
> Forgive me, but this is text editing so I’m going to get editorial.
>
> A. In the current Bylaws, 
https://cwiki.apache.org/confluence/display/METRON/Apache+Metron+Bylaws , there 
are two paragraphs that might be affected by this change. The first is a bullet 
under “Voting”, which says:
>
> -1 – This is a negative vote. On issues where consensus is required, this 
vote counts as a veto. All vetoes must contain an explanation of why the veto 
is appropriate. Vetoes with no explanation are void. It may also be appropriate 
for a -1 vote to include an alternative course of action.
>
> I suggest that this should read:
>
> -1 – This is a negative vote. On issues where consensus is required, this 
vote counts as a veto. Vetoes are only valid for code commits and must include 
a technical explanation of why the veto is appropriate. Vetoes with no or 
non-technical explanation are void. On issues where a majority is required, -1 
is simply a vote against. In either case, it may also be appropriate for a -1 
vote to include a proposed alternative course of action.
>
> B. Second, under “Approvals”, there is currently:
>
> A valid, binding veto cannot be overruled. If a veto is cast, it must be 
accompanied by a valid reason explaining the reasons for the veto. The validity 
of a veto, if challenged, can be confirmed by anyone who has a binding vote. 
This does not necessarily signify agreement with the veto - merely that the 
veto is valid. If you disagree with a valid veto, you must lobby the person 
casting the veto to withdraw their veto. If a veto is not withdrawn, any action 
that has already been taken must be reversed in a timely manner.
>
> I suggest that this should read:
>
> A valid, binding veto regarding a code commit cannot be overruled. If a 
veto is cast, it must be accompanied by a valid technical explanation giving 
the reasons for the veto. The technical validity of a veto, if challenged, can 
be confirmed by anyone who has a binding vote. This does not necessarily 
signify agreement with the veto - merely that the veto is valid. If you 
disagree with a valid veto, you must lobby the person casting the veto to 
withdraw their veto. If a veto is not withdrawn, any action that has already 
been taken must be reversed in a timely manner.
>
> C. The above changes impact the semantics of PMC votes for new committers 
and new PMC members. Under “Actions” these votes are specified to be by 
“consensus approval”. Consensus means “no -1 votes”, in other words a -1 is a 
veto. Yet we’ve just declared that vetoes are only valid for code changes, not 
people votes. So these parts of the “Actions” section need to be clarified.
>
> D. There is an inconsistency in the “Actions” : “Code Change” paragraph. 
It says “The code can be committed after the first +1.” But in 
https://cwiki.apache.org/confluence/display/METRON/Development+Guidelines , 
section “Merge requirements”, second bullet, it says “There should be 2 parties 
besides the committer that have reviewed the patch before merge.” This 
inconsistency should be resolved by changing one of the two sentences.
>
> Thanks,
> --Matt
>
> On 11/29/16, 3:30 PM, "Casey Stella" <ceste...@gmail.com> wrote:
>
> Yeah, I can agree with that. I believe the procedure for this is to 
vote
> on the bylaws change and a simple majority of the PMC members is 
required
> to ratify.
>
> On Tue, Nov 29, 2016 at 6:27 PM, James Sirota <jsir...@apache.org> 
wrote:
>
> > Hi Guys, any thoughts on this?
> >
> > 11.11.2016, 16:50, "James Sirota" <jsir...@apache.org>:
> > > going through the Apache Maturity Model we have to respond to the
> > following point:
> > >
> > > CS40: In Apache projects, vetoes are only valid for code commits 
and are
> > justified by a technical explanation, as per the Apache voting rules
> > defined in CS30.
> > >
> > > The voting section of our bylaws does not currently explicitly 
define
> > this:
> > > 
https://cwiki.apache.org/confluence/display/METRON/Apache+Metron+Bylaws
> > >
> > > I propose to add the following bullet point to the Voting section 
of our
> > bylaws:
> > >

Re: Confluence write access to a space

2016-12-15 Thread Matt Foley

I seem to have found the difficulty.  It will NOT show up on any system that 
has /bin/java defined, which may account for why other folks with Centos7 test 
systems aren’t seeing the behavior.

On my Centos7 test system, it so happens that /bin/java is not defined, even 
though $JAVA_HOME is correctly defined, and “$JAVA_HOME/bin” is in the PATH.  
In Centos7, when services launch through the (new in 7) systemctl process, it 
drops all inherited environment variables and starts over fresh.  Although the 
systemd launch script /usr/lib/systemd/system/elasticsearch.service does read 
in the /etc/sysconfig/elasticsearch as an “EnvironmentFile”, it does not 
include JAVA_HOME.

When, eventually, the user-level launcher script at 
/usr/share/elasticsearch/bin/elasticsearch gets invoked, JAVA_HOME is still 
undefined.  But it looks for $JAVA_HOME/bin/java, so if “/bin/java” is linked 
in the file system, then it’s good!  But if not, the launcher script dies.  
Regrettably that launcher script, even though it is fairly complex, does not 
write to any log file, and its stdout was closed long ago by the service-level 
launcher.  So I had to hack it to see what it was doing.

The solution is to simply write JAVA_HOME={{java64_home}} into the 
elastic-sysconfig template.
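
A small sketch of why the missing variable matters, modelling only the behavior described above (the launcher builds its java path from $JAVA_HOME, and systemd supplies only what the EnvironmentFile defines). The function names, the sample file contents, and the /usr/lib/jvm/java path are hypothetical; the real launcher script may do more than this.

```python
def parse_environment_file(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def resolve_java(env):
    """Mimic the launcher's lookup of $JAVA_HOME/bin/java.

    With JAVA_HOME unset this collapses to /bin/java, which only works
    on systems where that path happens to exist.
    """
    return env.get("JAVA_HOME", "") + "/bin/java"


# Hypothetical sysconfig contents before and after the fix.
before = parse_environment_file("LOG_DIR=/var/log/elasticsearch\n")
after = parse_environment_file(
    "LOG_DIR=/var/log/elasticsearch\n"
    "JAVA_HOME=/usr/lib/jvm/java\n"  # stands in for the rendered {{java64_home}}
)

print(resolve_java(before))
print(resolve_java(after))
```

The first result is the path that fails on my box; the second is what the launcher sees once the template writes JAVA_HOME.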

BTW, while munging thru code I reached the conclusion that elastic-env.sh is 
basically orphaned.  Does anyone know of scripts that source it? (Of course 
elastic-env.xml is still important, I’m only asking about the elastic-env.sh 
file templated from it.)

Thanks,
--Matt

On 12/14/16, 2:41 PM, "Matt Foley" <mfo...@hortonworks.com> wrote:

No, node.data and node.master are both correctly set to true (with Ambari’s 
agreement/participation) in the elasticsearch.yml file in CONF_DIR, and this is 
being correctly picked up by ES when launched interactively.  I really think 
this is in the service management stuff in /etc/init.d/elasticsearch and 
/usr/share/elasticsearch/bin/elasticsearch .  Remains to be proven, of course…

The reason I think ES isn’t even being successfully launched by systemd, is 
there is zero logging anywhere, except in ambari where it shows nothing but a 
successful service launch.  No files created in /var/log/elasticsearch, which 
all scripts agree is the value of LOG_DIR, despite permissions set to 
“drwxr-xr-x. elasticsearch elasticsearch”

Thanks,
--Matt

On 12/14/16, 2:07 PM, "David Lyle" <dlyle65...@gmail.com> wrote:

Aha! There's your problem. :)

Kidding aside, that is weird. I would expect the ES instance to come up and
go status red right away, not up and die.

I did have a horrible, horrible hack that made all that work, it involved
modifying the stored es templates to both have node.master and node.data
set to true in
/var/lib/ambari-server/resources/common-services/ELASTICSEARCH (from
memory, path may be a bit off). It occurred to me that an easy incremental
step toward full METRON-608 would be to simply expose the templates to
the config pages...

-D...

On Wed, Dec 14, 2016 at 4:59 PM, Matt Foley <mfo...@hortonworks.com> wrote:

> David,
> No, it’s in my METRON-608 single-node deployment  :-)
>
> On 12/14/16, 12:46 PM, "David Lyle" <dlyle65...@gmail.com> wrote:
>
> Hi Matt and Jon,
>
> FWIW, Metron with the MPack has been tested extensively on CentOS 7.
> Works
> like a champ. The issue right now is that sensor install is a CentOS
> 6/Ansible proposition.
>
> I haven't seen the issue with ES that you're experiencing, Matt. Is
> that in
> a 1 Master 3 Data node ES config or something else?
>
> -D...
>
>
> On Wed, Dec 14, 2016 at 3:29 PM, Matt Foley <mfo...@hortonworks.com>
> wrote:
>
> > I hope we will try to support Centos7. Many of my company’s
> customers are
> > requiring new installs to use RHEL 7 or Centos 7 rather than 6.  In
> > addition, we have the benefit that Centos 6 RPMs basically always
> run fine
> > in Centos7 (barring heavy-duty low-level system manipulations that
> Metron,
> > Kibana, and ES just don’t use), ES doesn’t distinguish between them,
> and
> > Kibana install support for Centos7 is already in place in the
> Mpack.  And
> > of course the Hadoop Stack runs fine in 7.
> >
> > For what it’s worth, I’ve been working with the Mpack installation 
on
> > Centos7 for the last few weeks, and it works fine except for a
> persistent
> > issue where the Elasticsearch service works fine if launched
> interactively
> > but terminates immediately if launched w

Re: Confluence write access to a space

2016-12-14 Thread Matt Foley

No, node.data and node.master are both correctly set to true (with Ambari’s 
agreement/participation) in the elasticsearch.yml file in CONF_DIR, and this is 
being correctly picked up by ES when launched interactively.  I really think 
this is in the service management stuff in /etc/init.d/elasticsearch and 
/usr/share/elasticsearch/bin/elasticsearch .  Remains to be proven, of course…

The reason I think ES isn’t even being successfully launched by systemd, is 
there is zero logging anywhere, except in ambari where it shows nothing but a 
successful service launch.  No files created in /var/log/elasticsearch, which 
all scripts agree is the value of LOG_DIR, despite permissions set to 
“drwxr-xr-x. elasticsearch elasticsearch” 

Thanks,
--Matt

On 12/14/16, 2:07 PM, "David Lyle" <dlyle65...@gmail.com> wrote:

Aha! There's your problem. :)

Kidding aside, that is weird. I would expect the ES instance to come up and
go status red right away, not up and die.

I did have a horrible, horrible hack that made all that work, it involved
modifying the stored es templates to both have node.master and node.data
set to true in
/var/lib/ambari-server/resources/common-services/ELASTICSEARCH (from
memory, path may be a bit off). It occurred to me that an easy incremental
step toward full METRON-608 would be to simply expose the templates to
the config pages...

-D...

On Wed, Dec 14, 2016 at 4:59 PM, Matt Foley <mfo...@hortonworks.com> wrote:

> David,
> No, it’s in my METRON-608 single-node deployment  :-)
>
> On 12/14/16, 12:46 PM, "David Lyle" <dlyle65...@gmail.com> wrote:
>
> Hi Matt and Jon,
>
> FWIW, Metron with the MPack has been tested extensively on CentOS 7.
> Works
> like a champ. The issue right now is that sensor install is a CentOS
> 6/Ansible proposition.
>
> I haven't seen the issue with ES that you're experiencing, Matt. Is
> that in
> a 1 Master 3 Data node ES config or something else?
>
> -D...
>
>
> On Wed, Dec 14, 2016 at 3:29 PM, Matt Foley <mfo...@hortonworks.com>
> wrote:
>
> > I hope we will try to support Centos7. Many of my company’s
> customers are
> > requiring new installs to use RHEL 7 or Centos 7 rather than 6.  In
> > addition, we have the benefit that Centos 6 RPMs basically always
> run fine
> > in Centos7 (barring heavy-duty low-level system manipulations that
> Metron,
> > Kibana, and ES just don’t use), ES doesn’t distinguish between them,
> and
> > Kibana install support for Centos7 is already in place in the
> Mpack.  And
> > of course the Hadoop Stack runs fine in 7.
> >
> > For what it’s worth, I’ve been working with the Mpack installation 
on
> > Centos7 for the last few weeks, and it works fine except for a
> persistent
> > issue where the Elasticsearch service works fine if launched
> interactively
> > but terminates immediately if launched with same arguments as a
> service.
> > Hope to find the cause in the next couple days.  (If anyone knows
> why,
> > would love to hear.  The daemonized launch scripts come from ES, not
> from
> > Metron, so “should just work”.)
> >
> > Thanks,
> > --Matt
> >
> > On 12/13/16, 6:46 PM, "zeo...@gmail.com" <zeo...@gmail.com> wrote:
> >
> > For now, I'm not sure what the solution is, but I would think
> choosing
> > one
> > specific list of required/supported software (including OS) and
> > documenting
> > that thoroughly is the right start.  Something like what
> currently
> > exists
> > for the vagrant side of things.
> >
> > That said, long term I would love to see broader support and
> > automatically
> > generating documentation derived from the code.  Isn't CentOS 6
> still
> > preferred over 7?  That's what I've been working on solely for
> that
> > reason.
> >
> > Jon
> >
> > On Tue, Dec 13, 2016, 21:07 Matt Foley <ma...@apache.org> wrote:
> >
> > > In my work on METRON-608, I’ve found a lot of small but
> significant
> > bugs
> > > in the existing

Re: Confluence write access to a space

2016-12-14 Thread Matt Foley

David,
No, it’s in my METRON-608 single-node deployment  :-)

On 12/14/16, 12:46 PM, "David Lyle" <dlyle65...@gmail.com> wrote:

Hi Matt and Jon,

FWIW, Metron with the MPack has been tested extensively on CentOS 7. Works
like a champ. The issue right now is that sensor install is a CentOS
6/Ansible proposition.

I haven't seen the issue with ES that you're experiencing, Matt. Is that in
a 1 Master 3 Data node ES config or something else?

-D...


On Wed, Dec 14, 2016 at 3:29 PM, Matt Foley <mfo...@hortonworks.com> wrote:

> I hope we will try to support Centos7. Many of my company’s customers are
> requiring new installs to use RHEL 7 or Centos 7 rather than 6.  In
> addition, we have the benefit that Centos 6 RPMs basically always run fine
> in Centos7 (barring heavy-duty low-level system manipulations that Metron,
> Kibana, and ES just don’t use), ES doesn’t distinguish between them, and
> Kibana install support for Centos7 is already in place in the Mpack.  And
> of course the Hadoop Stack runs fine in 7.
>
> For what it’s worth, I’ve been working with the Mpack installation on
> Centos7 for the last few weeks, and it works fine except for a persistent
> issue where the Elasticsearch service works fine if launched interactively
> but terminates immediately if launched with same arguments as a service.
> Hope to find the cause in the next couple days.  (If anyone knows why,
> would love to hear.  The daemonized launch scripts come from ES, not from
> Metron, so “should just work”.)
>
> Thanks,
> --Matt
>
> On 12/13/16, 6:46 PM, "zeo...@gmail.com" <zeo...@gmail.com> wrote:
>
> For now, I'm not sure what the solution is, but I would think choosing
> one
> specific list of required/supported software (including OS) and
> documenting
> that thoroughly is the right start.  Something like what currently
> exists
> for the vagrant side of things.
>
> That said, long term I would love to see broader support and
> automatically
> generating documentation derived from the code.  Isn't CentOS 6 still
> preferred over 7?  That's what I've been working on solely for that
> reason.
>
> Jon
>
> On Tue, Dec 13, 2016, 21:07 Matt Foley <ma...@apache.org> wrote:
>
> > In my work on METRON-608, I’ve found a lot of small but significant
> bugs
> > in the existing Mpack (version 0.3.0).  These bugs integrate with a
> lot of
> > the oddball tweaks specified in the draft Install docs I saw.  I
> would like
> > to submit a PR for these bugs, but they must be accompanied by
> changes in
> > the Install doc.
> >
> > How do we want to manage this?  Should we version the Install doc?
> Or
> > have multiple sections for the different versions?  Or have
> footnotes or
> > sidebar comments about which version certain paragraphs do and do
> not apply
> > to?
> >
> > I’m inclined to use sidebars within the document, because the
> dependencies
> > aren’t just on which versions of HDP, Ambari, and Metron you use.
> They
> > also depend on Python 2.6 vs 2.7, and Centos 6 vs 7.
> >
> > Thanks,
> > --Matt
> >
> >
> > On 12/13/16, 4:13 PM, "zeo...@gmail.com" <zeo...@gmail.com> wrote:
> >
> > Sorry about the delay.  To be honest I delayed once I saw the
> > management UI
> > PR because I was going to wait for that to be merged into master
> > before I
> > did my mpack install.  My initial thought is that there may need
> to be
> > some
> > sort of a merger between what you provided and the Hortonworks
> blog
> > post,
> > as I would prefer a single, comprehensive post over
> fragmentation.
> >
> > Jon
> >
> > On Tue, Dec 13, 2016, 18:29 Dima Kovalyov <
> dima.koval...@sstech.us>
> > wrote:
> >
> > > Thank you Jon,
> > >
> > > Just wondering if you have finished polishing the document? I
> will
> > have
> > > some time this week to polish and publish it if there is
> anything
>
