[GitHub] incubator-metron pull request #408: METRON-608 Mpack to install a single-nod...

2017-01-02 Thread mattf-horton
GitHub user mattf-horton opened a pull request:

https://github.com/apache/incubator-metron/pull/408

METRON-608 Mpack to install a single-node test cluster

This "metron-mpack-singlenode" is very similar to "metron-mpack", and can
be usefully compared with it for code review, although it also includes many bug
fixes proposed in METRON-634.  As the Jira says, this is a short-term fix:
a completely separate Mpack just for the single-node scenario,
until we can work the bugs out of an Mpack that works on all cluster sizes.

This was tested on a CentOS 7 VM with 16GB of RAM and all prerequisites
installed, including Python 2.7.11.  The usual install process was followed,
with Ambari 2.4.2 and HDP stack 2.5.3.  After successful startup, a few
thousand 'bro' packets were injected and were seen to propagate correctly
through the parser, enricher, and indexer topologies, and finally into
Elasticsearch.

An outstanding issue is that the configuration validator in the mpack
doesn't work.  It fails with "unknown error" and must be ignored.  The
multi-node mpack already had this problem, and I have run out of time to try to
fix it here.  Consequently, my enhancement to automatically warn the
user if fewer than five Storm Supervisor slots are allocated (commit 9ad8706)
is untested.  I would like to open a new Jira for this issue, since it
predates these additions.
Thanks,
--Matt
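
For reviewers, the slot check described above can be sketched roughly as
follows.  This is not the service_advisor code from commit 9ad8706; the
function names and the shape of the returned warning are assumptions made for
illustration.  The only real setting used is Storm's supervisor.slots.ports,
which Ambari stores as a string such as "[6700, 6701]".

```python
import ast

MIN_SLOTS = 5  # threshold taken from the PR description above


def count_storm_slots(storm_site, supervisor_hosts):
    """Total worker slots = ports per supervisor * number of supervisors."""
    # supervisor.slots.ports is assumed to arrive as a string like "[6700, 6701]"
    ports = ast.literal_eval(storm_site.get("supervisor.slots.ports", "[]"))
    return len(ports) * supervisor_hosts


def validate_storm_slots(storm_site, supervisor_hosts):
    """Return a list of warning dicts (hypothetical shape); empty means OK."""
    slots = count_storm_slots(storm_site, supervisor_hosts)
    if slots < MIN_SLOTS:
        return [{
            "level": "WARN",
            "config-name": "supervisor.slots.ports",
            "message": "fewer than %d Storm supervisor slots allocated (found %d)"
                       % (MIN_SLOTS, slots),
        }]
    return []
```

A check like this can be exercised outside Ambari, which may help while the
validator itself still fails with "unknown error".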

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mattf-horton/incubator-metron METRON-608

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-metron/pull/408.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #408


commit 05571b501222b91457bd2f1b2ccc096af015f4cb
Author: mattf-horton 
Date:   2016-12-08T05:31:40Z

METRON-608 Mpack to install a single-node test cluster

commit f2245900cd94389bc1c004c60f1606a49ca6ea8c
Author: mattf-horton 
Date:   2016-12-10T01:01:41Z

Add quicklinks to Elasticsearch Service page in Ambari, for health and 
indexes.

commit 542a6f01376a656b539040992e9fa727efd8a047
Author: mattf-horton 
Date:   2016-12-10T01:02:40Z

METRON-608 fix several bugs, especially in METRON use of 'es_url'

commit fb646fdda345107b0debd8395743630cc5e0
Author: mattf-horton 
Date:   2016-12-10T09:05:43Z

update version to 0.3.0

commit 8e30344e0b50fdff0cf8a1a66e7f41868a384171
Author: mattf-horton 
Date:   2016-12-10T09:09:30Z

Merge/rebase updates from 0.3.0

commit 9ad8706e5209ca9506710ca6498ac387e667ddf0
Author: mattf-horton 
Date:   2016-12-13T02:03:07Z

METRON-608 add service_advisor validation for number of Storm slots, 
enhance README, and a couple bug fixes

commit d4c97e6de4838265b72139727a5fd1082388117b
Author: mattf-horton 
Date:   2016-12-13T05:48:22Z

tweak pom.xml file

commit 95d83702ef844bc94c2ce54bccaebb7b7cb0513a
Author: mattf-horton 
Date:   2016-12-20T00:54:07Z

METRON-608 multiple small bug fixes

commit b5b3a348527c03bc778a5fc5dd0d94ec3a00470b
Author: mattf-horton 
Date:   2016-12-20T01:39:37Z

rebase to Dec 19, 2016






Re: Confluence write access to a space

2017-01-02 Thread Dima Kovalyov
Thank you Matt,

I haven't seen 634 before. Should I merge my tickets as sub-tasks
addressing some of the points you bring up there?

Also, I have a note about elasticsearch_config_path which is set in
metron-deployment/roles/metron_streaming/defaults/main.yml. It seems
like it is not used anywhere in the code base.

Please let me know how I should proceed with the two tickets I have created
that are relevant to yours.

- Dima

On 01/01/2017 04:43 AM, Matt Foley wrote:
> Hi Dima,
> Great to have the how-to doc in the wiki where it belongs.  Now we have a doc 
> to edit as we improve the install process :-)
>
> Did you look at https://issues.apache.org/jira/browse/METRON-634 before 
> opening METRON-642?  Please see my comments in the Jira for METRON-642.
> Thanks,
> --Matt
>
>
>
> On 12/30/16, 11:44 PM, "Dima Kovalyov"  wrote:
>
> Hey,
> 
> I wanted to finish what I started with the document for Metron with HDP
> 2.5, so I have migrated the document (with minor text fixes and
> clarifications) to here:
> 
> https://cwiki.apache.org/confluence/display/METRON/Metron+with+HDP+2.5+bare-metal+install
> The old Google doc was replaced with a link to this article.
> 
> I also created a number of pull requests to fix minor bugs here and there
> and created these two tickets: METRON-641 and METRON-642.
> Please let me know if I did something outside the proper procedure.
> 
> Also, I agree that we should eventually strip the HDP-related steps from the
> document, so in the end it will be like:
> 1. Build Mpack
> 2. Add to Ambari
> 3. Assign Masters and Slaves
> 4. PROFIT
> But since we are where we are, let's leave it like that and fix all the
> bugs first.
> 
> p.s. Happy holidays, everyone
> 
> - Dima
> 
> On 12/16/2016 04:21 AM, Matt Foley wrote:
> > I seem to have found the difficulty.  It will NOT show up on any system
> > that has /bin/java defined, which may account for why other folks with
> > Centos7 test systems aren’t seeing the behavior.
> >
> > On my Centos7 test system, it so happens that /bin/java is not defined,
> > even though $JAVA_HOME is correctly defined, and “$JAVA_HOME/bin” is in the
> > PATH.  In Centos7, when services launch through the (new in 7) systemctl
> > process, it drops all inherited environment variables and starts over fresh.
> > Although the systemd launch script
> > /usr/lib/systemd/system/elasticsearch.service does read in the
> > /etc/sysconfig/elasticsearch as an “EnvironmentFile”, it does not include
> > JAVA_HOME.
> >
> > When, eventually, the user-level launcher script at
> > /usr/share/elasticsearch/bin/elasticsearch gets invoked, JAVA_HOME is still
> > undefined.  But it looks for $JAVA_HOME/bin/java, so if “/bin/java” is linked
> > in the file system, then it’s good!  But if not, the launcher script dies.
> > Regrettably that launcher script, even though it is fairly complex, does not
> > write to any log file, and its stdout was closed long ago by the
> > service-level launcher.  So I had to hack it to see what it was doing.
> >
> > The solution is to simply write JAVA_HOME={{java64_home}} into the
> > elastic-sysconfig template.
> >
> > BTW, while munging thru code I reached the conclusion that
> > elastic-env.sh is basically orphaned.  Does anyone know of scripts that
> > source it? (Of course elastic-env.xml is still important, I’m only asking
> > about the elastic-env.sh file templated from it.)
> >
> > Thanks,
> > --Matt
> >
> >
> > On 12/14/16, 2:41 PM, "Matt Foley"  wrote:
> >
> > No, node.data and node.master are both correctly set to true (with
> > Ambari’s agreement/participation) in the elasticsearch.yml file in CONF_DIR,
> > and this is being correctly picked up by ES when launched interactively.  I
> > really think this is in the service management stuff in
> > /etc/init.d/elasticsearch and /usr/share/elasticsearch/bin/elasticsearch .
> > Remains to be proven, of course…
> >
> > The reason I think ES isn’t even being successfully launched by
> > systemd, is there is zero logging anywhere, except in ambari where it shows
> > nothing but a successful service launch.  No files created in
> > /var/log/elasticsearch, which all scripts agree is the value of LOG_DIR,
> > despite permissions set to “drwxr-xr-x. elasticsearch elasticsearch”
> >
> > Thanks,
> > --Matt
> >
> > On 12/14/16, 2:07 PM, "David Lyle"  wrote:
> >
> > Aha! There's your problem. :)
> > 
> > Kidding aside, that is weird. I would expect the ES instance to come up and
> > go status red right away, not up and die.
> > 
> > I did have a horrible, horrible hack that made all that work, it involved
> > modifying the stored es templates to both have node.master and node.data
> > set to true in

Re: Custom Storm Topologies

2017-01-02 Thread Matt Foley
Should we consider a script-calling capability that can launch a streaming
script and keep it alive and fed, long-term, rather than launching the script
anew every time the Stellar function is invoked?  I’m thinking of two basic rules:
write a line, read a line; and always have a timeout.  We would probably need a
UID of some sort to key a cache of running process objects.

--Matt
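
To make the idea concrete, here is a rough sketch of the "write a line, read a
line, always have a timeout" rules, with a cache of running processes keyed by
a UID.  It is written in Python for brevity rather than as Stellar/Java code,
and the names LineWorker and get_worker are made up for illustration; nothing
like this exists in Metron today.

```python
import queue
import subprocess
import sys
import threading


class LineWorker:
    """Keeps one child process alive and exchanges one line per call."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            text=True, bufsize=1)
        self.lines = queue.Queue()
        # A background thread drains stdout so reads can be given a timeout.
        threading.Thread(target=self._pump, daemon=True).start()

    def _pump(self):
        for line in self.proc.stdout:
            self.lines.put(line.rstrip("\n"))

    def call(self, request, timeout=2.0):
        """Write one line, read one line; kill the script if it is too slow."""
        self.proc.stdin.write(request + "\n")
        self.proc.stdin.flush()
        try:
            return self.lines.get(timeout=timeout)
        except queue.Empty:
            self.proc.kill()
            self.proc.wait()
            raise TimeoutError("script did not answer in time")

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()


_workers = {}  # cache of running process objects, keyed by UID


def get_worker(uid, argv):
    """Reuse the running process for this UID, or (re)start it if needed."""
    worker = _workers.get(uid)
    if worker is None or worker.proc.poll() is not None:
        worker = LineWorker(argv)
        _workers[uid] = worker
    return worker
```

A caller would do something like get_worker("my-script", ["/path/to/script"])
.call("one input line"), where the path is a placeholder.  The script must
flush its output after every line, otherwise every call runs into the timeout.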

On 1/2/17, 8:50 AM, "Carolyn Duby"  wrote:


Inserting a script inline is OK for low throughput and prototyping, but once
you get to higher throughput (millions of events per second), it’s probably going
to be a bottleneck.


For METRON-571 you might want to consider a Java-based extension plugin
similar to Eclipse plugins.

Thanks
Carolyn

On 12/31/16, 5:22 PM, "Tyler Moore"  wrote:

>Thanks Jon,
>
>I'll look over the tutorial and put something together for the SHELL_EXEC
>Stellar function.
>I don't believe I have permission to assign issues in Jira; if you want to assign
>it to me, my username is devopsec.
>I'll post back details and we can review security issues.
>
>Regards,
>
>Tyler Moore
>Software Engineer
>Phone: 248-909-2769
>Email: moore.ty...@goflyball.com
>
>
>On Sat, Dec 31, 2016 at 9:46 AM, zeo...@gmail.com  wrote:
>
>> Casey did a tutorial on how to add your own Stellar function here
>>  - there is not an existing
>> function that does this (current functions are listed here
>> > metron-platform/metron-common#stellar-core-functions>).
>> I noticed that some of the Stellar function documentation was a bit dated
>> so I've opened a PR to update it here
>> .
>>
>> As this is something I need as well, I'd be happy to assist you where I
>> can.  Perhaps you want to self-assign METRON-571
>> ?  I do have some
>> security concerns with a SHELL_EXEC function because it could result in RCE
>> - if that's the route you go I could probably help with a thorough secure
>> code review.
>>
>> Jon
>>
>> On Fri, Dec 30, 2016 at 10:43 PM Tyler Moore  wrote:
>>
>> Thank you everyone for your suggestions,
>>
>> I believe that kicking off the function via stellar would be the optimal
>> solution. If anyone has an example of calling external code via stellar
>> that would be very helpful. Thanks!
>>
>> Regards,
>>
>> Tyler Moore
>> IT Specialist
>> tyler.math...@yahoo.com
>> 248-909-2769
>>
>> > On Dec 30, 2016, at 17:54, Otto Fowler  wrote:
>> >
>> > They are all extension points.
>> >
>> >> On December 30, 2016 at 16:34:58, zeo...@gmail.com (zeo...@gmail.com)
>> wrote:
>> >>
>> >> Right but unless I'm missing something, both of those options are more
>> >> rigid and the MaaS service would have an unnecessary delay as opposed to
>> >> doing it entirely in Stellar.  Unless there's a reason to do otherwise that
>> >> I'm missing, I would think doing this in Stellar gives you a more timely
>> >> and (re)configurable end result.
>> >>
>> >> Jon
>> >>
>> >>> On Fri, Dec 30, 2016, 16:22 Otto Fowler 
>> wrote:
>> >>>
>> >>> I think there are a couple of things you can do here.  The way to get
>> >>> something else into the split is to have another adapter to split to, which
>> >>> is what I think you mean.  You can also integrate with MaaS and create a
>> >>> service that you can call via STELLAR.
>> >>>
>> >>>
>> >>>
>> >>> On December 30, 2016 at 15:08:48, Otto Fowler (ottobackwa...@gmail.com)
>> >>> wrote:
>> >>>
>> >>> Or a Maas service?
>> >>>
>> >>>
>> >>> On December 30, 2016 at 13:52:06, zeo...@gmail.com (zeo...@gmail.com)
>> >>> wrote:
>> >>>
>> >>> Depending on the details it sounds like a much simpler solution would be to
>> >>> handle this in a Stellar function.
>> >>>
>> >>> Jon
>> >>>
>>  On Fri, Dec 30, 2016, 13:27 Tyler Moore  wrote:
>> 
>>  Happy Holidays Metron Devs!
>> 
>>  Could anyone lend me some guidance on customizing the Storm topologies in
>>  Metron? What I am trying to accomplish:
>> 
>>  1) Add a method to the threat intel joiner bolt that sends an HTTP POST
>>  with the score of the threat to a remote REST API. This will conditionally
>>  trigger notifications based on user settings in another database 

Re: Custom Storm Topologies

2017-01-02 Thread Carolyn Duby

Inserting a script inline is OK for low throughput and prototyping, but once you
get to higher throughput (millions of events per second), it’s probably going to
be a bottleneck.


For METRON-571 you might want to consider a Java-based extension plugin similar
to Eclipse plugins.

Thanks
Carolyn

On 12/31/16, 5:22 PM, "Tyler Moore"  wrote:

>Thanks Jon,
>
>I'll look over the tutorial and put something together for the SHELL_EXEC
>Stellar function.
>I don't believe I have permission to assign issues in Jira; if you want to assign
>it to me, my username is devopsec.
>I'll post back details and we can review security issues.
>
>Regards,
>
>Tyler Moore
>Software Engineer
>Phone: 248-909-2769
>Email: moore.ty...@goflyball.com
>
>
>On Sat, Dec 31, 2016 at 9:46 AM, zeo...@gmail.com  wrote:
>
>> Casey did a tutorial on how to add your own Stellar function here
>>  - there is not an existing
>> function that does this (current functions are listed here
>> > metron-platform/metron-common#stellar-core-functions>).
>> I noticed that some of the Stellar function documentation was a bit dated
>> so I've opened a PR to update it here
>> .
>>
>> As this is something I need as well, I'd be happy to assist you where I
>> can.  Perhaps you want to self-assign METRON-571
>> ?  I do have some
>> security concerns with a SHELL_EXEC function because it could result in RCE
>> - if that's the route you go I could probably help with a thorough secure
>> code review.
>>
>> Jon
>>
>> On Fri, Dec 30, 2016 at 10:43 PM Tyler Moore  wrote:
>>
>> Thank you everyone for your suggestions,
>>
>> I believe that kicking off the function via stellar would be the optimal
>> solution. If anyone has an example of calling external code via stellar
>> that would be very helpful. Thanks!
>>
>> Regards,
>>
>> Tyler Moore
>> IT Specialist
>> tyler.math...@yahoo.com
>> 248-909-2769
>>
>> > On Dec 30, 2016, at 17:54, Otto Fowler  wrote:
>> >
>> > They are all extension points.
>> >
>> >> On December 30, 2016 at 16:34:58, zeo...@gmail.com (zeo...@gmail.com)
>> wrote:
>> >>
>> >> Right but unless I'm missing something, both of those options are more
>> >> rigid and the MaaS service would have an unnecessary delay as opposed to
>> >> doing it entirely in Stellar.  Unless there's a reason to do otherwise that
>> >> I'm missing, I would think doing this in Stellar gives you a more timely
>> >> and (re)configurable end result.
>> >>
>> >> Jon
>> >>
>> >>> On Fri, Dec 30, 2016, 16:22 Otto Fowler 
>> wrote:
>> >>>
>> >>> I think there are a couple of things you can do here.  The way to get
>> >>> something else into the split is to have another adapter to split to, which
>> >>> is what I think you mean.  You can also integrate with MaaS and create a
>> >>> service that you can call via STELLAR.
>> >>>
>> >>>
>> >>>
>> >>> On December 30, 2016 at 15:08:48, Otto Fowler (ottobackwa...@gmail.com)
>> >>> wrote:
>> >>>
>> >>> Or a Maas service?
>> >>>
>> >>>
>> >>> On December 30, 2016 at 13:52:06, zeo...@gmail.com (zeo...@gmail.com)
>> >>> wrote:
>> >>>
>> >>> Depending on the details it sounds like a much simpler solution would be to
>> >>> handle this in a Stellar function.
>> >>>
>> >>> Jon
>> >>>
>>  On Fri, Dec 30, 2016, 13:27 Tyler Moore  wrote:
>> 
>>  Happy Holidays Metron Devs!
>> 
>>  Could anyone lend me some guidance on customizing the Storm topologies in
>>  Metron? What I am trying to accomplish:
>> 
>>  1) Add a method to the threat intel joiner bolt that sends an HTTP POST
>>  with the score of the threat to a remote REST API. This will conditionally
>>  trigger notifications based on user settings in another database (the
>>  backend processing logic is on another platform).
>>  The score should be available within the JSONObject, but I am not an expert
>>  with Storm and I do not completely understand what conditions determine
>>  when the threat feed is considered an "alert" in Metron. Please clarify.
>> 
>>  2) How would I add an external dependency, my HTTP REST Java class, to the
>>  Metron Maven build process? More specifically, if I was adding a custom
>>  class that needed to be accessed by a bolt in Storm, how would I add this in
>>  Maven as a dependency? I have limited experience with Maven, but my
>>  understanding is that I would add it to the pom.xml and recompile.
>>  However, the Metron quick dev platform is built on a VM; would I need to
>>  account for this? Please advise.
>> 
>>  Regards,
>> 
>>  Tyler Moore

Re: Long-term storage for enriched data

2017-01-02 Thread Carolyn Duby
Avro is a format that contains both the data and the schema.  Here is a quick 
summary:

https://avro.apache.org/docs/current/


Thanks
Carolyn
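
To illustrate "contains both the data and the schema": an Avro schema is itself
a JSON document, and it is stored in the header of every Avro data file.  A
minimal sketch of a record schema follows, built with Python's json module.
The record name, namespace, and field names here are invented for illustration
and are not Metron's actual message schema.

```python
import json

# Hypothetical Avro record schema for an enriched message.
# Optional fields are unions with "null" first and a null default.
enriched_schema = {
    "type": "record",
    "name": "EnrichedMessage",
    "namespace": "org.example.metron",
    "fields": [
        {"name": "timestamp", "type": "long"},
        {"name": "source_type", "type": "string"},
        {"name": "ip_src_addr", "type": ["null", "string"], "default": None},
        {"name": "ip_dst_addr", "type": ["null", "string"], "default": None},
        {"name": "threat_score", "type": ["null", "double"], "default": None},
    ],
}

# This JSON text is what would be saved as an .avsc schema file.
schema_json = json.dumps(enriched_schema, indent=2)
```

Writing actual Avro files needs an Avro library, which is left out here so the
sketch stays self-contained.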



On 1/1/17, 8:41 PM, "Matt Foley"  wrote:

>I’m not an expert on these things, but my understanding is that Avro and ORC 
>serve many of the same needs.  The biggest difference is that ORC is columnar, 
>and Avro isn’t.  Avro, ORC, and Parquet were compared in detail at last year’s 
>Hadoop Summit; the slideshare prezo is here: 
>http://www.slideshare.net/HadoopSummit/file-format-benchmark-avro-json-orc-parquet
>
>Its conclusion: “For complex tables with common strings, Avro with Snappy is
>a good fit.  For other tables [or when applications “just need a few columns” 
>of the tables], ORC with Zlib is a good fit.”  (The addition in square 
>brackets incorporates a quote from another part of the prezo.)  But do look at 
>the prezo please, it gives detailed benchmarks showing when each one is better.
>
>--Matt
>
>On 1/1/17, 5:18 AM, "zeo...@gmail.com"  wrote:
>
>I don't recall a conversation on that product specifically, but I've
>definitely brought up the need to search HDFS from time to time.  Things
>like Spark SQL, Hive, Oozie have been discussed, but Avro is new to me; I'll
>have to look into it.  Are you able to summarize its benefits?
>
>Jon
>
>On Wed, Dec 28, 2016, 14:45 Kyle Richardson 
>wrote:
>
>> This thread got me thinking... there are likely a fair number of use cases
>> for searching and analyzing the output stored in HDFS. Dima's use case is
>> certainly one. Has there been any discussion on the use of Avro to store
>> the output in HDFS? This would likely require an expansion of the current
>> json schema.
>>
>> -Kyle
>>
>> On Thu, Dec 22, 2016 at 5:53 PM, Casey Stella  wrote:
>>
>> > Oozie (or something like it) would appear to me to be the correct tool
>> > here.  You are likely moving files around and pinning up hive tables:
>> >
>> >- Moving the data written in HDFS from /apps/metron/enrichment/${sensor}
>> >to another directory in HDFS
>> >- Running a job in Hive or pig or spark to take the JSON blobs, map them
>> >to rows and pin it up as an ORC table for downstream analytics
>> >
>> > NiFi is mostly about getting data in the cluster, not really for
>> scheduling
>> > large-scale batch ETL, I think.
>> >
>> > Casey
>> >
>> > On Thu, Dec 22, 2016 at 5:18 PM, Dima Kovalyov
>> > wrote:
>> >
>> > > Thank you for reply Carolyn,
>> > >
>> > > Currently, for test purposes, we enrich the flow with Geo and ThreatIntel
>> > > malware IP, but plan to expand this further.
>> > >
>> > > Our dev team is working on an Oozie job to process this. Meanwhile, I
>> > > wonder if I could use NiFi for this purpose (because we are already using it
>> > > for data ingest and streaming).
>> > >
>> > > Could you elaborate on why it may be overkill? The idea is to have
>> > > everything in one place instead of hacking into Metron libraries and code.
>> > >
>> > > - Dima
>> > >
>> > > On 12/22/2016 02:26 AM, Carolyn Duby wrote:
>> > > > Hi Dima -
>> > > >
>> > > > What type of analytics are you looking to do?  Is the normalized format
>> > > > not working?  You could use an oozie or spark job to create derivative
>> > > > tables.
>> > > >
>> > > > Nifi may be overkill for breaking up the kafka stream.  Spark streaming
>> > > > may be easier.
>> > > >
>> > > > Thanks
>> > > > Carolyn
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >  Original message 
>> > > > From: Dima Kovalyov 
>> > > > Date: 12/21/16 6:28 PM (GMT-05:00)
>> > > > To: dev@metron.incubator.apache.org
>> > > > Subject: Long-term storage for enriched data
>> > > >
>> > > > Hello,
>> > > >
>> > > > Currently we are researching a fast and resource-efficient way to save
>> > > > enriched data in Hive for further analytics.
>> > > >
>> > > > There are two scenarios that we consider:
>> > > > a) Use an Oozie Java job that uses Metron enrichment classes to "manually"
>> > > > enrich each line of the source data that is picked up from the source
>> > > > dir (the one that we have already developed and are using). That is
>> > > > something that we developed on our own. Downside: custom code that is built
>> > > > on top of Metron source code.
>> > > >
>> > > > b) Use NiFi to listen for indexing Kafka topic -> split stream by source
>> > > > type -> Put every source type in