Re: HCP in Cloud infrastructures such as AWS, GCP, Azure

2018-10-22 Thread Carolyn Duby

Hive 3.0 works well with block stores.  You can either add it to your Metron 
cluster or spin up an ephemeral cluster with Cloudbreak:

1. Metron streams into HDFS in JSON.
2. Compact daily with Spark into ORC format and store in a block store (S3, ADLS, 
etc.).
3. Query the ORC data in the block store using external Hive 3.0 tables in HDP 3 
with LLAP (see the sketch below).
4. If querying the block store externally is too slow, try adding more LLAP 
cache or loading the data into HDFS prior to analysis.
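
For example, here is a minimal sketch of steps 2 and 3, assuming hypothetical paths, 
bucket, sensor, table, and field names — the exact layout depends on your indexing 
configuration:

// Daily compaction job (Scala/Spark sketch): read one day of Metron JSON output
// from HDFS and rewrite it as ORC in the block store, then expose it to Hive 3.
import org.apache.spark.sql.SparkSession

object CompactMetronDay {
  def main(args: Array[String]): Unit = {
    val day = args(0) // e.g. "2018-10-22"
    val spark = SparkSession.builder()
      .appName(s"metron-compaction-$day")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical HDFS layout written by the Metron indexing topology.
    val events = spark.read.json(s"hdfs:///apps/metron/indexing/indexed/bro/$day/*")

    // Keep only the columns the archive table needs and compact into a few ORC files.
    events.select("timestamp", "ip_src_addr", "ip_dst_addr", "original_string")
      .coalesce(8)
      .write
      .mode("overwrite")
      .orc(s"s3a://my-metron-archive/bro/dt=$day/")

    // External Hive 3 table over the block store; LLAP serves queries against it
    // without copying the data back into HDFS.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS bro_archive (
        |  `timestamp` BIGINT,
        |  ip_src_addr STRING,
        |  ip_dst_addr STRING,
        |  original_string STRING
        |) PARTITIONED BY (dt STRING)
        |STORED AS ORC
        |LOCATION 's3a://my-metron-archive/bro/'""".stripMargin)
    spark.sql(s"ALTER TABLE bro_archive ADD IF NOT EXISTS PARTITION (dt='$day')")

    spark.stop()
  }
}

Running a job like this from the ephemeral cluster keeps the batch load off the 
persistent streaming cluster.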

If you are using the Metron Alerts UI, you will need Solr, which only performs well 
on fast disk.  To keep costs down, reduce the content stored in Solr using the 
following techniques:
1. Only index the fields you might search on.
2. Store in Solr only the fields you will want to see in the Alerts UI.
3. Reduce the length of time you store data in Solr.

Thanks
Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584

Join my team!
Enterprise Account Manager – Boston - http://grnh.se/wepchv1
Solutions Engineer – Boston - http://grnh.se/8gbxy41
Need Answers? Try https://community.hortonworks.com

On 10/19/18, 7:18 AM, "deepak kumar"  wrote:

>Hi All
>I have a quick question around HCP deployments in cloud infra such as AWS.
>I am planning to run a persistent cluster for all event streaming and
>processing,
>and then run a transient cluster such as AWS EMR to run batch loads on the
>data ingested from the persistent cluster.
>Has anyone tried this model?
>Since the data volume is going to be humongous, the cloud charges a lot of money
>for data I/O and storage.
>Keeping this in mind, what could be the best cloud deployment of HCP
>components, assuming an ingest rate of 10 TB per day?
>
>Thanks in advance.
>
>
>Regards,
>Deepak


Re: Investigator UI meta-alerts

2018-07-03 Thread Carolyn Duby
Hi Oliver

I still saw meta-alerts even when I was filtering for is_alert = true, but I am 
using an earlier version.

You may want to try filtering by score instead (for example, a range filter on the 
threat triage score field).  A meta-alert should have a non-zero score if it 
includes alerts.

Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584

Join my team!
Enterprise Account Manager – Boston - http://grnh.se/wepchv1
Solutions Engineer – Boston - http://grnh.se/8gbxy41
Need Answers? Try https://community.hortonworks.com

On 7/2/18, 5:13 AM, "Oliver Fletcher"  wrote:

>Hi Guys,
>
>
>I have a quick question regarding the usability of meta-alerts within the 
>investigator UI. We have a high(ish) volume log source (firewall logs, with 
>accept packets being logged). Threat intelligence feeds will match connections 
>to rogue IP addresses and the investigator UI is showing hits with a threat 
>score as advertised.
>
>
>The issue I'm experiencing is that I have to place a filter 'is_alert:true' 
>within the search bar, otherwise I'll pull in millions of non-interesting 
>events. This view gives me a powerful threat score alert feed, however, when I 
>merge together a group of alerts into a meta-alert, it will not appear in this 
>filtered search any more (because I've specified 'is_alert:true'). If I remove 
>this filter I'll have to trundle through a few billion events to find the 
>meta-alert! It's effectively disappeared into the ether.
>
>
>Have I implemented this abnormally? It seems that the investigator UI could do 
>with an implicit is_alert:true filter, allowing meta-grouped alerts to float 
>into this implicit search base.
>
>
>Cheers,
>
>Oliver Fletcher
>


Re: Architectural reason to split in 4 topologies / impact on the Kafka resources

2018-06-27 Thread Carolyn Duby
Another reason for keeping the original string is that you may not want to extract 
all components of the original event into JSON.  If you look at Windows events, you 
will want to have the original event, but you will not want to extract everything 
because they are very verbose.

You should have a choice, per sensor type, of whether or not to include the 
original string in the index.

Thanks  

Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584

Join my team!
Enterprise Account Manager – Boston - http://grnh.se/wepchv1
Solutions Engineer – Boston - http://grnh.se/8gbxy41
Need Answers? Try https://community.hortonworks.com

On 6/25/18, 8:02 PM, "Simon Elliston Ball"  wrote:

>The original string serves purposes well beyond debugging. Many users will
>need to be able to prove provenance back to the raw logs in order to prove or
>prosecute an attack from an internal threat, or to provide evidence to law
>enforcement for an external threat. As such, the original string is
>important.
>
>It also provides a valuable source for the free text search where parsing
>has not extracted all the necessary tokens for a hunt use case, so it can
>be a valuable field to have in Elastic or Solr for text rather than keyword
>indexing.
>
>That said, it may make sense to remove a heavyweight processing and
>storage field like this from the Lucene store. We have been talking for a
>while about filtering some of the data out of the real-time index, and
>preserving full copies in the batch index, which could meet the forensic
>use cases above, and would make it a matter of user choice. That would
>probably be configured through indexing config to filter fields.
>
>Simon
>
>On 25 June 2018 at 23:43, Michel Sumbul  wrote:
>
>> Depending on the source of data, it might be interesting to bypass a step
>> that the user considers useless.
>> For example, if you have a source of data that doesn't need profiling and you
>> want it ingested like the other sources so that the SOC analysts can use it in
>> their analysis, and have everything in the same place.
>>
>> How can we bypass it for a specific sensor?
>>
>> 2018-06-25 23:38 GMT+01:00 James Sirota :
>>
>> > There is a way to wire the system to bypass enrichment and profiling, but
>> > you would then bypass a lot of key features of the system.  It would be
>> > unwise to do that.
>> >
>> > 25.06.2018, 15:13, "Michel Sumbul" :
>> > > Hi Casey,
>> > >
>> > > That makes complete sense.
>> > > Short question, if there is no enrichment or no profiling, does the
>> > message
>> > > still pass through the enrichment/profiling topic?
>> > >
>> > > If yes, do you think it's possible to imagine a way for messages that
>> > > don't need enrichment or profiling to skip the topic and go directly
>> > > to the next one? This is again to avoid in/out in Kafka.
>> > >
>> > > Thanks for the explanation,
>> > > Michel
>> > >
>> > > 2018-06-23 3:58 GMT+01:00 Casey Stella :
>> > >
>> > >>  Hey Michel,
>> > >>
>> > >>  Those are good questions and there were some reasons surrounding that.
>> > >>  In fact, historically, we had fewer topologies (e.g. indexing and
>> > >>  enrichment were merged). Even earlier on, we had just one giant topology
>> > >>  per parser that enriched and indexed. The long story short is that we
>> > >>  moved this way because we saw how people were using Metron and we gained
>> > >>  more insight tuning Metron. That led us down this architectural path.
>> > >>
>> > >>  Some of the reasons that we went this way:
>> > >>
>> > >> - Fewer large topologies were a nightmare to tune
>> > >>   - Enrichment would have different memory requirements than, say,
>> > >>     parsers or indexing
>> > >>   - You can adjust the Kafka topic params per topology to adjust the
>> > >>     number of partitions, etc.
>> > >> - Having the separate topologies gives a natural set of extension points
>> > >>   for customization and enhancement (e.g. you want a phase between
>> > >>   parsing and enrichment).
>> > >> - Decoupling the topologies lets us

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-13 Thread Carolyn Duby
Agreed… streaming enrichment is the right solution for DNS data.

Do we have a web service for writing enrichments?

Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584

Join my team!
Enterprise Account Manager – Boston - http://grnh.se/wepchv1
Solutions Engineer – Boston - http://grnh.se/8gbxy41
Need Answers? Try https://community.hortonworks.com

On 6/13/18, 6:25 AM, "Charles Joynt"  wrote:

>Regarding why I didn't choose to load data with the flatfile loader script...
>
>I want to be able to SEND enrichment data to Metron rather than have to set up 
>cron jobs to PULL data. At the moment I'm trying to prove that the process 
>works with a simple data source. In the future we will want enrichment data in 
>Metron that comes from systems (e.g. HR databases) that I won't have access 
>to, hence will need someone to be able to send us the data.
>
>> Carolyn: just call the flat file loader from a script processor...
>
>I didn't believe that would work in my environment. I'm pretty sure the script 
>has dependencies on various Metron JARs, not least for the row id hashing 
>algorithm. I suppose this would require at least a partial install of Metron 
>alongside NiFi, and would introduce additional work on the NiFi cluster for 
>any Metron upgrade. In some (enterprise) environments there might be 
>separation of ownership between NiFi and Metron.
>
>I also prefer not to have a Java app calling a bash script which calls a new 
>java process, with logs or error output that might just get swallowed up 
>invisibly. Somewhere down the line this could hold up effective 
>troubleshooting.
>
>> Simon: I have actually written a stellar processor, which applies stellar to 
>> all FlowFile attributes...
>
>Gulp.
>
>> Simon: what didn't you like about the flatfile loader script?
>
>The flatfile loader script has worked fine for me when prepping enrichment 
>data in test systems, however it was a bit of a chore to get the JSON 
>configuration files set up, especially for "wide" data sources that may have 
>15-20 fields, e.g. Active Directory.
>
>More broadly speaking, I want to embrace the streaming data paradigm and have tried 
>to avoid batch jobs. With the DNS example, you might imagine a future where 
>the enrichment data is streamed based on DHCP registrations, DNS update 
>events, etc. In principle this could reduce the window of time where we might 
>enrich a data source with out-of-date data.
>
>Charlie
>
>-Original Message-
>From: Carolyn Duby [mailto:cd...@hortonworks.com] 
>Sent: 12 June 2018 20:33
>To: dev@metron.apache.org
>Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>
>I like the streaming enrichment solution, but it depends on how you are 
>getting the data in.  If you get the data in a CSV file, just call the flat 
>file loader from a script processor.  No special NiFi required.
>
>If the enrichments don’t arrive in bulk, the streaming solution is better.
>
>Thanks
>Carolyn Duby
>Solutions Engineer, Northeast
>cd...@hortonworks.com
>+1.508.965.0584
>
>Join my team!
>Enterprise Account Manager – Boston - http://grnh.se/wepchv1
>Solutions Engineer – Boston - http://grnh.se/8gbxy41
>Need Answers? Try https://community.hortonworks.com
>
>
>On 6/12/18, 1:08 PM, "Simon Elliston Ball"  wrote:
>
>>Good solution. The streaming enrichment writer makes a lot of sense for 
>>this, especially if you're not using huge enrichment sources that need 
>>the batch based loaders.
>>
>>As it happens I have written most of a NiFi processor to handle this 
>>use case directly - both non-record and Record based, especially for Otto :).
>>The one thing we need to figure out now is where to host that, and how 
>>to handle releases of a nifi-metron-bundle. I'll probably get round to 
>>putting the code in my github at least in the next few days, while we 
>>figure out a more permanent home.
>>
>>Charlie, out of curiosity, what didn't you like about the flatfile 
>>loader script?
>>
>>Simon
>>
>>On 12 June 2018 at 18:00, Charles Joynt 
>>wrote:
>>
>>> Thanks for the responses. I appreciate the willingness to look at 
>>> creating a NiFi processer. That would be great!
>>>
>>> Just to follow up on this (after a week looking after the "ops" side 
>>> of
>>> dev-ops): I really don't want to have to use the flatfile loader 
>>> script, and I'm not going to be a

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-12 Thread Carolyn Duby
I like the streaming enrichment solution, but it depends on how you are getting 
the data in.  If you get the data in a CSV file, just call the flat file loader 
from a script processor.  No special NiFi required.

If the enrichments don’t arrive in bulk, the streaming solution is better.

Thanks
Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584

Join my team!
Enterprise Account Manager – Boston - http://grnh.se/wepchv1
Solutions Engineer – Boston - http://grnh.se/8gbxy41
Need Answers? Try https://community.hortonworks.com

On 6/12/18, 1:08 PM, "Simon Elliston Ball"  wrote:

>Good solution. The streaming enrichment writer makes a lot of sense for
>this, especially if you're not using huge enrichment sources that need the
>batch based loaders.
>
>As it happens I have written most of a NiFi processor to handle this use
>case directly - both non-record and Record based, especially for Otto :).
>The one thing we need to figure out now is where to host that, and how to
>handle releases of a nifi-metron-bundle. I'll probably get round to putting
>the code in my github at least in the next few days, while we figure out a
>more permanent home.
>
>Charlie, out of curiosity, what didn't you like about the flatfile loader
>script?
>
>Simon
>
>On 12 June 2018 at 18:00, Charles Joynt 
>wrote:
>
>> Thanks for the responses. I appreciate the willingness to look at creating
>> a NiFi processer. That would be great!
>>
>> Just to follow up on this (after a week looking after the "ops" side of
>> dev-ops): I really don't want to have to use the flatfile loader script,
>> and I'm not going to be able to write a Metron-style HBase key generator
>> any time soon, but I have had some success with a different approach.
>>
>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
>> 2. Send this to a HTTP listener in NiFi
>> 3. Write to a kafka topic
>>
>> I then followed your instructions in this blog:
>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>>
>> 4. Create a new "dns" sensor in Metron
>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig
>> settings to push this into HBase:
>>
>> {
>>   "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>>   "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
>>   "sensorTopic": "dns",
>>   "parserConfig": {
>>     "shew.table": "dns",
>>     "shew.cf": "dns",
>>     "shew.keyColumns": "name",
>>     "shew.enrichmentType": "dns",
>>     "columns": {
>>       "name": 0,
>>       "type": 1,
>>       "data": 2
>>     }
>>   }
>> }
>>
>> And... it seems to be working. At least, I have data in HBase which looks
>> more like the output of the flatfile loader.
>>
>> Charlie
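
For anyone who wants to skip NiFi in step 3, here is a minimal sketch (in Scala, with 
an assumed broker address) of publishing one CSV row directly to the kafka topic that 
the streaming-enrichment parser above reads; the column order has to line up with the 
"columns" map in the parser config:

// Publish a single CSV enrichment row to the "dns" parser topic.
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SendDnsEnrichment {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "kafka-broker-1:6667") // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // name (column 0), type (column 1), data (column 2), as in the parserConfig above.
    val row = "server.domain.local,A,192.168.0.198"
    producer.send(new ProducerRecord[String, String]("dns", row)).get()
    producer.close()
  }
}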
>>
>> -Original Message-
>> From: Casey Stella [mailto:ceste...@gmail.com]
>> Sent: 05 June 2018 14:56
>> To: dev@metron.apache.org
>> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>>
>> The problem, as you correctly diagnosed, is the key in HBase.  We
>> construct the key very specifically in Metron, so it's unlikely to work out
>> of the box with the NiFi processor unfortunately.  The key that we use is
>> formed here in the codebase:
>> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>>
>> To put that in English, consider the following:
>>
>>- type - The enrichment type
>>- indicator - the indicator to use
>>- hash(*) - a Murmur3 128-bit hash function
>>
>> the key is hash(indicator) + type + indicator
>>
>> This hash prefixing is a standard practice in HBase key design that allows
>> the keys to be uniformly distributed among the regions and prevents
>> hotspotting.  Depending on how the PutHBaseJSON processor works, if you can
>> construct the key and pass it in, then you

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-01 Thread Carolyn Duby
Hi Charles - 

I think your best bet is to create a CSV file and use flatfile_loader.sh.  This 
will be easier, and you won’t have to worry if the format of HBase storage 
changes:

https://github.com/apache/metron/tree/master/metron-platform/metron-data-management#loading-utilities


The flat file loader is located here:

https://github.com/apache/metron/blob/master/metron-platform/metron-data-management/src/main/scripts/flatfile_loader.sh


Here is an example of an enrichment that maps a userid to a user category.

Here is the CSV mapping the userid to a category.  For example, tsausner has 
user category BAD_GUY.

[centos@metron-demo-4 rangeraudit]$ cat user_enrichment.csv 
tsausner,BAD_GUY 
ndhanase,CONTRACTOR 
svelagap,ADMIN 
jprivite,EMPLOYEE 
nolan,EMPLOYEE

Create an extractor config file that maps the columns of the CSV file to 
enrichments.  The indicator_column is the key for the enrichment.


[centos@metron-demo-4 rangeraudit]$ cat user_extraction.json 
{
  "config" : {
"columns" : {
 "user_id" : 0
,"user_category" : 1 
}
,"indicator_column" : "user_id"
,"type" : "user_categorization"
,"separator" : ","
  }
  ,"extractor" : "CSV"
}

This is an optional step where you can specify where to use the enrichments in 
Metron, when you import the enrichment data.  You can skip this step if the 
enrichments are already configured or you can add them later.
This config file applies the user_categorization enrichment using the reqUser 
field as the key.  
 
[centos@metron-demo-4 rangeraudit]$ cat rangeradmin_user_category_enrichment.json
{
  "zkQuorum": 
"metron-demo-2.field.hortonworks.com:2181,metron-demo-0.field.hortonworks.com:2181,metron-demo-1.field.hortonworks.com:2181",
  "sensorToFieldList": {
"rangeradmin": {
  "type": "ENRICHMENT",
  "fieldToEnrichmentTypes": {
"reqUser": [
  "user_categorization"
]
  }
}
  }
}

The command below imports the enrichment mappings into HBase and adds the 
enrichment to the rangeradmin sensor data.  The result is that when a Ranger 
admin event is enriched, Metron will use the reqUser field value as a key into 
the user_categorization enrichment.  If the value of the field is present in 
the CSV data, the enriched event will have a new field indicating the user 
category:

[centos@metron-demo-4 rangeraudit]$ /usr/hcp/1.4.0.0-38/metron/bin/flatfile_loader.sh -e user_extraction.json -t enrichment -i user_enrichment.csv -c t -n rangeradmin_user_category_enrichment.json


HBase will look similar to this:

hbase(main):002:0> scan 'enrichment'
ROW                                  COLUMN+CELL
 \x01\x12\x8Bjx@d.\xF3\xBF\xD3\xB2\x81\xEB\xB5\xD2\x00\x13user_categorization\x00\x08tsausner
     column=t:v, timestamp=1518118740456, value={"user_category":"BAD_GUY ","user_id":"tsausner"}
 /\xA8\xEB\xB1\xE0N\xBE\xCBv?\xCAz9\xF6;\xD3\x00\x13user_categorization\x00\x08svelagap
     column=t:v, timestamp=1518118740540, value={"user_category":"ADMIN ","user_id":"svelagap"}
 l\xF1F\x83t\xD6x\xF9\xBEwrk3\x00M2\x00\x13user_categorization\x00\x08ndhanase
     column=t:v, timestamp=1518118740522, value={"user_category":"CONTRACTOR ","user_id":"ndhanase"}


After the enrichment data is in HBase, create an event and add it to the 
rangeradmin topic.  For example, if the reqUser field is set to nnolan, the 
enriched event will have the following fields:

enrichments:hbaseEnrichment:reqUser:user_categorization:user_category
EMPLOYEE

enrichments:hbaseEnrichment:reqUser:user_categorization:user_id
nnolan



Thanks

Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584

Join my team!
Enterprise Account Manager – Boston - http://grnh.se/wepchv1
Solutions Engineer – Boston - http://grnh.se/8gbxy41
Need Answers? Try https://community.hortonworks.com

On 6/1/18, 6:26 AM, "Cha

Suricata parser

2017-09-25 Thread Carolyn Duby

Is anyone working on a Suricata parser?  

https://suricata-ids.org/


I was not able to find an enhancement request for it.

Thanks
Carolyn


Re: So we graduated...

2017-04-20 Thread Carolyn Duby
Nice!  That's great news!



Sent from my Verizon, Samsung Galaxy smartphone


 Original message 
From: David Lyle 
Date: 4/20/17 5:16 PM (GMT-05:00)
To: dev@metron.apache.org
Subject: Re: So we graduated...

Outstanding! Great work everyone. Building a TLP worthy community is
difficult and worthy work, congratulations all!

-D...

On Thu, Apr 20, 2017 at 5:12 PM, Casey Stella  wrote:

> For anyone paying attention to incubator-general, it will come as no
> surprise that we graduated as of last night's board meeting.  We have a
> press release queued up and planned for Monday along with a PR (METRON-687
> at https://github.com/apache/incubator-metron/pull/539).
>
> It escaped my notice that the graduation was talked about on
> incubator-general; otherwise I'd have sent this email earlier and been less
> cagey in 687's description.  Even so, I'd like to ask that everyone keep it
> to themselves until Monday morning after the press release gets out the
> door.  I know the cat is out of the bag, but it'd be nice to have a bit of
> an embargo.
>
> Thanks!
>
> Casey
>